Punycode Converter: Convert International Domain Names

Learn what Punycode is, how international domain names work, and how to convert between Unicode and Punycode. Free online Punycode converter tool included.

Web Tools | April 13, 2026 | 10 min read

What Is Punycode?

Punycode is an encoding syntax that represents Unicode characters using only the ASCII character set. It was created to solve a fundamental problem with the Domain Name System (DNS): hostnames are restricted to a small ASCII subset (letters A-Z, digits 0-9, and the hyphen), but people around the world write in hundreds of scripts, including Chinese, Arabic, Cyrillic, and Devanagari. Punycode bridges this gap by converting internationalized domain names (IDNs) into an ASCII-compatible encoding (ACE) that DNS can handle.

The encoding is defined in RFC 3492 and is one component of the broader Internationalized Domain Names in Applications (IDNA) standard. When you register a domain with international characters, each non-ASCII label is converted to Punycode before it can be used with DNS. For example, "müller.de" becomes "xn--mller-kva.de", and "你好.com" (Chinese for "hello") becomes "xn--6qq79v.com".

The "xn--" prefix is a standard signal that indicates Punycode-encoded text. This prefix was chosen to be highly unlikely to appear in normal domain names, making it easy for systems to distinguish between regular ASCII domains and Punycode-encoded international domains.
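As a rough illustration of how software can use the prefix, the sketch below lists the labels of a domain that carry it. The helper names are made up for this example, and the check only looks at the prefix; it does not validate that the rest of the label is well-formed Punycode.

```python
def is_ace_label(label: str) -> bool:
    """Return True if a single DNS label carries the ACE prefix."""
    # Domain names are case-insensitive, so compare in lowercase.
    return label.lower().startswith("xn--")


def ace_labels(domain: str) -> list[str]:
    """List the labels of a domain that look Punycode-encoded."""
    return [label for label in domain.split(".") if is_ace_label(label)]


print(ace_labels("www.xn--mller-kva.de"))  # ['xn--mller-kva']
print(ace_labels("example.com"))           # []
```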

Why Punycode Is Necessary

The Domain Name System was designed in the early 1980s, long before the internet became a global phenomenon. At the time, ASCII was considered sufficient for domain names because the internet was primarily used by English-speaking researchers and institutions. As the internet expanded globally, the ASCII-only limitation became a significant barrier to adoption.

The IDN Problem

Consider a business in Japan that wants its website to reflect its Japanese name. Without internationalization, the business would be forced to use a romanized version of its name, which may be unfamiliar to its Japanese customers. Similarly, a Russian company, an Arabic-speaking organization, or a Chinese brand all face the same issue. Punycode solves this by allowing any Unicode character to be encoded in an ASCII-compatible format.

The Homograph Attack

One significant security concern related to Punycode is the homograph attack. Because many Unicode characters look identical or nearly identical to ASCII characters, an attacker could register a domain that visually appears to be a legitimate domain but is actually different. For example, the Cyrillic letter "a" looks identical to the Latin letter "a," so "p aypal.com" (using Cyrillic a) could be registered to impersonate "paypal.com."

Modern browsers mitigate this risk by displaying Punycode-encoded domains in their raw form (showing "xn--" prefixes) when the domain contains characters from multiple scripts, or by restricting which scripts can be mixed in a single domain. This makes homograph attacks much more difficult to execute successfully.
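A mixed-script check of this kind can be approximated in a few lines. The sketch below is a simplified heuristic, not what any particular browser implements: it guesses each letter's script from the first word of its Unicode character name, and the function names are invented for the example.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits and hyphens are shared by all scripts
        # Heuristic: the first word of the Unicode character name is
        # usually the script, e.g. "LATIN SMALL LETTER A".
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """Return True if the label mixes letters from more than one script."""
    return len(scripts_in_label(label)) > 1


spoof = "p\u0430ypal"  # second letter is Cyrillic U+0430, not Latin "a"
print(sorted(scripts_in_label(spoof)))  # ['CYRILLIC', 'LATIN']
print(is_mixed_script(spoof))           # True
print(is_mixed_script("paypal"))        # False
```

A check like this does not catch labels written entirely in one non-Latin script that happens to resemble a Latin word, which is why browsers combine several rules.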

How Punycode Encoding Works

Punycode is an instance of the Bootstring algorithm described in RFC 3492. It copies the ASCII characters through unchanged and encodes the non-ASCII characters as a sequence of variable-length integers appended after a hyphen.

Step 1: Separate ASCII Characters

The algorithm first extracts all ASCII characters from the input and places them at the beginning of the output. These characters are left unchanged. Encoding is applied to each dot-separated label on its own, so in "café.com" only the label "café" is processed: "caf" is extracted as the ASCII portion, while "com" is already ASCII and is left as is.
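The separation step can be sketched as follows. The function name is hypothetical, and the code only shows the split for a single label; it does not perform any encoding.

```python
def split_basic(label: str) -> tuple[str, list[str]]:
    """Split one label into its ASCII part and its non-ASCII characters."""
    # Code points below 0x80 are ASCII and are copied through unchanged.
    basic = "".join(ch for ch in label if ord(ch) < 0x80)
    # Everything else is handled by the integer encoding in the next step.
    extended = [ch for ch in label if ord(ch) >= 0x80]
    return basic, extended


basic, extended = split_basic("café")
print(basic, [f"U+{ord(ch):04X}" for ch in extended])  # caf ['U+00E9']
```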

Step 2: Encode Non-ASCII Characters

The remaining non-ASCII characters are processed in order of increasing code point, and each one is encoded as a variable-length integer called a delta. A single delta captures two things at once: how far to advance the current code point, and the position at which that character is inserted into the string decoded so far. The integers are written in base 36 (letters a-z for the values 0-25 and digits 0-9 for 26-35), with an adaptive bias that adjusts the digit thresholds based on previously encoded values.
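To make the integer encoding concrete, here is a minimal sketch of how one delta is written as base-36 digits. The constants follow RFC 3492 (base 36, tmin 1, tmax 26, initial bias 72); the function names are chosen for this example, and the bias adaptation between deltas is left out. For "müller", the first delta is (0xFC - 0x80) * 6 + 1 = 745: the jump from the initial code point 128 to "ü", multiplied by the number of possible insertion positions, plus one position to skip the leading "m".

```python
BASE, TMIN, TMAX = 36, 1, 26
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"  # values 0..35


def threshold(k: int, bias: int) -> int:
    """Digit threshold for position k: k - bias clamped to [TMIN, TMAX]."""
    return min(max(k - bias, TMIN), TMAX)


def encode_delta(delta: int, bias: int) -> str:
    """Write one delta as a generalized variable-length integer."""
    out = []
    q = delta
    k = BASE
    while True:
        t = threshold(k, bias)
        if q < t:
            break  # the remaining value fits in a final digit
        out.append(DIGITS[t + (q - t) % (BASE - t)])
        q = (q - t) // (BASE - t)
        k += BASE
    out.append(DIGITS[q])
    return "".join(out)


print(encode_delta(745, 72))  # kva  (the suffix in "mller-kva")
print(encode_delta(423, 72))  # dma  (the suffix in "caf-dma")
```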

Step 3: Combine and Prefix

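The encoded non-ASCII portion is appended to the ASCII portion after a hyphen separator (the hyphen is omitted when the label contains no ASCII characters). IDNA then prefixes each encoded label with "xn--" to mark it as Punycode; the prefix is not part of the Punycode algorithm itself.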
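The three steps can be tried end to end with the "punycode" codec in Python's standard library, which implements the raw RFC 3492 encoding without the prefix. The wrapper functions below are illustrative only, and they deliberately skip the normalization and validation that a full IDNA implementation performs before encoding.

```python
def to_ace_label(label: str) -> str:
    """Encode one label, adding the ACE prefix only when it is needed."""
    if label.isascii():
        return label  # plain ASCII labels are passed through untouched
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ace_domain(domain: str) -> str:
    """Encode a domain label by label (no IDNA normalization applied)."""
    return ".".join(to_ace_label(label) for label in domain.split("."))


print(to_ace_domain("müller.de"))  # xn--mller-kva.de
print(to_ace_domain("你好.com"))    # xn--6qq79v.com
```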

A Concrete Example

Input domain:   müller.de
ASCII part:     mller
Non-ASCII:      ü (Unicode U+00FC)
Punycode:       xn--mller-kva
Full ACE:       xn--mller-kva.de

Input domain:   你好.com (Chinese "hello")
ASCII part:     (none)
Non-ASCII:      你 (U+4F60), 好 (U+597D)
Punycode:       xn--6qq79v
Full ACE:       xn--6qq79v.com

Practical Applications of Punycode

Domain Name Registration

Domain registrars use Punycode internally when registering internationalized domain names. When you search for a domain name in your native script, the registrar converts it to Punycode, checks availability against the DNS database, and registers the Punycode version. The registrar's interface may display the Unicode version for readability, but the actual registered domain is in Punycode.

Email Addresses

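Internationalized email addresses follow the EAI (Email Address Internationalization) standards, including the SMTPUTF8 extension (RFC 6531). On a mail system that supports SMTPUTF8, both the local part (before the @) and the domain can contain Unicode characters directly. The domain part is still converted to Punycode whenever it is looked up in DNS, and it is often written in Punycode form when the receiving system may not support EAI.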
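A minimal sketch of converting only the domain part of an address, assuming Python's built-in "idna" codec (which implements IDNA 2003). The function name is hypothetical, and the local part is returned exactly as given.

```python
def email_domain_to_ascii(address: str) -> str:
    """Convert the domain part of an email address to its ACE form."""
    # Split on the last "@" so the local part is kept as-is.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    ace_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ace_domain}"


print(email_domain_to_ascii("info@müller.de"))  # info@xn--mller-kva.de
```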

URL Handling in Browsers

Web browsers automatically handle Punycode conversion. When you type a domain with international characters in the address bar, the browser converts it to Punycode for the actual DNS lookup and HTTP request. The browser then displays the Unicode version in the address bar for readability, unless security restrictions trigger display of the raw Punycode.

DNS Configuration

When configuring DNS records for internationalized domains, zone files and DNS records should use the Punycode (ACE) form, because hostnames in DNS are limited to ASCII. Many DNS servers and provider control panels reject or mishandle records that contain raw non-ASCII characters.
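As an illustration, the sketch below builds a zone-file style A record with the owner name in ACE form. The function name, the default TTL, and the address (taken from the 192.0.2.0/24 documentation range) are assumptions for the example, and the exact record syntax your DNS provider expects may differ.

```python
def zone_a_record(domain: str, ipv4: str, ttl: int = 3600) -> str:
    """Format an A record line with the owner name in ACE form."""
    # Python's built-in "idna" codec (IDNA 2003) produces the ASCII name.
    owner = domain.encode("idna").decode("ascii")
    # The trailing dot marks the name as fully qualified.
    return f"{owner}. {ttl} IN A {ipv4}"


print(zone_a_record("müller.de", "192.0.2.10"))
# xn--mller-kva.de. 3600 IN A 192.0.2.10
```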

How to Convert Between Unicode and Punycode

Online Punycode Converter

Our free online Punycode converter lets you convert between Unicode domain names and Punycode instantly. Simply enter a domain name in any script, and it converts to Punycode. You can also paste Punycode to decode it back to Unicode.

Command Line Tools

# Using Python
python3 -c "print('müller.de'.encode('idna').decode())"

# Using the idn command (from GNU libidn, in a UTF-8 locale)
idn --idna-to-ascii müller.de

# Using PowerShell (with .NET)
[System.Globalization.IdnMapping]::new().GetAscii("müller.de")

Programming Languages

# Python (built-in codec, implements IDNA 2003)
domain = "müller.de"
punycode = domain.encode("idna").decode()
# xn--mller-kva.de

// JavaScript (Node.js, using the userland "punycode" package)
const punycode = require("punycode/");
punycode.toASCII("café.com");
// xn--caf-dma.com

// PHP (requires the intl extension)
$punycode = idn_to_ascii("müller.de");
// xn--mller-kva.de
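Decoding works the same way in reverse. The sketch below uses Python's built-in "idna" codec to turn an ACE name back into Unicode; the function name is made up for the example, and names without an "xn--" label pass through unchanged.

```python
def to_unicode(domain: str) -> str:
    """Decode an ACE domain name back to its Unicode form."""
    # The codec decodes from bytes, so go through ASCII first.
    return domain.encode("ascii").decode("idna")


# Round trip: encode to ACE, then decode back to the original name.
ace = "müller.de".encode("idna").decode("ascii")
print(ace)                             # xn--mller-kva.de
print(to_unicode(ace) == "müller.de")  # True
```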

Punycode and Browser Security

Common Punycode Examples

Unicode Domain           Punycode (ACE)           Notes
müller.de                xn--mller-kva.de
café.com                 xn--caf-dma.com
äiti.com                 xn--iti-pla.com
你好.com                 xn--6qq79v.com           Chinese "hello"
привет.ru                xn--b1agh1afp.ru         Russian "hello"
naïve.com                xn--nave-6pa.com
예제.com                 xn--2j5b21b.com          Korean "example"

Best Practices for Working with International Domains

  1. Always register both versions. Register the internationalized domain AND the ASCII version to protect your brand and prevent confusion.
  2. Display Unicode, store Punycode. Show users the readable Unicode version, but always use Punycode for DNS records, server configuration, and database storage.
  3. Test thoroughly. Test your international domains across different browsers, email clients, and DNS providers, as support and behavior can vary.
  4. Be aware of homograph risks. If you operate a security-sensitive service, monitor for lookalike domains registered with similar Unicode characters.
  5. Use IDNA 2008. The latest version of the IDNA standard has better security properties than IDNA 2003. Ensure your tools support the updated standard.
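The "display Unicode, store Punycode" practice can be sketched as a small normalization helper, so that the Unicode and ACE spellings of a name map to the same stored value. The function name is hypothetical, and the sketch relies on Python's built-in "idna" codec, which implements IDNA 2003; a production system following the last recommendation would use an IDNA 2008 library instead.

```python
def storage_key(domain: str) -> str:
    """Normalize a domain name to a lowercase ACE string for storage."""
    # Drop surrounding whitespace and the optional trailing root dot.
    name = domain.strip().rstrip(".")
    # Non-ASCII labels are case-folded and Punycode-encoded by the codec;
    # ASCII labels pass through, so lowercase the result afterwards.
    return name.encode("idna").decode("ascii").lower()


print(storage_key("MÜLLER.de"))          # xn--mller-kva.de
print(storage_key("xn--MLLER-KVA.DE."))  # xn--mller-kva.de
print(storage_key(" Example.COM "))      # example.com
```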

Try our free online tool to get results instantly in your browser.

Frequently Asked Questions

What is Punycode?

Punycode is an encoding syntax that converts Unicode characters to ASCII for use in domain names. It allows internationalized domain names to work with the ASCII-only Domain Name System. Punycode-encoded domains start with 'xn--'.

Why do domains start with xn--?

The 'xn--' prefix is defined by the IDNA standard to indicate that the domain label is Punycode-encoded. This prefix was chosen to be extremely unlikely to appear naturally in domain names, making it easy for systems to identify and process internationalized domains.

Is Punycode safe from phishing attacks?

Punycode itself is neutral, but it enables homograph attacks where visually similar characters from different scripts are used to impersonate domains. Modern browsers mitigate this by displaying Punycode when mixed scripts are detected.

How do I convert a domain to Punycode?

Use our free online Punycode converter tool. In Python: domain.encode('idna').decode(). In JavaScript (Node.js): require('punycode/').toASCII('domain'). In PHP: idn_to_ascii('domain').

What is the difference between IDNA 2003 and IDNA 2008?

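IDNA 2008 (RFC 5890 to 5894) is the updated internationalized domain name standard and replaces IDNA 2003 (RFC 3490). Rather than mapping input through a fixed table tied to Unicode 3.2, it defines which code points are permitted based on Unicode properties, so it keeps pace with new Unicode versions. It also changes the handling of a few characters: the German "ß", for instance, is mapped to "ss" under IDNA 2003 but is a valid character in its own right under IDNA 2008. Most modern systems should use IDNA 2008 for new implementations.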