What Is Punycode?
Punycode is an encoding syntax designed to represent Unicode characters using only the ASCII character set. It was created to solve a fundamental problem with the Domain Name System (DNS): DNS only supports ASCII characters (letters A-Z, digits 0-9, and hyphens), but people around the world write in hundreds of scripts including Chinese, Arabic, Cyrillic, Devanagari, and many others. Punycode bridges this gap by converting internationalized domain names (IDNs) into an ASCII-compatible encoding (ACE) that DNS can handle.
The encoding is defined in RFC 3492 and is a component of the broader Internationalized Domain Names in Applications (IDNA) standard. When you register a domain with international characters, the domain name is converted to Punycode before it can be used with DNS. For example, "müller.de" becomes "xn--mller-kva.de" and "你好.com" (Chinese for "hello") becomes "xn--6qq79v.com".
The "xn--" prefix is a standard signal that indicates Punycode-encoded text. This prefix was chosen to be highly unlikely to appear in normal domain names, making it easy for systems to distinguish between regular ASCII domains and Punycode-encoded international domains.
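Below is a rough sketch of how software can use the prefix to spot encoded labels, assuming a plain dot-separated domain string. The helper names `is_ace_label` and `ace_labels` are illustrative. The check ignores case because DNS names are case-insensitive.

```python
ACE_PREFIX = "xn--"


def is_ace_label(label: str) -> bool:
    """Return True if a single DNS label carries the ACE prefix."""
    # DNS is case-insensitive, so "XN--" counts as well.
    return label.lower().startswith(ACE_PREFIX)


def ace_labels(domain: str) -> list[str]:
    """List the labels of a dotted domain name that are Punycode-encoded."""
    return [label for label in domain.split(".") if is_ace_label(label)]


print(ace_labels("xn--mller-kva.de"))  # ['xn--mller-kva']
print(ace_labels("www.example.com"))   # []
```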
Why Punycode Is Necessary
The Domain Name System was designed in the early 1980s, long before the internet became a global phenomenon. At the time, ASCII was considered sufficient for domain names because the internet was primarily used by English-speaking researchers and institutions. As the internet expanded globally, the ASCII-only limitation became a significant barrier to adoption.
The IDN Problem
Consider a business in Japan that wants its website to reflect its Japanese name. Without internationalization, the business would be forced to use a romanized version of its name, which may be unfamiliar to its Japanese customers. Similarly, a Russian company, an Arabic-speaking organization, or a Chinese brand all face the same issue. Punycode solves this by allowing any Unicode character to be encoded in an ASCII-compatible format.
The Homograph Attack
One significant security concern related to Punycode is the homograph attack. Because many Unicode characters look identical or nearly identical to ASCII characters, an attacker could register a domain that visually appears to be a legitimate domain but is actually different. For example, the Cyrillic letter "a" looks identical to the Latin letter "a," so "p aypal.com" (using Cyrillic a) could be registered to impersonate "paypal.com."
Modern browsers mitigate this risk by displaying Punycode-encoded domains in their raw form (showing "xn--" prefixes) when the domain contains characters from multiple scripts, or by restricting which scripts can be mixed in a single domain. This makes homograph attacks much more difficult to execute successfully.
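To see that a lookalike really is a different name, you can encode both spellings with Python's built-in `idna` codec. That codec implements IDNA 2003 and is used here only for illustration. The spelling with a Cyrillic "а" becomes its own `xn--` label, and that raw form is what a browser can fall back to displaying.

```python
latin = "paypal.com"
# Same spelling, but the second character is CYRILLIC SMALL LETTER A (U+0430).
lookalike = "p\u0430ypal.com"

print(latin == lookalike)                 # False: different code points
print(latin.encode("idna").decode())      # paypal.com (pure ASCII, unchanged)
print(lookalike.encode("idna").decode())  # xn--pypal-4ve.com
```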
How Punycode Encoding Works
Punycode is an instance of a general algorithm called Bootstring. It separates the ASCII characters from the non-ASCII characters and encodes the non-ASCII characters as a sequence of variable-length integers appended after a hyphen. Each label of a domain name (the parts between the dots) is encoded separately.
Step 1: Separate ASCII Characters
The algorithm first extracts all ASCII characters from the label and places them, in their original order, at the beginning of the output. These characters are left unchanged. For example, in "café.com" the label "café" contributes "caf" as its ASCII portion. The ".com" part is a separate label and, being pure ASCII, is not encoded at all.
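Here is Step 1 on its own, as a sketch for a single label rather than a whole domain. `split_basic` is a hypothetical helper name, not part of any library.

```python
def split_basic(label: str) -> tuple[str, list[str]]:
    """Split a label into its ASCII ("basic") characters, kept in original
    order, and the non-ASCII characters that still have to be encoded."""
    basic = "".join(ch for ch in label if ord(ch) < 0x80)
    extended = [ch for ch in label if ord(ch) >= 0x80]
    return basic, extended


print(split_basic("café"))    # ('caf', ['é'])
print(split_basic("müller"))  # ('mller', ['ü'])
```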
Step 2: Encode Non-ASCII Characters
The remaining non-ASCII characters are encoded using a variable-length integer scheme. Each integer is a delta that tells the decoder two things: which code point to insert next, and at which position in the string built so far. Characters are inserted in increasing code point order rather than left to right. Digits are written in base 36, where the letters a-z stand for 0-25 and the digits 0-9 stand for 26-35. An adaptive bias adjusts the digit thresholds based on the previous delta, so typical values need only a few characters.
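The following is a compact sketch of an RFC 3492 encoder for one label. It is written from the algorithm's published parameters: base 36, tmin 1, tmax 26, skew 38, damp 700, initial bias 72 and initial n 128. It returns only the Punycode part, without `xn--`, and it performs no IDNA mapping or validation. Python's built-in `punycode` codec is used only as a cross-check.

```python
# RFC 3492 parameters for Punycode.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def encode_digit(d: int) -> str:
    # 0-25 -> 'a'-'z', 26-35 -> '0'-'9'
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    # Scale the delta down, then pick a bias for the next integer's thresholds.
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def encode_label(label: str) -> str:
    codepoints = [ord(ch) for ch in label]
    # Step 1: copy the ASCII ("basic") characters in order.
    output = [ch for ch in label if ord(ch) < 0x80]
    h = b = len(output)
    if b:
        output.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    # Step 2: insert the non-ASCII characters in increasing code point order.
    while h < len(codepoints):
        m = min(c for c in codepoints if c >= n)
        delta += (m - n) * (h + 1)  # skip all (code point, position) states below m
        n = m
        for c in codepoints:
            if c < n:
                delta += 1
            elif c == n:
                # Write delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(encode_digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


for word in ["müller", "café", "你好"]:
    # Cross-check against Python's built-in punycode codec.
    assert encode_label(word) == word.encode("punycode").decode("ascii")
    print(word, "->", encode_label(word))
```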
Step 3: Combine and Prefix
The encoded non-ASCII portion is appended to the ASCII portion after a hyphen separator. If the label contains no ASCII characters at all, the hyphen is omitted. IDNA then prefixes the result with "xn--" to mark the label as Punycode-encoded.
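Putting the pieces together for a whole domain works label by label. ASCII-only labels pass through unchanged, and the rest get `xn--` plus their Punycode form. This sketch relies on Python's built-in `punycode` codec and, unlike a real IDNA implementation, skips case mapping, normalization and validity checks. `to_ace` is a hypothetical helper name.

```python
def to_ace(domain: str) -> str:
    """Convert each non-ASCII label to "xn--" + Punycode; keep ASCII labels as-is.

    Assumption: the input is already lowercase and normalized, because this
    sketch does no IDNA mapping or validation.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            # The punycode codec omits the hyphen itself when there is no ASCII part.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


print(to_ace("müller.de"))  # xn--mller-kva.de
print(to_ace("你好.com"))    # xn--6qq79v.com
```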
A Concrete Example
Input domain: müller.de
ASCII part: mller
Non-ASCII: ü (U+00FC)
Punycode: xn--mller-kva
Full ACE: xn--mller-kva.de

Input domain: 你好.com (Chinese "hello")
ASCII part: (none)
Non-ASCII: 你 (U+4F60), 好 (U+597D)
Punycode: xn--6qq79v
Full ACE: xn--6qq79v.com
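The reverse direction can be sketched too. This is a minimal RFC 3492 decoder for a single label with the `xn--` prefix already removed. It uses the standard parameters (base 36, tmin 1, tmax 26, skew 38, damp 700, initial bias 72, initial n 128). It also omits the overflow and IDNA validity checks a production decoder needs. `decode_label` is an illustrative name.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700


def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def decode_digit(ch: str) -> int:
    # 'a'-'z' -> 0-25, '0'-'9' -> 26-35 (case-insensitive)
    ch = ch.lower()
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"invalid Punycode digit: {ch!r}")


def decode_label(encoded: str) -> str:
    # Everything before the last hyphen is the literal ASCII part.
    pos = encoded.rfind("-")
    output = list(encoded[:pos]) if pos > 0 else []
    rest = encoded[pos + 1:] if pos > 0 else encoded
    n, i, bias, idx = 128, 0, 72, 0
    while idx < len(rest):
        old_i, w, k = i, 1, BASE
        while True:  # read one variable-length integer
            if idx >= len(rest):
                raise ValueError("truncated Punycode input")
            d = decode_digit(rest[idx])
            idx += 1
            i += d * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if d < t:
                break
            w *= BASE - t
            k += BASE
        bias = adapt(i - old_i, len(output) + 1, old_i == 0)
        # Split the delta back into a code point and an insertion position.
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, chr(n))
        i += 1
    return "".join(output)


print(decode_label("mller-kva"))  # müller
print(decode_label("6qq79v"))     # 你好
```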
Practical Applications of Punycode
Domain Name Registration
Domain registrars use Punycode internally when registering internationalized domain names. When you search for a domain name in your native script, the registrar converts it to Punycode, checks its availability with the registry, and registers the Punycode version. The registrar's interface may display the Unicode version for readability, but the actual registered domain is in Punycode.
Email Addresses
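Internationalized email addresses follow the EAI (Email Address Internationalization) standards. These allow both the local part (before the @) and the domain to contain Unicode, provided every mail server involved supports the SMTPUTF8 extension. The domain can always be converted to Punycode, for example for the DNS (MX) lookup. The local part, however, has no ASCII-compatible encoding and must be sent as UTF-8.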
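The sketch below shows how a mail client might prepare an internationalized address for the DNS (MX) lookup. It assumes a simple `local@domain` string whose last `@` separates the two parts. Only the domain is converted, because the local part has no ASCII-compatible form. `split_for_lookup` is a hypothetical name, and Python's built-in `idna` codec (IDNA 2003) stands in for a full IDNA 2008 library.

```python
def split_for_lookup(address: str) -> tuple[str, str]:
    """Return (local part unchanged, domain in ASCII/Punycode form)."""
    local, _, domain = address.rpartition("@")
    if not local:
        raise ValueError(f"not an email address: {address!r}")
    return local, domain.encode("idna").decode("ascii")


print(split_for_lookup("jürgen@müller.de"))  # ('jürgen', 'xn--mller-kva.de')
```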
URL Handling in Browsers
Web browsers automatically handle Punycode conversion. When you type a domain with international characters in the address bar, the browser converts it to Punycode for the actual DNS lookup and HTTP request. The browser then displays the Unicode version in the address bar for readability, unless security restrictions trigger display of the raw Punycode.
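Here is a simplified sketch of the host conversion a browser performs before the DNS lookup. It uses Python's standard `urllib.parse` and the built-in `idna` codec, which implements IDNA 2003; browsers themselves follow UTS #46 processing. `to_ascii_url` is a hypothetical name. It drops any `user:password@` part and leaves the path untouched, since browsers percent-encode non-ASCII paths separately.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Rewrite only the host of a URL into its ASCII (Punycode) form.

    Simplifications: userinfo is dropped and the path is left as-is.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_ascii_url("https://müller.de/"))         # https://xn--mller-kva.de/
print(to_ascii_url("http://café.com:8080/menu"))  # http://xn--caf-dma.com:8080/menu
```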
DNS Configuration
When configuring DNS records for internationalized domains, all zone files and DNS records must use the Punycode (ACE) form. Hostnames in DNS are restricted to ASCII letters, digits, and hyphens, and resolvers, registries, and DNS providers expect names in that form. Many DNS servers and hosting control panels reject or mishandle records containing non-ASCII characters.
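Below is a sketch of preparing one zone-file line, assuming a BIND-style `owner TTL class type rdata` layout. `zone_record` is a hypothetical helper, and Python's `idna` codec (IDNA 2003) stands in for whatever converter a real tool would use. The helper refuses record data that is not already ASCII, because hostnames inside records (for example a CNAME target) need the same conversion.

```python
def zone_record(owner: str, rtype: str, rdata: str, ttl: int = 3600) -> str:
    """Format one zone-file line with the owner name in ACE (Punycode) form."""
    # Python's idna codec keeps a trailing dot, so absolute names stay absolute.
    ace_owner = owner.encode("idna").decode("ascii")
    if not rdata.isascii():
        raise ValueError("convert hostnames in rdata to ACE form as well")
    return f"{ace_owner} {ttl} IN {rtype} {rdata}"


print(zone_record("www.müller.de.", "A", "192.0.2.10"))
# www.xn--mller-kva.de. 3600 IN A 192.0.2.10
```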
How to Convert Between Unicode and Punycode
Online Punycode Converter
Our free online Punycode converter lets you convert between Unicode domain names and Punycode instantly. Simply enter a domain name in any script, and it converts to Punycode. You can also paste Punycode to decode it back to Unicode.
Command Line Tools
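# Using Python (the built-in "idna" codec implements IDNA 2003)
python3 -c "print('müller.de'.encode('idna').decode())"
# Using idn (GNU libidn); --idna-to-ascii converts a whole domain name
idn --idna-to-ascii müller.de
# Using idn2 (GNU libidn2, IDNA 2008)
idn2 müller.de
# Using PowerShell (with .NET)
(New-Object System.Globalization.IdnMapping).GetAscii("müller.de")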
Programming Languages
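# Python (built-in codec, IDNA 2003)
domain = "müller.de"
punycode = domain.encode("idna").decode()
# xn--mller-kva.de
// JavaScript (Node.js), using the "punycode" package from npm
// (the trailing slash loads the package, not Node's deprecated built-in module)
const punycode = require("punycode/");
punycode.toASCII("café.com");
// xn--caf-dma.com
# PHP (requires the intl extension)
$punycode = idn_to_ascii("müller.de", IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
// xn--mller-kva.de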
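The snippets above only encode. Decoding back to Unicode uses the same Python codec in the other direction, as this standard-library-only sketch shows:

```python
ace = "xn--mller-kva.de"

# The idna codec decodes from bytes, so encode the ASCII string first.
unicode_form = ace.encode("ascii").decode("idna")
print(unicode_form)  # müller.de

# Round trip: encoding the decoded form gives back the same ACE string.
assert unicode_form.encode("idna").decode("ascii") == ace
```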
Punycode and Browser Security
- Mixed-script detection: Browsers check if a domain contains characters from scripts that could be confused with Latin characters. If so, the domain is displayed in Punycode form to reveal the true characters.
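- Whole-script domains: If all characters in a domain label belong to the same non-Latin script (like all Chinese or all Arabic), browsers display it in Unicode for readability. This is generally safe for scripts whose letters do not resemble Latin ones. Scripts such as Cyrillic can still spell complete lookalike words, however, so some browsers apply extra checks to them.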
- Top-level domain restrictions: Some browsers restrict which scripts can be used with certain TLDs to prevent impersonation.
- User-selectable behavior: Some browsers allow users to choose whether to always display Punycode or to use the default mixed-script detection behavior.
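Real browsers rely on the Unicode Technical Standard #39 script-mixing rules plus their own lists. The sketch below is only a rough heuristic that approximates each character's script by the first word of its Unicode name, because Python's `unicodedata` module does not expose the Script property. `scripts_in` and `looks_mixed` are hypothetical names. As an illustration of its limits, this heuristic would wrongly flag legitimate Japanese labels that combine kanji and kana, which real policies permit.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the scripts used in a label from Unicode character names,
    e.g. 'LATIN SMALL LETTER A' -> 'LATIN'. Digits and hyphens are ignored."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "UNKNOWN")
        scripts.add(name.split()[0])
    return scripts


def looks_mixed(domain: str) -> bool:
    """Flag a domain if any single label mixes more than one script."""
    return any(len(scripts_in(label)) > 1 for label in domain.split("."))


print(looks_mixed("p\u0430ypal.com"))  # True: Cyrillic U+0430 among Latin letters
print(looks_mixed("привет.ru"))        # False: each label uses one script
print(looks_mixed("你好.com"))          # False
```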
Common Punycode Examples
Unicode domain → Punycode (ACE):
- müller.de → xn--mller-kva.de
- café.com → xn--caf-dma.com
- äiti.com → xn--iti-pla.com
- 你好.com (Chinese "hello") → xn--6qq79v.com
- привет.ru (Russian "hi") → xn--b1agh1afp.ru
- naïve.com → xn--nave-6pa.com
- 예시.com (Korean "example") → xn--vv4b11d.com
Best Practices for Working with International Domains
- Always register both versions. Register the internationalized domain AND the ASCII version to protect your brand and prevent confusion.
- Display Unicode, store Punycode. Show users the readable Unicode version, but always use Punycode for DNS records, server configuration, and database storage.
- Test thoroughly. Test your international domains across different browsers, email clients, and DNS providers, as support and behavior can vary.
- Be aware of homograph risks. If you operate a security-sensitive service, monitor for lookalike domains registered with similar Unicode characters.
- Use IDNA 2008. The latest version of the IDNA standard has better security properties than IDNA 2003. Ensure your tools support the updated standard.
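Below is a minimal sketch of the "display Unicode, store Punycode" rule. It assumes domains arrive as user input and that Python's built-in `idna` codec (IDNA 2003) is acceptable for the conversion. Projects that need IDNA 2008 would swap in a library that implements it. `to_storage` and `to_display` are illustrative names.

```python
def to_storage(user_input: str) -> str:
    """Canonical ASCII form for DNS records, configs and database keys."""
    # DNS is case-insensitive; lowercase first so stored values compare equal.
    return user_input.strip().lower().encode("idna").decode("ascii")


def to_display(stored: str) -> str:
    """Readable Unicode form for showing to users."""
    return stored.encode("ascii").decode("idna")


stored = to_storage("Müller.de")
print(stored)              # xn--mller-kva.de
print(to_display(stored))  # müller.de
```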
Frequently Asked Questions
What is Punycode?
Punycode is an encoding syntax that converts Unicode characters to ASCII for use in domain names. It allows internationalized domain names to work with the ASCII-only Domain Name System. Punycode-encoded domains start with 'xn--'.
Why do domains start with xn--?
The 'xn--' prefix is defined by the IDNA standard to indicate that the domain label is Punycode-encoded. This prefix was chosen to be extremely unlikely to appear naturally in domain names, making it easy for systems to identify and process internationalized domains.
Is Punycode safe from phishing attacks?
Punycode itself is neutral, but it enables homograph attacks where visually similar characters from different scripts are used to impersonate domains. Modern browsers mitigate this by displaying Punycode when mixed scripts are detected.
How do I convert a domain to Punycode?
Use our free online Punycode converter tool. In Python: domain.encode('idna').decode(). In JavaScript (Node.js): require('punycode/').toASCII('domain'). In PHP: idn_to_ascii('domain').
What is the difference between IDNA 2003 and IDNA 2008?
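IDNA 2008 (RFCs 5890 through 5894) is the updated internationalized domain name standard. Instead of IDNA 2003's mapping tables, it decides code point by code point which characters are allowed. It disallows symbols such as "♥" and treats a few characters differently; for example, IDNA 2003 mapped the German "ß" to "ss", while IDNA 2008 keeps it. As a result, some names encode to different ACE forms under the two standards. Most modern systems should use IDNA 2008 for new implementations.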