URL encoding (also known as Percent-Encoding) is one of the most fundamental yet error-prone aspects of web development. What seems like simply "turning special characters into %XX" involves the RFC 3986 standard, differences across programming languages, and numerous pitfalls.
This guide will start from the underlying principles and thoroughly explain every aspect of URL encoding, helping you avoid common encoding mistakes in real-world development.
URLs (Uniform Resource Locators) only allow a subset of the ASCII character set. Specifically, URL characters fall into two categories:
| Category | Characters |
|---|---|
| Reserved | : / ? # [ ] @ ! $ & ' ( ) * + , ; = |
| Unreserved | A-Z a-z 0-9 - _ . ~ |
When a URL needs to include characters outside this range (such as non-ASCII characters, spaces, or special symbols), URL encoding is required. The encoding rule is simple: convert each byte to two hexadecimal digits and prefix them with a percent sign (%).
For example:
"你好" → "%E4%BD%A0%E5%A5%BD"
"hello world" → "hello%20world"
"a+b=c" → "a%2Bb%3Dc"
"a&b" → "a%26b"
URLs were originally designed to identify internet resources. Reserved characters have different meanings in different parts of a URL:
| Character | Role |
|---|---|
| / | Separates path segments |
| ? | Separates path from query parameters |
| & | Separates query parameters |
| = | Separates parameter names from values |
| # | Separates path from fragment identifier |
| % | Marks the start of an encoded character |
If user input contains these characters but they are not intended as delimiters, they must be encoded. Otherwise, URL parsing will break.
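To see why unencoded delimiters break parsing, the following sketch places a raw value into a query string and reads it back with the standard `URL` API. The `example.com` address and the parameter name `q` are placeholders chosen for illustration.

```javascript
// A user-supplied value that happens to contain the delimiters & and =
const userInput = "a&b=c";

// Unencoded: the parser treats & and = as structure, not data
const broken = new URL("https://example.com/search?q=" + userInput);
console.log(broken.searchParams.get("q")); // a
console.log(broken.searchParams.get("b")); // c

// Encoded: the whole value survives as a single parameter
const safe = new URL("https://example.com/search?q=" + encodeURIComponent(userInput));
console.log(safe.searchParams.get("q")); // a&b=c
```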
JavaScript provides three URL encoding-related functions. Their differences are among the most common interview questions and one of the most easily confused areas in real-world development.
encodeURI() is used to encode a complete URL. It does not encode URL reserved characters because they have special meaning in the URL structure.
encodeURI("https://example.com/路径/文件?q=测试")
// → "https://example.com/%E8%B7%AF%E5%BE%84/%E6%96%87%E4%BB%B6?q=%E6%B5%8B%E8%AF%95"
// Note: the reserved characters :, /, and ? are not encoded
encodeURI("https://example.com/search?q=hello world&lang=en")
// → "https://example.com/search?q=hello%20world&lang=en"
// Note: & and = are not encoded
encodeURIComponent() is used to encode URL components (such as query parameter values, path segments). It encodes all characters except letters, digits, and -_.!~*'().
encodeURIComponent("hello world")
// → "hello%20world"
encodeURIComponent("a&b=c")
// → "a%26b%3Dc"
// & and = are encoded!
encodeURIComponent("https://example.com")
// → "https%3A%2F%2Fexample.com"
// :// is also encoded! This is why you should not use it on complete URLs
| Character | encodeURI() | encodeURIComponent() |
|---|---|---|
| Alphanumeric | No encoding | No encoding |
| - _ . ! ~ * ' ( ) | No encoding | No encoding |
| ; / ? : @ & = + $ , # | No encoding | Encoded |
| Space | %20 | %20 |
| Non-ASCII | Encoded | Encoded |
Using encodeURIComponent() on a complete URL encodes :// as %3A%2F%2F, making the URL unusable. Conversely, using encodeURI() for parameter values leaves & and = unencoded, breaking the parameter structure.
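One way to keep the two functions apart is to encode every key and value separately and only then join them with the delimiters. The helper `buildUrl` below is a hypothetical sketch; it assumes the base URL is already valid and carries no query string of its own.

```javascript
// Hypothetical helper: append query parameters to a base URL that is already valid.
// Each key and value is encoded on its own with encodeURIComponent(),
// so the & and = added here are the only structural delimiters in the query.
function buildUrl(base, params) {
  const query = Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(String(value)))
    .join("&");
  return query ? base + "?" + query : base;
}

console.log(buildUrl("https://example.com/search", { q: "hello world", tag: "a&b=c" }));
// https://example.com/search?q=hello%20world&tag=a%26b%3Dc
```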
escape() has been deprecated and should no longer be used. It does not encode characters like @*_+-./ and uses a non-standard encoding format (%uXXXX) for Unicode characters that does not comply with RFC standards.
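The difference is easy to observe, assuming a runtime that still ships the legacy escape() global (current browsers and Node.js do). The call below is for demonstration only and is not a recommendation to use the function.

```javascript
// escape() is deprecated; it is called here only to show its non-standard output.
const legacy = escape("你");               // uses the %uXXXX form
const standard = encodeURIComponent("你"); // percent-encodes the UTF-8 bytes

console.log(legacy);            // %u4F60
console.log(standard);          // %E4%BD%A0
console.log(escape("@*_+-./")); // @*_+-./ (left unencoded)
```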
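How to encode a space is the most common point of confusion in URL encoding. The rules are:
| Context | Space encoding |
|---|---|
| URL path | %20 |
| Query parameters | + or %20 |
| encodeURIComponent() output | %20 |
| HTML form submission (application/x-www-form-urlencoded) | + |
+ and %20 are treated equivalently in query parameters. However, for clarity and standards compliance, we recommend consistently using %20.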
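The space rules can be checked directly in JavaScript. The sketch below compares encodeURIComponent() with the standard `URLSearchParams` class, which serializes in form style; the parameter name `q` is a placeholder.

```javascript
// encodeURIComponent() always writes a space as %20
const percentStyle = "q=" + encodeURIComponent("hello world");
console.log(percentStyle); // q=hello%20world

// URLSearchParams serializes in form style, so a space becomes +
const formStyle = new URLSearchParams({ q: "hello world" }).toString();
console.log(formStyle); // q=hello+world

// Both spellings decode to the same value when parsed as query parameters
console.log(new URLSearchParams(percentStyle).get("q")); // hello world
console.log(new URLSearchParams(formStyle).get("q"));    // hello world

// A literal plus sign is written as %2B in either style
const literalPlus = new URLSearchParams({ q: "1+1" }).toString();
console.log(literalPlus); // q=1%2B1
```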
For Unicode characters (Chinese, Japanese, emoji, etc.), URL encoding first converts the character to a UTF-8 byte sequence, then percent-encodes each byte:
"你" → UTF-8: E4 BD A0 → "%E4%BD%A0"
"♥" → UTF-8: E2 99 A5 → "%E2%99%A5"
"😀" → UTF-8: F0 9F 98 80 → "%F0%9F%98%80"
| Character | Description | URL Encoding |
|---|---|---|
| Space | Space | %20 |
| ! | Exclamation mark | %21 |
| " | Double quote | %22 |
| # | Hash/number sign | %23 |
| $ | Dollar sign | %24 |
| & | Ampersand | %26 |
| ' | Single quote | %27 |
| ( | Left parenthesis | %28 |
| ) | Right parenthesis | %29 |
| + | Plus sign | %2B |
| , | Comma | %2C |
| / | Slash | %2F |
| : | Colon | %3A |
| ; | Semicolon | %3B |
| = | Equals sign | %3D |
| ? | Question mark | %3F |
| @ | At sign | %40 |
| % | Percent sign | %25 |
Double encoding means data gets encoded twice, resulting in incorrect output after decoding. This typically happens when both frontend and backend encode without proper coordination:
// First encoding
encodeURIComponent("hello world")
// → "hello%20world"
// Second encoding (wrong!)
encodeURIComponent("hello%20world")
// → "hello%2520world"
// Decoding once gives "hello%20world" instead of "hello world"
Solution: define clearly which side is responsible for encoding. The frontend encodes each parameter value exactly once, and the backend decodes exactly once on receipt.
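Double encoding can also happen inside a single codebase, for example when a value is encoded by hand and then passed to an API that encodes again. The sketch below uses the standard `URLSearchParams` class to show the effect; the parameter name `q` is a placeholder.

```javascript
// Pitfall: URLSearchParams encodes values itself, so pre-encoding doubles up
const doubled = new URLSearchParams();
doubled.set("q", encodeURIComponent("hello world"));
console.log(doubled.toString()); // q=hello%2520world

// Pass the raw value and let exactly one layer encode it
const once = new URLSearchParams();
once.set("q", "hello world");
console.log(once.toString()); // q=hello+world

// Decoding once then recovers the original text
console.log(new URLSearchParams(once.toString()).get("q")); // hello world
```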
URL encoding behavior differs across languages and frameworks:
| Language/Framework | Encoding Function | Space Encoding |
|---|---|---|
| JavaScript | encodeURIComponent() | %20 |
| Python | urllib.parse.quote() | %20 |
| Python | urllib.parse.quote_plus() | + |
| Java | URLEncoder.encode() | + |
| PHP | rawurlencode() | %20 |
| PHP | urlencode() | + |
| Go | url.QueryEscape() | + |
| Go | url.PathEscape() | %20 |
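Note that Python's quote_plus(), Java's URLEncoder.encode(), PHP's urlencode(), and Go's url.QueryEscape() encode spaces as +, while JavaScript's encodeURIComponent() encodes them as %20. If your frontend and backend use different languages, ensure encoding/decoding is consistent.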
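The mismatch matters on the decoding side too, because decodeURIComponent() does not turn + back into a space. A minimal sketch of a workaround follows; `decodeFormComponent` is a hypothetical helper, and it assumes the input really was produced by a form-style encoder.

```javascript
// Hypothetical helper for values produced by form-style encoders,
// which write a space as +
function decodeFormComponent(value) {
  // A literal plus arrives as %2B, so every raw + here stands for a space
  return decodeURIComponent(value.replace(/\+/g, "%20"));
}

console.log(decodeURIComponent("hello+world"));  // hello+world (the + is left alone)
console.log(decodeFormComponent("hello+world")); // hello world
console.log(decodeFormComponent("1%2B1"));       // 1+1
```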
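URL encoding is not just about making URLs valid; it is also an important defense against URL injection attacks. Encoding user input before inserting it into a URL keeps characters such as &, =, #, and / from being interpreted as delimiters.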
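As one illustration of the injection risk, consider untrusted input placed into a path segment. The sketch below uses the standard `URL` API; the `/users/` route and the `example.com` host are assumptions made for the example.

```javascript
// Untrusted input that tries to escape its path segment
const userName = "../admin";

// Unencoded: the URL parser resolves ".." and the request targets another path
const unsafe = new URL("https://example.com/users/" + userName);
console.log(unsafe.pathname); // /admin

// Encoded: the slash becomes %2F, so the input stays inside one segment
const guarded = new URL("https://example.com/users/" + encodeURIComponent(userName));
console.log(guarded.pathname); // /users/..%2Fadmin
```

Encoding is only one layer of defense here. Some servers decode %2F before routing, so server-side validation of the decoded value is still needed.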
URL encoding may seem simple, but it involves many details in real-world development. Remember the core principle: use encodeURI() for complete URLs, encodeURIComponent() for URL components. When you encounter encoding issues, first check whether double encoding occurred or the wrong encoding function was selected.