URL encoding (also called percent encoding) is one of the most fundamental concepts in web development, yet it's also one of the most common sources of bugs. A single unencoded space or ampersand can break a link, corrupt a form submission, or cause an API call to fail silently. Understanding how URL encoding works — and when to apply it — is essential for anyone building for the web.
This guide explains what URL encoding is from the ground up, walks through which characters need encoding and why, shows practical examples in multiple programming languages, and covers the specific encoding challenges that arise in API development.
URL encoding is a mechanism for converting characters into a format that can be safely transmitted over the internet. The basic idea is simple: any character outside the "safe" ASCII set is replaced by a percent sign (%) followed by two hexadecimal digits giving its byte value. Characters that occupy several bytes produce one %XX triplet per byte.
For example, a space character becomes %20, an ampersand becomes %26, and a Chinese character like "中" becomes %E4%B8%AD (in UTF-8 encoding). This transformation ensures that URLs only contain characters that are universally safe for transmission through web infrastructure — browsers, servers, proxies, and firewalls.
URLs were originally designed to use a limited subset of ASCII characters. The URL specification (RFC 3986) defines which characters are allowed and what they mean in different parts of a URL. Characters like ?, &, =, /, and # have special structural meaning in URLs — they separate query parameters, denote paths, and mark fragments.
When you need to include these characters as literal data (not structural elements), they must be encoded. Without encoding, the server can't distinguish between a structural & that separates parameters and a literal & that's part of a search query.
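The ambiguity can be seen with a small Python sketch. It uses only the standard library's urllib.parse; the search term is a made-up input chosen for illustration.

```python
from urllib.parse import parse_qs, quote

search_term = "cats & dogs"  # hypothetical user input

# Unencoded: the parser treats the & as a separator between two parameters.
raw_parsed = parse_qs("q=" + search_term, keep_blank_values=True)
print(raw_parsed)      # {'q': ['cats '], ' dogs': ['']}

# Encoded: the & travels as %26 and is restored only after the split.
encoded_parsed = parse_qs("q=" + quote(search_term, safe=""))
print(encoded_parsed)  # {'q': ['cats & dogs']}
```

In the first case the server never sees the full search term, because the split on & happens before any decoding.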
RFC 3986 categorizes URL characters into three groups:
These characters are always safe in URLs and should never be encoded:
A-Z a-z 0-9 - _ . ~
That's 66 characters total. Everything else is a candidate for encoding, depending on context.
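As a quick check of that count, the following Python sketch builds the unreserved set and confirms that urllib.parse.quote leaves it untouched (this assumes Python 3.7 or later, where ~ is treated as unreserved).

```python
import string
from urllib.parse import quote

# Letters, digits, and the four unreserved punctuation marks.
unreserved = string.ascii_letters + string.digits + "-_.~"

print(len(unreserved))                        # 66
print(quote(unreserved, safe="") == unreserved)  # True: nothing was encoded
```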
These characters have special meaning in URL structure. They must be encoded when used as literal data, but left unencoded when serving their structural purpose:
| Character | URL Purpose | Encoded Form |
|---|---|---|
| ! | Sub-delimiter | %21 |
| # | Fragment identifier | %23 |
| $ | Sub-delimiter | %24 |
| & | Query parameter separator | %26 |
| ' | Sub-delimiter | %27 |
| ( | Sub-delimiter | %28 |
| ) | Sub-delimiter | %29 |
| * | Sub-delimiter | %2A |
| + | Sub-delimiter; space in form-encoded query strings | %2B |
| , | Sub-delimiter | %2C |
| / | Path separator | %2F |
| : | Port / scheme separator | %3A |
| ; | Sub-delimiter | %3B |
| = | Query key-value separator | %3D |
| ? | Query string start | %3F |
| @ | Authority separator | %40 |
| [ | IPv6 literal | %5B |
| ] | IPv6 literal | %5D |
Any character outside the ASCII range (code points above 127) must be encoded. This includes accented characters (é, ñ, ü), CJK characters (中, 日, 한), emojis (😀), and symbols (©, €, £). These are first converted to bytes using UTF-8 encoding, then each byte is percent-encoded.
For example, the Euro sign € (U+20AC) becomes %E2%82%AC in UTF-8 — three bytes, each percent-encoded.
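The two-step process (UTF-8 bytes first, then one %XX triplet per byte) can be sketched in Python. The helper percent_encode_bytes is a hypothetical name used only for illustration; in practice urllib.parse.quote does this for you.

```python
from urllib.parse import quote, unquote

def percent_encode_bytes(text: str) -> str:
    """Percent-encode every UTF-8 byte of text (no safe set at all)."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

euro = "€"
print([f"{b:02X}" for b in euro.encode("utf-8")])  # ['E2', '82', 'AC']
print(percent_encode_bytes(euro))                  # %E2%82%AC
print(quote(euro))                                 # %E2%82%AC
print(unquote("%E4%B8%AD"))                        # 中
```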
The most common use case for URL encoding is preparing query parameters. Consider a search for "web & API development":
Raw: https://example.com/search?q=web & API development
Encoded: https://example.com/search?q=web%20%26%20API%20development
Without encoding, the & would be interpreted as a parameter separator, splitting the query into q=web and API development (a separate, malformed parameter). The space would also cause issues — while some servers accept + for spaces in query strings, %20 is universally safe.
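One way to avoid hand-assembling query strings is to let a library encode each key and value. A minimal Python sketch, assuming a hypothetical extra page parameter: urlencode defaults to + for spaces, so quote_via=quote is passed to get %20 instead.

```python
from urllib.parse import urlencode, quote

params = {"q": "web & API development", "page": 2}  # "page" is illustrative

# quote_via=quote produces %20 for spaces rather than the default +.
query = urlencode(params, quote_via=quote)
url = f"https://example.com/search?{query}"
print(url)
# https://example.com/search?q=web%20%26%20API%20development&page=2
```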
Characters in the path portion of a URL are encoded differently than characters in query strings. Specifically, a / that is part of the data must be encoded as %2F inside a path segment (an unencoded / separates path components), while + is treated as a literal plus sign in paths, not as a space.
Raw: https://example.com/files/web & API guide.pdf
Encoded: https://example.com/files/web%20%26%20API%20guide.pdf
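The path rules can be sketched in Python. Note that quote keeps / unencoded by default, so safe="" is needed when a slash is part of the data; the folder name below is a made-up example.

```python
from urllib.parse import quote, unquote, unquote_plus

filename = "web & API guide.pdf"
folder = "reports/2024"  # hypothetical name containing a literal slash

print(quote(filename, safe=""))  # web%20%26%20API%20guide.pdf
print(quote(folder))             # reports/2024   (default safe="/" keeps it)
print(quote(folder, safe=""))    # reports%2F2024

# Path-style decoding keeps + as is; form-style decoding turns it into a space.
print(unquote("a+b"))       # a+b
print(unquote_plus("a+b"))  # a b
```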
When HTML forms submit data with application/x-www-form-urlencoded, all form values are URL-encoded. This is why your browser automatically converts spaces to + and encodes special characters when you submit a form.
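A rough Python equivalent of what the browser produces for such a form body is shown below. The field names and values are invented for illustration; urlencode uses the form convention (+ for spaces) by default.

```python
from urllib.parse import urlencode, parse_qsl

form_fields = {"name": "Ada Lovelace", "note": "1+1=2 & more"}  # hypothetical

body = urlencode(form_fields)
print(body)             # name=Ada+Lovelace&note=1%2B1%3D2+%26+more

# Decoding on the receiving side restores the original values.
decoded_fields = parse_qsl(body)
print(decoded_fields)   # [('name', 'Ada Lovelace'), ('note', '1+1=2 & more')]
```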
// Modern approach (recommended)
const encoded = encodeURIComponent('hello world & foo=bar');
// Result: "hello%20world%20%26%20foo%3Dbar"
// Full URL encoding (leaves reserved, structural characters unencoded)
const fullEncoded = encodeURI('https://example.com/path?q=test');
// Result: "https://example.com/path?q=test"
// Decoding
const decoded = decodeURIComponent('hello%20world%20%26%20foo%3Dbar');
// Result: "hello world & foo=bar"
Use encodeURIComponent() for individual parameter values, not entire URLs. Use encodeURI() for full URLs; it leaves structural characters such as :, /, ?, and # unencoded. Mixing these up is one of the most common URL encoding bugs.
from urllib.parse import quote, quote_plus, unquote
# Encode (spaces as %20)
quote('hello world & foo=bar')
# Result: 'hello%20world%20%26%20foo%3Dbar'
# Encode (spaces as +)
quote_plus('hello world & foo=bar')
# Result: 'hello+world+%26+foo%3Dbar'
# Decode
unquote('hello%20world%20%26%20foo%3Dbar')
# Result: 'hello world & foo=bar'
// Encode
$encoded = urlencode('hello world & foo=bar');
// Result: "hello+world+%26+foo%3Dbar"
// Raw encode (spaces as %20)
$raw = rawurlencode('hello world & foo=bar');
// Result: "hello%20world%20%26%20foo%3Dbar"
// Decode
$decoded = urldecode('hello+world+%26+foo%3Dbar');
// Result: "hello world & foo=bar"
URL encoding is critical in API development because APIs frequently handle user-generated data that contains special characters. Here are the key scenarios where encoding matters:
When building API request URLs with user input, always encode parameter values:
// WRONG: user input containing & breaks the URL
const badUrl = `https://api.example.com/search?q=${userInput}`;
// RIGHT: encode the value before interpolating it
const goodUrl = `https://api.example.com/search?q=${encodeURIComponent(userInput)}`;
Without encoding, a search for "Tom & Jerry" would generate ?q=Tom & Jerry, which the API would parse as two parameters: q=Tom and Jerry.
Some APIs pass authentication tokens in URL query parameters. These tokens often contain characters like +, /, and = (especially Base64-encoded tokens) that must be properly encoded:
const token = 'abc+def/ghi=';
const url = `https://api.example.com/data?token=${encodeURIComponent(token)}`;
// Result: ?token=abc%2Bdef%2Fghi%3D
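To see what goes wrong without encoding, here is a Python sketch that plays the role of the receiving server, assuming it parses the query string with form-style rules (as urllib.parse.parse_qs does).

```python
from urllib.parse import parse_qs, quote

token = "abc+def/ghi="

# Unencoded: the + is decoded as a space, so the token no longer matches.
broken = parse_qs(f"token={token}")["token"][0]
print(repr(broken))  # 'abc def/ghi='

# Encoded: the original token survives the round trip.
fixed = parse_qs(f"token={quote(token, safe='')}")["token"][0]
print(repr(fixed))   # 'abc+def/ghi='
```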
One of the most frustrating bugs in API development is double encoding — encoding a value that's already been encoded. This happens when multiple layers of your application each encode the URL:
// First encoding: space → %20
const first = encodeURIComponent('hello world');
// Result: "hello%20world"
// Second encoding (BUG!): % → %25, 2 → 2, 0 → 0
const second = encodeURIComponent(first);
// Result: "hello%2520world"
// The server now receives "hello%20world" as literal text
// instead of decoding it to "hello world"
The fix is simple: encode once, at the point where you construct the URL. Don't encode values that have already been encoded by a library or middleware.
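The same effect is easy to reproduce in Python, which can help when tracking down which layer encodes twice. This is only a demonstration of the symptom, not a detection tool.

```python
from urllib.parse import quote, unquote

once = quote("hello world", safe="")
twice = quote(once, safe="")  # the % from the first pass becomes %25

print(once)                     # hello%20world
print(twice)                    # hello%2520world
print(unquote(twice))           # hello%20world  (one decode is not enough)
print(unquote(unquote(twice)))  # hello world
```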
REST APIs often include identifiers in the URL path. These path segments must also be encoded:
// Resource name with spaces and special characters
const name = 'project alpha v2.0';
const url = `https://api.example.com/projects/${encodeURIComponent(name)}`;
// Result: https://api.example.com/projects/project%20alpha%20v2.0
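A small helper can make segment-by-segment encoding harder to forget. The sketch below is in Python; build_path is a hypothetical helper, and the "files" and "a/b.txt" segments are invented to show a slash inside an identifier.

```python
from urllib.parse import quote

def build_path(*segments: str) -> str:
    """Join path segments, percent-encoding each one (including any /)."""
    return "/" + "/".join(quote(segment, safe="") for segment in segments)

base = "https://api.example.com"
url = base + build_path("projects", "project alpha v2.0", "files", "a/b.txt")
print(url)
# https://api.example.com/projects/project%20alpha%20v2.0/files/a%2Fb.txt
```

Encoding each segment separately keeps the separators structural while any slash inside an identifier is sent as %2F.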
When your API returns URLs as part of JSON responses, ensure those URLs are properly encoded before serialization. Many frameworks handle this automatically, but if you're constructing URLs manually in response payloads, apply encoding consistently.
Calling encodeURIComponent() on a full URL encodes the :// and / that should remain structural. Encode individual values, not complete URLs.
In query strings, + represents a space. In path segments, + is a literal plus sign. Use %20 when you need to be unambiguous.
If a value contains a literal + (like in a phone number: +1-555-1234), failing to encode it will cause the server to decode it as a space.
If you read raw, still-encoded values (for example, by parsing a query string yourself), decode them with decodeURIComponent() / unquote() before using the values.
When a form uses enctype="application/x-www-form-urlencoded", the browser already encodes the values. Encoding them again in JavaScript before submission causes double encoding.
URL encoding (percent encoding) converts individual unsafe characters into %XX sequences while leaving safe characters readable. Base64 encoding converts the entire input into a different character set (A-Z, a-z, 0-9, +, /), mapping every 3 input bytes to 4 output characters. URL encoding preserves readability for most of the string; Base64 makes the entire string unreadable. They serve different purposes: URL encoding makes strings safe for URLs, while Base64 converts binary data into a text-safe format.
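The difference between the two encodings can be sketched in Python with the standard base64 module. The three sample bytes are chosen only because they produce + and / in standard Base64.

```python
import base64
from urllib.parse import quote

text = "hello world & foo=bar"
percent = quote(text, safe="")
b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
print(percent)  # hello%20world%20%26%20foo%3Dbar  (mostly still readable)
print(b64)      # every character is transformed

# Standard Base64 output can itself contain characters that need encoding.
raw = bytes([0xFB, 0xFF, 0xFE])
standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard)                  # +//+
print(quote(standard, safe=""))  # %2B%2F%2F%2B
print(url_safe)                  # -__-
```

The URL-safe Base64 alphabet swaps + and / for - and _, which is one way to avoid percent-encoding Base64 data in a URL.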
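Use encodeURIComponent() for individual parameter values or path segments. It encodes nearly everything that could be unsafe, including /, ?, and &, leaving only letters, digits, and - _ . ! ~ * ' ( ) untouched. Use encodeURI() when you have a complete URL and only need to encode the non-structural parts; it preserves :/?#@!$&'()*+,;= since those serve structural purposes in URLs. Note that encodeURI() does encode [ and ], so URLs containing IPv6 literals need extra care.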
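For readers working in Python, the two JavaScript functions can be approximated with urllib.parse.quote by choosing the safe set. The helper names below are hypothetical, and the match is approximate (for instance, edge cases such as malformed Unicode input are handled differently).

```python
from urllib.parse import quote

def encode_uri_component(value: str) -> str:
    """Approximate encodeURIComponent: keeps only - _ . ~ ! * ' ( ) and alphanumerics."""
    return quote(value, safe="!~*'()")

def encode_uri(url: str) -> str:
    """Approximate encodeURI: additionally keeps ; / ? : @ & = + $ , #"""
    return quote(url, safe="!~*'();/?:@&=+$,#")

print(encode_uri_component("hello world & foo=bar"))
# hello%20world%20%26%20foo%3Dbar
print(encode_uri("https://example.com/a b?q=x y#top"))
# https://example.com/a%20b?q=x%20y#top
print(encode_uri("http://[::1]:8080/"))
# http://%5B::1%5D:8080/
```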
A %2520 in a URL is a sign of double encoding. Your value was encoded twice: first the space became %20, then the % in %20 was encoded to %25, producing %2520. Fix this by identifying where the extra encoding step occurs (usually in a middleware layer, a form handler, or manual concatenation) and removing the redundant encoding.
Emojis are encoded the same way as any non-ASCII character. The emoji 😀 (U+1F600) becomes %F0%9F%98%80 in UTF-8 percent encoding. All modern browsers and server libraries handle this automatically. Use encodeURIComponent('😀') in JavaScript or quote('😀') in Python.
Search engines handle URL-encoded characters fine and will index the decoded versions. However, for SEO best practices, prefer human-readable URLs with hyphens instead of encoded characters. Use example.com/products/red-shoes rather than example.com/products/red%20shoes. Most web frameworks provide "slug" functions that convert text into URL-safe, SEO-friendly paths.
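If your framework does not ship a slug function, the idea can be sketched in a few lines of Python. This slugify is a simplified, hypothetical version: it handles Latin text with accents but drops characters it cannot map to ASCII (such as CJK text), so treat it as a starting point only.

```python
import re
import unicodedata

def slugify(text: str) -> str:
    """Turn free text into a lowercase, hyphen-separated ASCII slug."""
    # Decompose accented letters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse each run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

print(slugify("Red Shoes"))          # red-shoes
print(slugify("Café & Bar: Menu!"))  # cafe-bar-menu
```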
In a URL path, a / that belongs to the data must be encoded as %2F (an unencoded / separates path segments), while + is a literal character. In a query string, + represents a space (by the form-encoding convention), and / is generally safe as a literal character. Both contexts require encoding ? and # when used as data, and query strings additionally require encoding & and =.
Online URL encoder/decoder tools typically run JavaScript in your browser to apply encodeURIComponent() and decodeURIComponent() to your input. For tools built this way, the encoding and decoding happen on your device and the input is not sent to a server, though this depends on the individual tool. They're useful for quick checks when you're debugging a URL or verifying that your application's encoding is correct.
URL encoding is not the same as HTML encoding. URL encoding (percent encoding) makes characters safe for URLs using %XX sequences. HTML encoding makes characters safe for HTML markup using named entities (&amp;amp;) or numeric entities (&amp;#38;). They address different problems: one for web addresses, the other for HTML document structure. A fully web-safe application applies both in the appropriate contexts.
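Applying both in their own contexts might look like the following Python sketch, which percent-encodes a value for the URL and HTML-escapes what goes into the markup. The search term and URL are made up for illustration.

```python
import html
from urllib.parse import quote

term = "Tom & Jerry <3"  # hypothetical user input

# URL context: percent-encode the value before placing it in the query.
href = "https://example.com/search?q=" + quote(term, safe="")

# HTML context: escape both the attribute value and the visible text.
link = f'<a href="{html.escape(href)}">{html.escape(term)}</a>'
print(link)
# <a href="https://example.com/search?q=Tom%20%26%20Jerry%20%3C3">Tom &amp; Jerry &lt;3</a>
```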
URL encoding seems simple on the surface, but the details matter. Encode once, encode at the right layer, and always encode user-supplied values before inserting them into URLs. These three rules will prevent the vast majority of encoding-related bugs in web applications and APIs.