What are HTML entities and why do they exist?

HTML entities are character references used to represent special characters in HTML that would otherwise be interpreted as markup. For example, < is written as &lt; because < has special meaning in HTML (it starts a tag). Entities also let you display characters not available on your keyboard, like © (copyright) or emoji.

Does HTML entity encoding prevent XSS attacks?

Encoding the five critical characters (<, >, &, ", ') prevents most reflected and stored XSS attacks by ensuring user input is treated as text, not as HTML markup. However, XSS prevention requires a broader strategy including Content Security Policy, input validation, and context-aware encoding (different rules for HTML attributes, JavaScript, CSS, and URLs).

What's the difference between named entities, decimal, and hexadecimal references?

Named entities use predefined names like &amp; for & and &lt; for <. Decimal references use the Unicode code point: &#60; is <. Hexadecimal references use the code point in hex: &#x3C; is <. All three produce identical output. Named entities are more readable; numeric references can represent any Unicode character.

Do I need to encode text inside <script> or <style> tags?

No. Inside script and style elements, the HTML parser does not decode character references; the content is passed as-is to the JavaScript or CSS parser. This means you cannot write &lt; inside a script tag to represent <. For security in script tags, you need different approaches, such as JSON encoding for embedded data.
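As a rough illustration of the JSON approach, here is a minimal Python sketch. The helper name `json_for_script` is hypothetical, and the extra escaping of <, >, and & is one common hardening step rather than the only valid one; it keeps a string value from closing the script element early.

```python
import json

def json_for_script(value):
    """Serialize value as JSON intended for embedding inside a script element."""
    text = json.dumps(value)
    # JSON allows \uXXXX escapes inside strings, so these replacements keep the
    # data identical while removing characters the HTML parser cares about.
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))

payload = json_for_script({"name": "</script><script>alert(1)</script>"})
print(payload)
```

The result still parses back to the original data with any JSON parser, but it no longer contains a literal closing script tag.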

When should I encode for HTML attributes vs. text content?

For text content, encode <, >, and &. For attributes enclosed in double quotes, additionally encode ". For attributes in single quotes, additionally encode '. The safest approach is to always encode all five characters: <, >, &, ", and '. This works correctly regardless of context.

HTML Entity Encoder: Why You Need It and How It Works

📅 April 12, 2026 ⏱️ 9 min read 📝 Developer Tools

HTML entity encoding is one of those fundamental web concepts that every developer encounters but few truly understand. You know you need to escape special characters when outputting user content. You know it's important for security. But do you know why it works, which characters to encode in which contexts, and what happens when you get it wrong?

This guide dives deep into HTML entity encoding—covering the mechanics, the security implications, and the practical patterns you need in real applications.

What Are HTML Entities?

HTML entities (also called character references) are a way to represent characters that have special meaning in HTML or that can't be typed directly. They use the ampersand (&) as an escape character, similar to how backslash works in many programming languages.

There are three forms:

Named entities: &amp; &lt; &gt; &quot; &copy;
Decimal references: &#38; &#60; &#169;
Hexadecimal references: &#x26; &#x3C; &#xA9;

All three forms produce identical output in the browser. Named entities are more readable for common characters. Numeric references can represent any Unicode character.
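To check the equivalence of the three forms yourself, a small sketch using Python's standard `html.unescape` is shown below. It decodes the same three characters (<, &, ©) written as named, decimal, and hexadecimal references. Python's decoder stands in for the browser here, which is an assumption made for illustration.

```python
import html

# The same three characters written in each reference form.
named = html.unescape("&lt; &amp; &copy;")
decimal = html.unescape("&#60; &#38; &#169;")
hexadecimal = html.unescape("&#x3C; &#x26; &#xA9;")

print(named)                            # the decoded characters
print(named == decimal == hexadecimal)  # all three forms decode identically
```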

The Five Critical Characters

When encoding for HTML, there are five characters that must always be handled:

Character	Named Entity	Decimal	Hex	Why
&	&amp;	&#38;	&#x26;	Starts all entity references
<	&lt;	&#60;	&#x3C;	Starts HTML tags
>	&gt;	&#62;	&#x3E;	Closes HTML tags
"	&quot;	&#34;	&#x22;	Delimits attribute values
'	&apos;	&#39;	&#x27;	Delimits attribute values

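The ampersand is the most critical because it's the escape character itself. An unescaped & followed by text that happens to form a character reference is decoded as that reference, so the rendered output differs from what you wrote. For example, the text &copy 2026 in body content displays as © 2026.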
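One practical consequence is that a hand-written encoder should replace the ampersand first. The sketch below is a minimal illustration, not a replacement for a vetted library; `escape_html` is a hypothetical helper name, and Python's built-in `html.escape` is shown alongside for comparison.

```python
import html

def escape_html(text: str) -> str:
    """Encode the five critical characters for HTML output."""
    # & must be replaced first; otherwise the ampersands introduced by the
    # later replacements (&lt;, &gt;, ...) would themselves be encoded again.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#39;"))

sample = '<a href="x">Tom & Jerry\'s</a>'
encoded = escape_html(sample)
print(encoded)

# The standard library covers the same five characters when quote=True.
builtin = html.escape(sample, quote=True)
print(builtin)
```

Both results decode back to the original string; they differ only in which numeric reference is used for the apostrophe.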

How the HTML Parser Handles Entities

Understanding the parsing process explains why entity encoding is necessary and why it prevents XSS:

Tokenization

The HTML parser works in phases. During tokenization, it reads characters one at a time and builds tokens (start tags, end tags, text, comments). When it encounters &, it enters a special state looking for a character reference:

// Parser encounters: &lt;script&gt;alert('xss')&lt;/script&gt;
// Tokenized as TEXT token: <script>alert('xss')</script>
// Rendered as visible text: <script>alert('xss')</script>
// No script executes ✅

// Parser encounters: <script>alert('xss')</script>
// Tokenized as: START_TAG(script) + TEXT(alert('xss')) + END_TAG(script)
// Script executes ❌

Entity encoding converts dangerous characters into a form that the parser treats as text content rather than markup. This is the fundamental mechanism of XSS prevention through output encoding.
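The difference between the two inputs can be observed with Python's standard `html.parser.HTMLParser`. This is only a sketch: Python's parser is not a full browser-grade HTML parser, and the `TokenLogger` class and `tokenize` function are hypothetical names introduced for the example. It records which start tags and which text the parser reports for each input.

```python
from html.parser import HTMLParser

class TokenLogger(HTMLParser):
    """Collects the start tags and text data reported by the parser."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)

def tokenize(markup):
    parser = TokenLogger()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.start_tags, "".join(parser.text)

# Encoded input: no tags are found, only text.
encoded_result = tokenize("&lt;script&gt;alert('xss')&lt;/script&gt;")
# Raw input: the parser sees a real script start tag.
raw_result = tokenize("<script>alert('xss')</script>")

print(encoded_result)
print(raw_result)
```

The encoded input yields an empty tag list and the literal text of the script tag, while the raw input yields a script start tag, which is the case a browser would execute.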

Context Matters: Where You're Inserting Data

HTML entity encoding alone doesn't prevent all XSS. You need the right encoding for the right context:

HTML Body (Text Content)

<!-- Safe: encoded user data in text content -->
<p>Hello, <span>&lt;script&gt;</span>!</p>
<!-- Renders as: Hello,