HTML Entities & Special Characters: Complete Reference

HTML documents are parsed as text, but certain characters have special meaning within HTML syntax. Angle brackets define tags, ampersands begin escape sequences, and quotation marks delimit attribute values. To display these characters as visible content rather than HTML syntax, web developers must use HTML entities—special coded representations that render as the desired character. Understanding HTML entities is essential for any web developer, content creator, or anyone working with HTML text.

Why HTML Entities Exist

The fundamental issue is that HTML uses certain characters for its own syntax. The less-than sign (<) opens an HTML tag, and the greater-than sign (>) closes one. If you write "a<b and c>d" as text, the browser treats "<b and c>" as a tag, and the text between the angle brackets disappears from the page. HTML entities solve this by providing alternative representations for reserved characters that do not trigger HTML parsing.

Ampersands have special meaning too: they begin named and numeric character references. An ampersand followed by alphanumeric characters is interpreted as a named reference (like &amp;amp; for &), and an ampersand followed by # is interpreted as a numeric reference. To display a literal ampersand in HTML, you must escape it as &amp;amp;. An unescaped ampersand that the browser reads as the start of a reference can render as the wrong character and produce malformed markup.

Quote characters present another challenge. Double quotes (") and single quotes (') delimit attribute values in HTML. To include a literal quote inside an attribute value, you must escape it. Outside of attributes, quotes render as ordinary text, although mixing straight quotes and curly quotes within the same content looks typographically inconsistent.

Named HTML Entities

Named entities use symbolic names in the format &name;, where name is a defined entity name. The most essential is &amp;amp; for the ampersand itself (&). Others include &amp;lt; for less-than (<), &amp;gt; for greater-than (>), &amp;quot; for double quote ("), and &amp;apos; for single quote ('). These five entities handle the most critical escaping needs for any HTML document.
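As a minimal sketch of the five escapes in practice, the snippet below uses Python's standard html module (Python is an assumption here; the article does not prescribe a language). Note that html.escape() emits the numeric reference &#x27; for the single quote rather than a named entity; both decode to the same character.

```python
import html

# The five reserved characters discussed above.
RESERVED = ["&", "<", ">", '"', "'"]

# html.escape() with quote=True (the default) escapes all five.
escaped = {ch: html.escape(ch, quote=True) for ch in RESERVED}

for ch, ref in escaped.items():
    print(f"{ch!r} -> {ref}")

# Round trip: html.unescape() turns each reference back into its character.
assert all(html.unescape(ref) == ch for ch, ref in escaped.items())
```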

Typography entities improve the appearance of web content. &amp;nbsp; represents a non-breaking space that prevents line breaks at that position. &copy; displays the copyright symbol (©). &reg; displays the registered trademark symbol (®). &trade; displays the trademark symbol (™). &mdash; displays an em dash (—), and &ndash; displays an en dash (–). These entities enable proper typographic presentation.

Accent and special letter entities enable display of international characters: &aacute; for á, &ntilde; for ñ, &ouml; for ö, and so on. There are entities for virtually every accented character in Western European languages. However, using UTF-8 encoding for your HTML documents is generally preferable to entity-based internationalization, as it handles all characters consistently without entity proliferation.
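To check which character a named entity stands for, one option is to decode it and inspect the code point. The sketch below again assumes Python's standard html module; the sample list is illustrative, not exhaustive.

```python
import html

# Named entities from the text, paired with the characters they should decode to.
samples = {
    "&copy;": "\u00a9",    # copyright sign
    "&reg;": "\u00ae",     # registered sign
    "&trade;": "\u2122",   # trade mark sign
    "&mdash;": "\u2014",   # em dash
    "&ndash;": "\u2013",   # en dash
    "&nbsp;": "\u00a0",    # non-breaking space
    "&aacute;": "\u00e1",  # a with acute
    "&ntilde;": "\u00f1",  # n with tilde
    "&ouml;": "\u00f6",    # o with diaeresis
}

for ref, expected in samples.items():
    decoded = html.unescape(ref)
    assert decoded == expected
    print(f"{ref:10} U+{ord(decoded):04X}")
```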

Numeric Character References

Numeric references provide an alternative to named entities using either decimal (&#nnn;) or hexadecimal (&#xhhh;) format, where nnn or hhh is the character's code point in Unicode. Decimal &#60; and hexadecimal &#x3C; both represent the less-than sign (<), equivalent to &lt;. This format is useful for characters without named entities.

Unicode assigns a code point to every character it encodes, covering nearly all modern writing systems. The copyright symbol © is U+00A9, which you can reference as &#169; in decimal or &#x00A9; in hexadecimal. Emoji also have Unicode code points: the grinning face emoji is U+1F600, referenceable as &#128512; or &#x1F600;. While emoji typically render fine as literal characters in UTF-8 documents, numeric references work when direct encoding is problematic.
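The relationship between a character, its code point, and its two numeric references can be sketched in a few lines of Python. The helper name numeric_refs is hypothetical, introduced only for this illustration.

```python
import html

def numeric_refs(ch: str):
    """Return the (decimal, hexadecimal) numeric references for one character."""
    cp = ord(ch)  # the character's Unicode code point
    return f"&#{cp};", f"&#x{cp:X};"

assert numeric_refs("<") == ("&#60;", "&#x3C;")
assert numeric_refs("\u00a9") == ("&#169;", "&#xA9;")
assert numeric_refs("\U0001F600") == ("&#128512;", "&#x1F600;")

# Decoding goes the other way; leading zeros in the hex form are accepted.
assert html.unescape("&#60;") == html.unescape("&#x3C;") == "<"
assert html.unescape("&#x00A9;") == "\u00a9"
```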

Numeric references are particularly useful for invisible or control characters that you need to point out for debugging or documentation purposes. Characters like the byte order mark (BOM, U+FEFF) or the zero-width joiner (U+200D) cannot be seen when typed directly, but written as numeric references they are visible in the source. Because numeric references consist only of ASCII characters, they also decode to the intended character even when character encoding declarations are missing or incorrect.
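For debugging, a small helper can rewrite invisible characters as hexadecimal references so they show up in logs or source listings. This is a sketch under assumptions: the function name reveal_invisible is hypothetical, and "invisible" is approximated here by the Unicode categories Cf (format) and Cc (control), with ordinary whitespace left alone.

```python
import unicodedata

def reveal_invisible(text: str) -> str:
    """Replace format (Cf) and control (Cc) characters with hex references."""
    out = []
    for ch in text:
        hidden = unicodedata.category(ch) in ("Cf", "Cc")
        if hidden and ch not in "\t\n\r":  # keep tabs and line breaks as-is
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)

# A BOM at the start and a zero-width joiner in the middle.
sample = "\ufeffzero\u200dwidth"
print(reveal_invisible(sample))  # &#xFEFF;zero&#x200D;width
```

To show the resulting references literally on a web page rather than in a log, the ampersands in the output would themselves need escaping.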

Common HTML Entity Mistakes

Unescaped ampersands are the most common HTML entity error. A URL written as "example.com/page?id=1&amp;amp;lang=en" is correct, but "example.com/page?id=1&lang=en" is technically malformed because &lang looks like the start of an entity reference. Browsers usually render this correctly because the reference has no terminating semicolon, but relying on that recovery is fragile: it is an error in XHTML, and a parameter whose name matches an entity that browsers accept without a semicolon (such as &amp;copy) can be turned into a symbol. Always escape ampersands in HTML content.
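When links are generated by code, escaping the URL at the point where it is written into the attribute avoids the problem. The sketch below assumes Python's html module; render_link, the https:// scheme, and the label text are hypothetical details added for illustration.

```python
import html

def render_link(href: str, label: str) -> str:
    """Build an anchor tag, escaping both the attribute value and the label."""
    return f'<a href="{html.escape(href, quote=True)}">{html.escape(label)}</a>'

url = "https://example.com/page?id=1&lang=en"
markup = render_link(url, "English version")
print(markup)
# <a href="https://example.com/page?id=1&amp;lang=en">English version</a>
```

The browser decodes the &amp;amp; back to a plain ampersand before following the link, so the server still receives the original query string.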

Double encoding occurs when text that is already encoded gets encoded again. If you write &amp;lt; in your source, it renders as "<". But if your content management system then encodes that as &amp;amp;lt;, it renders as the literal text "&amp;lt;" instead of "<". This typically happens when content passes through multiple encoding layers. Always encode at the right stage: once at output, not multiple times.
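The effect is easy to reproduce. In this sketch, html.unescape() stands in for the single level of decoding a browser performs (an approximation for illustration, assuming Python's html module).

```python
import html

raw = "3 < 5"
once = html.escape(raw)    # encoded once, as intended
twice = html.escape(once)  # encoded again by a second layer

assert once == "3 &lt; 5"
assert twice == "3 &amp;lt; 5"

# Decoding one level, as a browser does:
assert html.unescape(once) == "3 < 5"      # reader sees the less-than sign
assert html.unescape(twice) == "3 &lt; 5"  # reader sees the entity text itself
```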

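Confusing HTML entities with URL encoding is a common source of errors. HTML entities encode characters for HTML document parsing, while URL encoding (percent-encoding) encodes characters for URL transmission. An ordinary space needs no escaping in HTML (the non-breaking space &amp;nbsp; is a different character), but in a URL it must be written as %20. Using HTML entities in place of percent-encoding inside a URL breaks the URL. When a URL appears in an HTML attribute, percent-encode its components first, then escape any remaining ampersands as &amp;amp;.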
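The two encodings apply at different layers, which the following sketch tries to make concrete. It assumes Python's urllib.parse and html modules; the search URL and parameter names are made up for the example.

```python
import html
from urllib.parse import quote, urlencode

# Layer 1: percent-encoding is for URLs. HTML escaping leaves a space alone.
assert quote("a b") == "a%20b"
assert html.escape("a b") == "a b"

# Build a query string, percent-encoding each value (quote_via=quote gives %20).
params = {"q": "fish & chips", "lang": "en"}
query = urlencode(params, quote_via=quote)
url = "https://example.com/search?" + query

# Layer 2: HTML escaping is for embedding that URL in markup.
href = html.escape(url, quote=True)
print(href)
# https://example.com/search?q=fish%20%26%20chips&amp;lang=en
```

The ampersand inside the value becomes %26 because it is data, while the ampersand separating parameters becomes &amp;amp; only when the URL is placed into HTML.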

When to Use HTML Entities

In HTML source code, always escape the five reserved characters: < > & " '. Use entities for these characters whenever they appear as content, not as HTML syntax. For attribute values, escape at minimum the quote character used to delimit the attribute. This ensures your HTML is valid and renders correctly regardless of context.
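One way to apply this rule in code is to keep separate helpers for element content and attribute values. The names escape_text and escape_attr are hypothetical; both are thin wrappers over Python's html.escape(), shown here as a sketch rather than a complete templating solution.

```python
import html

def escape_text(value: str) -> str:
    """Escape for element content: &, < and > only."""
    return html.escape(value, quote=False)

def escape_attr(value: str) -> str:
    """Escape for a quoted attribute value: also escapes both quote characters."""
    return html.escape(value, quote=True)

title = 'The "five" reserved characters'
body = "Tom & Jerry's <b>"
snippet = f'<p title="{escape_attr(title)}">{escape_text(body)}</p>'
print(snippet)
# <p title="The &quot;five&quot; reserved characters">Tom &amp; Jerry's &lt;b&gt;</p>
```

Escaping quotes in element content as well is harmless, so using the attribute-safe form everywhere is a reasonable simplification.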

For typographic content, entities for dashes, quotes, and other special characters improve presentation. However, if you use UTF-8 encoding throughout, you can often write these characters directly rather than using entities. The exception is characters such as the non-breaking space, which look identical to ordinary characters in the source; writing them as entities makes the intent explicit.

For international characters, UTF-8 encoding is generally preferable to numeric or named entities. Writing "café" directly is more maintainable and readable than "caf&eacute;" or "caf&#233;". Reserve entities for the five reserved characters and any specific special characters where entity usage provides clarity or semantic value.
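All three spellings decode to the same text, which a quick check can confirm. The sketch also shows a fallback for the case where output really must be ASCII-only: Python's built-in xmlcharrefreplace error handler, which substitutes decimal numeric references for characters the target encoding cannot represent.

```python
import html

literal = "caf\u00e9"  # the word written directly, with U+00E9

# The named and numeric forms decode to the same string.
assert html.unescape("caf&eacute;") == literal
assert html.unescape("caf&#233;") == literal

# Fallback for ASCII-only output: non-ASCII characters become numeric references.
ascii_only = literal.encode("ascii", "xmlcharrefreplace").decode("ascii")
assert ascii_only == "caf&#233;"
```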

Conclusion

HTML entities are fundamental to correct HTML authoring. Understanding why they exist, when to use them, and how to use them correctly will help you write valid, robust HTML that renders consistently across all browsers and contexts. Remember the five essential entities (lt, gt, amp, quot, apos), escape ampersands consistently, use UTF-8 encoding to minimize entity usage for international characters, and always verify your HTML using validation tools. Good entity usage is invisible to users but critical for correct rendering.
