Finding HTML Injection Vulns, Part I
Last updated on: September 6, 2020
Text and strings form the building blocks of web apps. Developers and content creators mix text with other media, code, and HTML to produce all kinds of apps for our browsers. However, when developers mix text with code, or carelessly place strings inside HTML, they expose the app to one of the most common web-related vulns: HTML injection, a.k.a. Cross-Site Scripting (XSS). One way this happens is when developers use string concatenation to assemble a web page from static HTML and user-supplied data. For example, think of a site's search function. When you submit a search request, the site responds with something like, "Here are the results for XYZ," and lists whatever might have matched. HTML injection occurs when the search term contains markup instead of simple text and the app writes that markup into the page unchanged, like this:
<span>Here are the results for "<script>alert(9)</script>"</span>
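To make the concatenation problem concrete, here is a minimal Python sketch. The function names results_unsafe and results_safe are hypothetical and do not come from any real app. The safe version uses the standard library's html.escape() to encode the search term before concatenating it into the markup.

```python
import html


def results_unsafe(term: str) -> str:
    # Plain string concatenation: the term is pasted into the markup verbatim,
    # so any tags it contains become part of the page.
    return '<span>Here are the results for "' + term + '"</span>'


def results_safe(term: str) -> str:
    # html.escape() turns <, >, & (and, by default, " and ') into character
    # references, so the browser renders the term as text, not markup.
    return '<span>Here are the results for "' + html.escape(term) + '"</span>'


payload = "<script>alert(9)</script>"
print(results_unsafe(payload))
# <span>Here are the results for "<script>alert(9)</script>"</span>
print(results_safe(payload))
# <span>Here are the results for "&lt;script&gt;alert(9)&lt;/script&gt;"</span>
```

Escaping depends on where the data lands. html.escape() covers text between tags and quoted attribute values, but not data written into a script block or a URL, for example.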
Security researchers have discussed and demonstrated HTML injection vulns for roughly 20 years. The root cause of the problem hasn't changed much, but the techniques for exploiting it have. Early examples of HTML injection and XSS talked about stealing session IDs from the document.cookie object, or showed how to steal passwords from a login form. Today's exploits leverage HTML5 features and have been integrated into sophisticated exploit frameworks. HTML injection vulns affect all kinds of sites. They have appeared in search engines, social media, banks, web-based email, even security companies. (Even a book about web security can cause web security problems.) Sometimes the flaws are so obvious that you have to wonder how developers missed them in the first place. Any single vuln may seem easy to find, but searching for them across every page and parameter of an app is tedious and time consuming. In other words, it's an ideal candidate for automation.
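We designed WAS to accurately identify several types of HTML injection flaws. The easiest one to start with is called reflected XSS. This happens when the web app receives a request with a test payload and responds with HTML that contains the payload written in a way that changes the document's structure. The reflected search term we mentioned previously is a prime example. The payload doesn't have to land between tags, either. In the form field below, the payload starts with the characters "> which close the value attribute and the input tag, so the script element that follows becomes part of the page:
<input type="text" name="email" value=""><script>alert(9)</script>">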
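A scanner needs to know whether a reflected payload changed the document's structure, not just whether it appears somewhere in the response. One way to check is to parse a baseline response and the injected response, then compare the elements the parser sees. The sketch below assumes that approach and does not describe how WAS works internally. The names send, vulnerable_form, escaped_form, and is_reflected_xss are hypothetical; the first three stand in for an HTTP request to a real app. The parser is Python's standard html.parser.HTMLParser.

```python
import html
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parsed_tags(body: str) -> list[str]:
    collector = TagCollector()
    collector.feed(body)
    collector.close()
    return collector.tags


def is_reflected_xss(send, payload: str, marker_tag: str) -> bool:
    """Hypothetical check: did the payload add a marker_tag element to the page?

    `send` stands in for submitting one parameter value and returning the HTML.
    """
    baseline = parsed_tags(send("harmless"))
    injected = parsed_tags(send(payload))
    return injected.count(marker_tag) > baseline.count(marker_tag)


# Hypothetical apps that reflect an "email" parameter into a form field.
def vulnerable_form(email: str) -> str:
    return '<input type="text" name="email" value="' + email + '">'


def escaped_form(email: str) -> str:
    return '<input type="text" name="email" value="' + html.escape(email) + '">'


breakout = '"><script>alert(9)</script>'
print(is_reflected_xss(vulnerable_form, breakout, "script"))  # True
print(is_reflected_xss(escaped_form, breakout, "script"))  # False
# Without the leading "> the payload stays inside the quoted attribute value.
print(is_reflected_xss(vulnerable_form, "<script>alert(9)</script>", "script"))  # False
```

Parsing matters here. A plain substring search would also flag <script>alert(9)</script> sitting inside a quoted attribute value, which the parser correctly treats as attribute text rather than a new element.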
Some web apps try naive (and ultimately futile) countermeasures, like looking for "typical" attacks that contain words like "<script>" or "alert". Most of the time it's possible to bypass such weak filters by slightly altering the payload. In fact, just being able to create a nonsense tag like <abcd> indicates the app isn't handling user-supplied data securely; that's a bug worth fixing even if no script runs. So the scanner works through a range of payloads to see whether any of them succeed; it doesn't stop at the first failure.
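The sketch below, again in Python, shows why a blocklist like that fails and why a scanner keeps trying after the first payload is stopped. The names naive_filter, search_page, surviving_payloads, and the payload list are hypothetical examples, not WAS's actual payloads. The check assumes the term is reflected between tags, where a payload echoed back verbatim means its markup reached the page unencoded.

```python
def naive_filter(term: str) -> str:
    # Hypothetical blocklist: deletes the exact, lowercase strings below.
    for blocked in ("<script>", "alert"):
        term = term.replace(blocked, "")
    return term


def search_page(term: str) -> str:
    # Hypothetical search page that reflects the filtered term between tags.
    return '<span>Here are the results for "' + naive_filter(term) + '"</span>'


PAYLOADS = [
    "<script>alert(9)</script>",      # caught: both blocked strings are removed
    "<SCRIPT>prompt(9)</SCRIPT>",     # uppercase tag and a different function
    "<img src=x onerror=prompt(9)>",  # no script tag at all
    "<abcd>",                         # nonsense tag, still proves injection
]


def surviving_payloads(send, payloads):
    # Try every payload instead of stopping at the first one that fails.
    # In element-body context, a payload echoed back verbatim means its
    # markup made it into the page unencoded.
    return [p for p in payloads if p in send(p)]


for p in surviving_payloads(search_page, PAYLOADS):
    print(p)
```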
But if this were where we stopped testing for HTML injection, we'd miss a huge number of possible vulns. In the next part we'll look at how a scanner avoids false negatives by paying attention to detail and by using techniques beyond checking single request/response pairs.