Merchants are getting ready for the upcoming changes to the internal scanning requirements for PCI compliance. This blog post provides a checklist on what you should have ready and will review some of the tools Qualys provides for these requirements.
There are four core areas to focus on in preparation for your compliance to PCI 11.2, taking into account the changes from PCI 6.2 regarding risk ranking of vulnerabilities.
Your documented PCI scope (cardholder dataenvironment)
Your documented risk ranking process
Your scanning tools
Your scan reports
Merchants will need to complete each of these elements to be prepared to pass PCI compliance.
1. Your documented PCI scope (cardholder data environment)
All PCI requirements revolve around a cross-section of assets in your IT infrastructure that is directly involved in storage, processing, or transmitting payment card information. These IT assets are known as the cardholder data environment (CDE), and are the focus areas of the PCI DSS requirements.
These assets can exist in internal or external (public) networks and may be subject to different requirements based on what role they play in payment processing. These assets can be servers, routers, switches, workstations, databases, virtual machines or web applications; PCI refers to these assets as system components.
QualysGuard provides a capability to tag assets under management. The screenshot below shows an example of PCI scope being defined within the QualysGuard Asset Tagging module. It provides the ability to group internal assets (for 11.2.1), external assets (for 11.2.2), and both internal and external assets together (for 11.2.3).
This allows you to maintain documentation of your CDE directly, and to drive your scanning directly from your scope definition.
2. Your documented risk ranking process
This is the primary requirement associated with the June 30th deadline; this is the reference that should allow someone to reproduce your risk rankings for specific vulnerabilities.
The requirement references industry best practices, among other details, to consider in developing your risk ranking. It may help you to quickly adopt a common industry best practice and adapt it to your own environment. Two examples are the Qualys severity rating system, which is the default rating as per the security research team at Qualys; or, the PCI ASV Program Guide, which includes a rating system used by scanning vendors to complete external scanning. QualysGuard is used by 50 of the Forbes Global 100, and spans all market verticals; it qualifies as an industry best practice. Additionally, the QualysGuard platform is used by the majority of PCI Approved Scanning Vendors and already delivers rankings within the PCI ASV Program Guide practices.
The core rules of your risk rankings should take into account CVSS Base Scores, available from nearly all security intelligence feeds. These scores are also the base system used within the PCI ASV Program Guide. Your process should also account for system components in your cardholder data environment and vendor-provided criticality rankings, such as the Microsoft patch ranking system if your CDE includes Windows-based system components.
The process should include documentation that details the sources of security information you follow, how frequently you review the feeds, and how you respond to new information in the feeds. QualysGuard provides daily updates to the vulnerability knowledgebase and now offers a Zero-Day Analyzer service, which leverages data from the iDefense security intelligence feed.
3. Your scanning tools
After you have your scope clearly defined and you have your process for ranking vulnerabilities documented, you will need to be able to run vulnerability scans. This includes internal VM scans, external VM scans, PCI ASV scans (external), internal web application scans and external web application scans. It is thefindings in these scans that will map against your risk ranking process and allow you to produce the necessary scan reports.
You will need to be able to configure your scanning tools to check for “high” vulnerabilities, which will allow you to allocate resources to fix and resolve these issues as part of the normal vulnerability management program and workflow within your environment.
QualysGuard VM, QualysGuard WAS and QualysGuard PCI all work together seamlessly to provide each of these scans capabilities against the same group of assets that represent your PCI scope or CDE.
4. Your scan reports
You will want to produce reports for your internal PCI scope, as defined in #1 of this checklist, both quarterly and after any significant changes. If you have regular releases or updates to your IT infrastructure, you will want to have scan reports from those updates and upgrades. Quarterly scan reports need to be spaced apart by 90 days. In all cases, these reports need to show that there are no “high” vulnerabilities detected by your scanning tools.
Each report for the significant change events will also need to include external PCI scope. QualysGuard VM makes it easy to include both internal and external assets in the same report. QualysGuard VM also provides a direct link to your QualysGuard PCI merchant account for automation of your PCI ASV scan requirements.
QualysGuard WAS allows you to quickly meet your production web application scanning requirement (PCI 6.6) as well as internal web application scanning as part of your software development lifecycle (SDLC), by scanning your applications in development and in test.
If you follow these guidelines you will be well prepared to perform and maintain the required controls for PCI 11.2.
Imagine a line at a fast food restaurant that serves two types of burgers, and a customer at the cashier is stuck for a while deciding what he wants to order, making the rest of the line anxious, slowing down the business. Now imagine a line at the same restaurant, but with a sign saying “think ahead of your order,” which is supposed to speed things up. But now the customer orders hundreds of burgers, pays, and the line is stuck again, because he can take only 5 burgers at time to his car, making signs ineffective.
While developing the slowhttptest tool, I thought about this burger scenario, and became curious about how HTTP servers react to slow consumption of their responses. There are so many conversations about slowing down requests, but none of them cover slow responses. After spending a couple of evenings implementing proof-of-concept code, I pointed it to my so-many-times-tortured Apache server and, surprisingly, got a denial of service as easily as I got it with slowloris and slow POST.
Let me remind you what slowloris and slow POST are aiming to do: A Web server keeps its active connections in a relatively small concurrent connection pool, and the above-mentioned attacks try to tie up all the connections in that pool with slow requests, thus causing the server to reject legitimate requests, as in first reastaurnt scenario.
The idea of the attack I implemented is pretty simple: Bypass policies that filter slow-deciding customers, send a legitimate HTTP request and read the response slowly, aiming to keep as many connections as possible active. Sounds too easy to be true, right?
Crafting a Slow Read
Let’s start with a simple case, and send a legitimate HTTP request for a resource without reading the server’s response from the kernel receive buffer.
We craft a request like the following:
GET /img/delivery.png HTTP/1.1
User-Agent: Opera/9.80 (Macintosh; Intel Mac OS X 10.7.0; U; Edition MacAppStore; en) Presto/2.9.168 Version/11.50
And the server replies with something like this:
HTTP/1.1 200 OK
Date: Mon, 19 Dec 2011 00:12:28 GMT
Last-Modified: Thu, 08 Dec 2011 15:29:54 GMT
For those who don’t feel like reading tcpdump’s output: We established a connection; sent the request; received the response through several TCP packets sized 1448 bytes because of Maximum Segment Size that the underlying communication channel supports; and finally, 5 seconds later, we received the TCP packet with the FIN flag.
Everything seems normal and expected. The server handed the data to its kernel level send buffer, and the TCP/IP stack took care of the rest. At the client, even while the application had not read yet from its kernel level receive buffer, all the transactions were completed on the network layer.
What if we try to make the client’s receive buffer very small?
We sent the same HTTP request and server produced the same HTTP response, but tcpdump produced much more interesting results:
In the initial SYN packet, the client advertised its receive window size as 28 bytes. The server sends the first 28 bytes to the client and that’s it! The server keeps polling the client for space available at progressive intervals until it reaches a 2-minute interval, and then keeps polling at that interval, but keeps receiving win 0.
This is already promising: if we can prolong the connection lifetime for several minutes, it’s not that bad. And we can have more fun with thousands of connections! But fun did not happen. Let’s see why: Once the server received the request and generated the response, it sends the data to the socket, which is supposed to deliver it to the end user. If the data can fit into the server socket’s send buffer, the server hands the entire data to the kernel and forgets about it. That’s what happened with our last test.
What if we make the server keep polling the socket for write readiness? We get exactly what we wanted: Denial of Service.
Let’s summarize the prerequisites for the DoS:
We need to know the server’s send buffer size and then define a smaller-sized client receive buffer. TCP doesn’t advertise the server’s send buffer size, but we can assume that it is the default value, which is usually between 65Kb and 128Kb. There’s normally no need to have a send buffer larger than that.
We need to make the server generate a response that is larger than the send buffer. With reports indicating the Average Web Page Approaches 1MB, that should be fairly easy. Load the main page of the victim’s Web site in your favorite WebKit-based browser like Chrome or Safari and pick the largest resource in Web Inspector.
If there are no sufficiently large resources on the server, but it supports HTTP pipelining, which many Web servers do, then we can multiply the size of the response to fill up the server’s send buffer as much as we need by re-requesting same resource several times using the same connection.
For example, here’s a screenshot of mod_status on Apache under attack:
As you can see, all connections are in the WRITE state with 0 idle workers.
Here is the chart generated by the latest release of slowhttptest with Slow Read attack support:
Even though the TCP/IP stack shouldn’t make decisions on resetting alive and responsive connections, and it is the “userland” application’s responsibility to do so, I assume that some TCP/IP implementations or firewalls might have timers to track connections that cannot send data for some time. To avoid triggering such decisions, slowhttptest can read data from the local receive buffer very slowly to make the TCP/IP stack reply with ACKs with window size other than 0, thus ensuring some physical data flow from server to client.
While I was implementing the attack, I contacted Ivan Ristic to get his opinion and suggestions. I was suspecting there would be attacks exploiting zero/small window, but I thought I am the first one to apply it to tie up an HTTP server. I was surprised when Ivan replied with links to sockstress by Outpost24 and Nkiller2 exploiting Persist Timer infiniteness that are already covering most aspects I wanted to describe. However, the above mentioned techniques are crafting TCP packets and use raw sockets, whereas slowhttptest uses only the TCP sockets API to achieve almost the same functionality.
We still think it’s worthwhile to have a configurable tool to help people focus and design defense mechanisms, since this vulnerability still exists on many systems three years after it was first discovered, and I consider Slow Read DoS attacks are even lower profile and harder to detect than slowloris and slow POST attacks.
All servers I observed (Apache, nginx, lighttpd, IIS 7.5) are vulnerable in their default configuration.
The fundamental problem here is how servers are handling write readiness for active sockets.
The best protection would be:
Do not accept connections with abnormally small advertised window sizes
Do not enable persistent connections and HTTP pipelining unless performance really benefits from it
Limit the absolute connection lifetime to some reasonable value
Some servers have built-in protection, which is turned off by default. For example, lighttpd has the server.max-write-idle option to specify maximum number of seconds until a waiting write call times out and closes the connection.
Apache is vulnerable in its default configuration, but MPM Event, for example, handles slow requests and responses significantly better than other modules, but falls back to worker MPM behavior for SSL connections. ModSecurity supports attributes to control how long a socket can remain in read or write state.
There is a handy script called “Flying frog” available, written by Christian Folini, an expert in Application Layer DoS attacks detection. Flying frog is a monitoring agent that hovers over the incoming traffic and the application log. It picks individual attackers, like a frog eats a mosquito.
Slow HTTP attacks are denial-of-service (DoS) attacks in which the attacker sends HTTP requests in pieces slowly, one at a time to a Web server. If an HTTP request is not complete, or if the transfer rate is very low, the server keeps its resources busy waiting for the rest of the data. When the server’s concurrent connection pool reaches its maximum, this creates a DoS. Slow HTTP attacks are easy to execute because they require only minimal resources from the attacker.
In this article, I describe several simple steps to protect against slow HTTP attacks and to make the attacks more difficult to execute.
To protect your Web server against slow HTTP attacks, I recommend the following:
Reject / drop connections with HTTP methods (verbs) not supported by the URL.
Limit the header and message body to a minimal reasonable length. Set tighter URL-specific limits as appropriate for every resource that accepts a message body.
Set an absolute connection timeout, if possible. Of course, if the timeout is too short, you risk dropping legitimate slow connections; and if it’s too long, you don’t get any protection from attacks. I recommend a timeout value based on your connection length statistics, e.g. a timeout slightly greater than median lifetime of connections should satisfy most of the legitimate clients.
The backlog of pending connections allows the server to hold connections it’s not ready to accept, and this allows it to withstand a larger slow HTTP attack, as well as gives legitimate users a chance to be served under high load. However, a large backlog also prolongs the attack, since it backlogs all connection requests regardless of whether they’re legitimate. If the server supports a backlog, I recommend making it reasonably large to so your HTTP server can handle a small attack.
Define the minimum incoming data rate, and drop connections that are slower than that rate. Care must be taken not to set the minimum too low, or you risk dropping legitimate connections.
Applying the above steps to the HTTP servers tested in the previous article indicates the following server-specific settings:
Using the <Limit> and <LimitExcept> directives to drop requests with methods not supported by the URL alone won’t help, because Apache waits for the entire request to complete before applying these directives. Therefore, use these parameters in conjunction with the LimitRequestFields, LimitRequestFieldSize, LimitRequestBody, LimitRequestLine, LimitXMLRequestBody directives as appropriate. For example, it is unlikely that your web app requires an 8190 byte header, or an unlimited body size, or 100 headers per request, as most default configurations have.
Set reasonable TimeOut and KeepAliveTimeOut directive values. The default value of 300 seconds for TimeOut is overkill for most situations.
ListenBackLog’s default value of 511 could be increased, which is helpful when the server can’t accept connections fast enough.
Increase the MaxRequestWorkers directive to allow the server to handle the maximum number of simultaneous connections.
Adjust the AcceptFilter directive, which is supported on FreeBSD and Linux, and enables operating system specific optimizations for a listening socket by protocol type. For example, the httpready Accept Filter buffers entire HTTP requests at the kernel level.
A number of Apache modules are available to minimize the threat of slow HTTP attacks. For example, mod_reqtimeout’s RequestReadTimeout directive helps to control slow connections by setting timeout and minimum data rate for receiving requests.
I also recommend switching apache2 to experimental Event MPM mode where available. This uses a dedicated thread to handle the listening sockets and all sockets that are in a Keep Alive state, which means incomplete connections use fewer resources while being polled.
Limit request attributes is through the <RequestLimits> element, specifically the maxAllowedContentLength, maxQueryString, and maxUrl attributes.
Set <headerLimits> to configure the type and size of header your web server will accept.
Tune the connectionTimeout, headerWaitTimeout, and minBytesPerSecond attributes of the <limits> and <WebLimits> elements to minimize the impact of slow HTTP attacks.
The above are the simplest and most generic countermeasures to minimize the threat. Tuning the Web server configuration is effective to an extent, although there is always a tradeoff between limiting slow HTTP attacks and dropping legitimately slow requests. This means you can never prevent attacks simply using the above techniques.
Beyond configuring the web server, it’s possible to implement other layers of protection like event-driven software load balancers, hardware load balancers to perform delayed binding, and intrusion detection/prevention systems to drop connections with suspicious patterns.
However, today, it probably makes more sense to defend against specific tools rather than slow HTTP attacks in general. Tools have weaknesses that can be identified and and exploited when tailoring your protection. For example, slowhttptest doesn’t change the user-agent string once the test has begun, and it requests the same URL in every HTTP request. If a web server receives thousands of connections from the same IP with the same user-agent requesting the same resource within short period of time, it obviously hints that something is not legitimate. These kinds of patterns can be gleaned from the log files, therefore monitoring log files to detect the attack still remains the most effective countermeasure.
Following the release of the slowhttptest tool, I ran benchmark tests of some popular Web servers. My testing shows that all of the observed Web servers (and probably others) are vulnerable to slow http attacks in their default configurations. Reports generated by the slowhttptest tool illustrate the differences in how the various Web servers handle slow http attacks.
Tests were run against the default, out-of-the-box configurations of the Web servers, which is the best level playing field for comparison. And while most deployments will customize their configuration, they will likely do it for reasons other than improving protection against slow http attacks. Therefore it’s useful to understand how the default configurations relate to slow http attack vulnerability.
In addition to noting that all Web servers tested are vulnerable to slow http attacks, I drew some other generalizations about how different Web servers handle slow http attacks:
Apache, nginx and lighttpd wait for complete headers on any URL for requests without a message body (GET, for example) before issuing a response, even for requests with the verb FAKESLOWVERB.
Apache also waits for the entire body of requests with fake verbs before issuing a response with an error message.
For Apache, nginx and lighttpd, slow requests sent with fake verbs consume resources with the same success rate as requests sent with valid verbs, so the hacker doesn’t even need to bother finding a vulnerable URL.
Server administrators’ scripts typically query for particular expected values like method, or URL, or referer header, etc., but not for fake verbs. That means it is likely that slow http attacks using fake verbs or URLs can go unnoticed by the server administrator.
Each server except IIS is vulnerable to both slow header and slow message body attacks. IIS is vulnerable only to slow message body attacks.
However, there are some interesting differences in the results as well. The screenshots below, which show the graphical output of the slowhttptest tool, demonstrate how connection state changed during the tests, and illustrate how the various Web servers handle slow http attacks.
Apache MPM prefork:
Apache is generally the most vulnerable, and denial of service can be achieved with 355 connections on the system tested. Apache documentation indicates a 300-second timeout for connections, but my testing indicates this is not enforced.
In the test, the server accepted 483 connections and started processing 355 of them. The 355 corresponds to RLIMIT_NPROC (max user processes), a machine-dependent value that is 709 on the machine tested, times MaxClients, whose default value in httpd.conf is 50%: 355 = 709 * 50%.
The rest of the connections were accepted and backlogged. The limit for backlog is set by the ListenBackLog directive and is 512 in default httpd.conf, but is often limited to a smaller number by operating system, 128 in case of Mac OS X. The 483 connected connections shown in the graph correspond to the 355 being processed plus the 128 in the backlog.
One interesting side note: The Apache documentation says that ListenBackLog is the “maximum length of the queue of pending connections”, but a simple test shows that backlogged connections are being accepted, e.g. server sends SYN-ACK back, and backlogged connections are ready for write operations on the client side, which is not a normal behavior for pending connections. Apache documentation is probably using “pending” to mean the internal connection state in Apache, but I rather expect “pending” to refer to the state in the TCP stack, where “pending” means “waiting to be accepted”.
I terminated the test after 240 seconds, but the picture would be the same for a longer test. A properly configured client can keep connections open for hours, until the limit for headers count or length is met. With such settings, DoS is achieved with N+1 connections, where N is number specified by MaxClients.
While testing, I noticed that my httpd.conf had a TimeOut directive set to 300, and Apache 2.0 documentation says that:
The TimeOut directive currently defines the amount of time Apache will wait for three things:
The total amount of time it takes to receive a GET request.
The amount of time between receipt of TCP packets on a POST or PUT request.
The amount of time between ACKs on transmissions of TCP packets in responses.
I understand from the above paragraph that the opened connections should be closed after the TimeOut interval, i.e. 300 seconds in the default configuration. However, I observed that the connection is closed only if there is no data arriving for 300 seconds, which means this is not an effective preventative measure against DoS. This behavior also indicates that if there are some rules defined to throttle down too many connections with a similar pattern in a shorter period of time, they are also ineffective: An attacker can initiate connections with very low connection rate and get the same results, as the connection can be prolonged virtually forever.
nginx is also vulnerable to slow http attacks, but it offers more controls than Apache.
Surprisingly, the number of initially accepted connections was 377, even with the default settings of worker_connections = 1024, and worker_processes = 1. But worker_rlimit_nofile, the maximum number of open file descriptors per worker that governs the maximum number of connections the worker can accept, has a default value of 377 on Mac OS X.
As shown in the graph above, the server accepts the connections it can accept, and leaves the rest of the connections pending. Due to some hardcoded timeout values, connections are closed after 70 seconds no matter how slow the data is arriving. Nginx is therefore safer than default Apache, but it still gives attacker a chance to achieve DoS for 70 seconds.
The default TCP timeout (75), which closes pending connections, is longer than the nginx timeout (65), which closes accepted connections. This means that nginx moves some pending connections to the accepted state after it times out the first set of accepted connections. This extends the length of time that a batch of slow connection requests can tie up the server.
In any case, a client can always re-establish connections every 65 seconds to keep the server under DoS conditions.
Lighttpd with default configuration is vulnerable to both http attacks, which are fairly easy to carry out.
The default configuration allows a maximum of 480 connections to be accepted, as can be seen in the graph above, with the rest pending for 200 seconds, and then closed by a timeout. Lighttpd has a useful attribute called server.max-read-idle with default value 60, which closes a connection if no data is received before the timeout interval, but sending something to the socket every 59 seconds would reset it, allowing the attacker to keep connections open for a long time.
The lighttpd forums indicate that a fixed issue protects against slow HTTP request handling, it only fixes a waste of memory issue, and hitting the limit of concurrently processing connections is still pretty easy.
IIS 7 offers good protection against slow headers, but this protection does not extend to slow message bodies. Because IIS is architected differently from the other Web servers tested, the behavior it displays is also different.
IIS 7 accepts all connections, but does not consider them writeable until headers sections are received in full. Such connections require fewer resources from IIS, and therefore IIS can maintain a relatively larger pool of these connections. In the default configuration, it is not possible to exhaust the pool with a single slowhttptest run, which is limited to 1024 connections on systems I tested on. However, it would be possible to launch multiple instances of slowhttptest to get around this limitation.
For requests with a slow message body, IIS’ protection is useless, as it’s possible to send complete headers sections but then slow down the message body section. In this case, the connections are transferred to IIS’ internal processing queue, which is limited in size to, don’t be surprised, 100 connections. Even though the screenshot shows 1000 connections, I experimentally figured out that 100 requests with slow message body are enough to get DoS.
Software configuration is all about tradeoffs, and it is normal to sacrifice one aspect for another. We see from the test results above that all default configuration files of the Web servers tested are sacrificing protection against slow HTTP DoS attacks in exchange for better handling of connections that are legitimately slow.
Because a lot of people are not aware of slow http attacks, they will tend to trust the default configuration files distributed with the Web servers. It would be great if the vendors creating distribution packages for Web servers would pay attention to handling and minimizing the impact of slow attacks, as much as the Web servers’ configuration allows it. Meanwhile, if you are running a Web server, be careful and always test your setup before relying on it for production use.
Slow HTTP attacks are denial-of-service (DoS) attacks that rely on the fact that the HTTP protocol, by design, requires a request to be completely received by the server before it is processed. If an HTTP request is not complete, or if the transfer rate is very low, the server keeps its resources busy waiting for the rest of the data. When the server’s concurrent connection pool reaches its maximum, this creates a denial of service. These attacks are problematic because they are easy to execute, i.e. they can be executed with minimal resources from the attacking machine.
Inspired by Robert “Rsnake” Hansen’s Slowloris and Tom Brennan’s OWASP slow post tools, I started developing another open-source tool, called slowhttptest, available with full documentation at https://github.com/shekyan/slowhttptest. Slowhttptest opens and maintains customizable slow connections to a target server, giving you a picture of the server’s limitations and weaknesses. It includes features of both of the above tools, plus some additional configurable parameters and nicely formatted output.
Slowhttptest is configurable to allow users to test different types of slow http scenarios. Supported features are:
slowing down either the header or the body section of the request
any HTTP verb can be used in the request
configurable Content-Length header
random size of follow-up chunks, limited by optional value
random header names and values
random message body data
configurable interval between follow-up data chunks
support for SSL
support for hosts names resolved to IPv6
verbosity levels in reporting
connection state change tracking
variable connection rate
detailed statistics available in CSV format and as a chart generated as HTML file using Google Chart Tools
How to Use
The tool works out of the box with default parameters, which are harmless and most likely will not cause a denial of service.
and the test begins with the default parameters.
Depending on which test mode you choose, the tool will send either slow headers:
GET / HTTP/1.1CRLF
Host: localhost:80 CRLF
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2)CRLF
. n seconds
. n seconds
. n seconds
or slow message bodies:
POST / HTTP/1.1CRLF
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:5.0.1) Gecko/20100101
. n seconds
. n seconds
Repeated until the server closes the connection or the test hits the specified time limit.
Depending on the verbosity level selected, the slowhttptest tool logs anything from heartbeat messages every 5 seconds to a full traffic dump. Output is available either in CSV format or in HTML for interactive use with Google Chart Tools.
Note: Care should be taken when using this tool to avoid inadvertently causing denial of service against your servers. For production servers, QualysGuard Web Application Scanner will perform passive (non-intrusive) automated tests that will indicate susceptibility to slow http attacks without the risk of causing denial of service.
Example Test and Results
The HTML screenshot below shows the results of running slowhttptest against a test server in a test lab. In this scenario, the tool opens 1000 connections with rate of 200 connections per second, and the server was able to concurrently process only 377 connections, leaving the remaining 617 connections pending. Denial of service was achieved within the first 5 seconds of the test, and lasted 60 seconds, until the server timed out some of the active connections. At this point, the server transferred another set of connections from pending state to active state, thus causing DoS again, until the server timed out those connections.
Figure 1: Sample HTML output of slowhttptest results.
As is shown in the above test, the slowhttptest tool can be used to test a variety of different slow http attacks and to understand the effects they will have on specific server configurations. By having a visual representation of the server’s state, it is easy to understand how the server reacts to slow HTTP requests. It is then possible to adjust server configurations as appropriate. In follow-up posts, I will describe some detailed analysis of different HTTP servers’ behavior on slow attacks and mitigation techniques.
Any comments are highly appreciated, and I will review all feature requests posted on the project page at https://github.com/shekyan/slowhttptest. Many thanks to those who are contributing to this project.
Interest in the QualysGuard Web Application Scanning (WAS) module has been growing since its new UI was demonstrated last week at BlackHat. Along with such interest come questions about how the scanner works. The ultimate goal for WAS is to provide accurate, scalable testing for the most common, highest profile vulnerabilities (think of SQL injection and XSS) so that manual testing can skip the tedious and time-consuming aspects of an app review and focus on complex vulns that require brains rather than RAM.
One complex vuln in particular is CSRF. Automated, universal CSRF detection is a tough challenge, which is why we try to solve the problem in pieces rather than all at once. It’s the type of challenge that keeps web scanning interesting. Here’s a brief look at the approach we’ve taken to start bringing CSRF detection into the realm of automation.
First, the test assumes an authenticated scan. If the scan is not given credentials, then the tests won’t be performed. Also, tests are targeted to specific manifestations of CSRFrather than the broad set of attacks possible from our friendly sleeping giant.
Tests roughly follow these steps. Fundamentally, we’re trying to model an attack rather than make inferences based on pattern matching:
1. Identify forms with a "session context". This is a weaker version of (but hopefully a subset of) a "security context", because lots of times security requires knowledge about the boundaries within an app and the authorized actions of a user. This knowledge is hard to come by automatically. Never the less, some utility can be had by looking at forms with the following attributes:
Only available to an authenticated user.
Are not "trivial" such as search forms or logout buttons.
Have an observable effect, either on the session or the HTTP response. (Hint: Here’s where the automated scan becomes narrow, meaning prone to false negatives.)
2. Set up two separate sessions for the user (i.e. login twice). Keep their cookie jars apart. We’ll refer to the sessions as Aardvark and Bobcat (or A & B or Alpha & Bravo, etc.). Remember, this is for a single user.
3. Obtain a form for session Aardvark.
4. Obtain a form for session Bobcat.
5. Swap the forms between the two sessions and submit. (Crossing the streams, like Egon told you not to do.)
The assumption is that any CSRF tokens in Aardvark’s form are tied to the session cookie(s) used by Aardvark and Bobcat’s belong to Bobcat. Things should blow up if the tokens and session don’t match.
6. Examine the "swapped" responses.
If the response has a clear indication of error, then the app is more likely to be protected from CSRF. The obvious error is something like, "Invalid CSRF token". Sadly, the world is not unicorns and rainbows for automated scanning and errors may not be so obvious or point so directly to CSRF.
If the response is similar to the one received from the original request, then it appears that the form is not coupled to a user’s session. This is an indicator that the form is more probably vulnerable to CSRF.
What it won’t do, because these techniques are noisy and unreliable (as opposed to subtle and quick to anger):
Look for hidden form fields with names or values that match CSRF tokens. If an obvious token is present, that doesn’t mean the app is actually validating it.
Use static inspection of the form, DOM, or HTML to look for any examples of CSRF tokens. Why look for text patterns when you’re trying to determine a behavior? Not everything is solved by regexes. (Which really is unfortunate, by the way.)
Attempt to evaluate the predictability of anything that looks like a CSRF token.
Nor will it demonstrate the compounding factor of CSRF onother vulnerabilities like XSS. That’s something that manual pen-testing should do. In other words, WAS is focused on identifying vulns (it should find an XSS vuln, but it won’t tie the vuln to a CSRF attack to demonstrate a threat). Manual pen-testing more often focuses on how deep an app can be compromised — and the real risks associated with it.
What it’ll miss:
Situations where sessions cookie(s) are static or relatively static for a user. This impairs the "swap" test.
CSRF that can affect unauthenticated users in a meaningful way. This is vague, but as you read more about CSRF you’ll find that some people consider any forgeable action should be considered a vuln. This speaks more to the issue of evaluating risk. You should be relying on people to analyze risk, not tools.
CSRF that affects the user’s privacy. This requires knowledge of the app’s policy and the impact of the attack.
Forms whose effect on a user’s security context manifests in a different response, or in a manner that isn’t immediately evident.
CSRF tokens in the header, which might lead to false positives.
CSRF vulns that manifest via links rather thanforms. Apps put all kinds of functionality in hrefs rather than explicit form tags.
Other situations where we play games of anecdotes and what-ifs.
What we are trying to do:
Reduce noise. Don’t report vulns for the sake of reporting a vuln if no clear security context or actionable data can be provided.
Provide a discussion point so we can explain thebenefits of automated web scanning and point out where manual follow-up will always be necessary.
Learn how real-world web sites implement CSRF in order to find common behaviors that might be detectable via automation. You’d be surprised (maybe) at how often apps have security countermeasures that look nothing like OWASP recommendations and, consequently, fare rather poorly.
Experiment with pushing the bounds of what automation can do, while avoiding hyperbolic claims that automation solves everything.
The current state of CSRF testing in WAS should be relied on as a positive indicator (vuln found, vuln exists) more so than a negative indicator (no vuln found, no vulns exist). That’s supposed to mean that a CSRF vuln reported by WAS should not be a false positive and should be something that the app’s devs need to fix. It also means that if WAS doesn’t find a vuln then the app may still have CSRF vulns. For this particular test a clean report doesn’t mean a clean app; there’re simply too many ways of looking at the CSRF problem to tackle it all at once. We’re trying to break the problem down into manageable parts in order to understand what approaches work. We want to hear your thoughts and feedback on this.
Slow HTTP attacks rely on the fact that the HTTP protocol, by design, requires requests to be completely received by the server before they are processed. If an http request is not complete, or if the transfer rate is very low, the server keeps its resources busy waiting for the rest of the data. If the server keeps too many resources busy, this creates a denial of service.
These types of attack are easy to execute because a single machine is able to establish thousands of connections to a server and generate thousands of unfinished HTTP requests in a very short period of time using minimal bandwidth.
Due to implementation differences among various HTTP servers, two main attack vectors exist:
Slowloris: Slowing down HTTP headers, making the server wait for the final CRLF, which indicates the end of the headers section;
Slow POST: Slowing down the HTTP message body, making the server wait until all content arrives according to the Content-Length header; or until the final CRLF arrives, if HTTP 1.1 is being used and no Content-Length was declared.
The scary part is that these attacks can just look like requests that are taking a long time, so it’s hard to detect and prevent them by using traditional anti-DoS tools. Recent rumors indicate these attacks are happening right now: CIA.gov attacked using slowloris.
QualysGuard Web Application Scanner (WAS) uses a number of approaches to detect vulnerability to these attacks.
To detect a slow headers (a.k.a. Slowloris) attack vulnerability (Qualys ID 150079), WAS opens two connections to the server and requests the base URL provided in the scan configuration.
The request sent to the first connection consists of a request line and one single header line but without the final CRLF, similar to the following:
GET / HTTP/1.1 CRLF
Connection: keep-alive CRLF
The request sent to the second connection looks identical to the first one, but WAS sends a follow-up header line some interval later to make the HTTP server think the peer is still alive:
Currently that interval is approximately 10 seconds plus the average response time during the crawl phase.
WAS considers the server platform vulnerable to a slowloris attack if the server closes the second connection more than 10 seconds later than the first one. In that case, the server prolonged its internal timeout value because it perceived the connection to be slow. Using a similar approach, an attacker could occupy a resource (thread or socket) on that server for virtually forever by sending a byte per T – 1 (or any random value less than T), where T is the timeout after which the server would drop the connection.
WAS does not report the server to be vulnerable if it keeps both connections open for the same long period of time (more than 2 minutes, for example), as that would be a false positive if the target server were IIS (which has protection against slow header attacks, but is less tolerant of real slow connections).
Slow POST Detection
To detect a slow POST (a.k.a. Are-You-Dead-Yet) attack vulnerability (QID 150085), WAS opens two other connections, and uses an action URL of a form it discovered during the crawl phase that doesn’t require authentication.
The request sent to the first connection looks like the following:
POST /url_that_accepts_post HTTP/1.1 CRLF
Host: host_to_test:port_if_not_default CRLF
User-Agent: Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0;) CRLF
Similar to the slow headers approach, WAS sends an identical request to the second connection, and then 10 seconds later sends the following (again without the final CRLF):
WAS considers the target vulnerable if any of the following conditions are met:
The server keeps the second connection open 10 seconds longer than the first one, or
The server keeps both connections open for more than 120 seconds, or
The server doesn’t close both connections within a 5 minute period (as WAS limits slow tests to 5 minutes only).
WAS assumes that if it is possible to either keep the connection open with an unfinished request for longer than 120 seconds or, even better, prolong the unfinished connection by sending a byte per T – 1 (or any random value less than T), then it’s possible to acquire all server sockets or threads within that interval.
WAS also performs a supplemental test to determine unhealthy behavior in handling POST requests, by sending a simple POST request to the base URI with a relatively large message body (65543 Kbytes). The content of the body is a random set of ASCII characters, and the content type is set to application/x-www-form-urlencoded. WAS assumes that if the server blindly accepts that request, e.g. responds with 200, then it gives an attacker more opportunity to prolong the slow connection by sending one byte per T – 1. Multiplying 65543 by the T – 1 would give you the length of time an attacker could keep that connection open. QID 150086 is reported on detection of that behavior.
Tests performed by WAS are passive and as non-intrusive as possible, which minimizes the risk of taking down the server. But because of the possibility of false positives, care should be taken, especially if the HTTP server or IPS (Intrusion Prevention System) is configured to change data processing behavior if a certain number of suspicious requests are detected. If you are interested in active testing, which might take your server down, you can try some active testing using one of these available tools:
Mitigation of slow HTTP attacks is platform specific, so it’d be nice for the community to share mitigation techniques in the comments below. I’ll post an update with information on some of those platforms, as well as general recommendations that can be extrapolated to particular platforms.