Protect Your Applications from Hacker Research

Xiaoran Dong

Last updated on: January 27, 2021

Accidents, like vulnerabilities, remind us that nothing is perfect. Even if any given vulnerability is unexpected, experience tells us that vulnerabilities are inevitable. Hackers know this too, of course, and a determined hacker will use whatever tools are available to find vulnerabilities to exploit. One of the most obvious tools is research: simply inspecting the data your application publishes about itself can yield useful information. But how much data your application exposes to hacker research is within your control, and you can mitigate the risk by implementing policy compliance best practices. As a Policy Compliance signature developer, I will use Apache HTTP Server as an example to illustrate how applications can leak data that is helpful to hackers, and how you can prevent it.

Data Collection for Hackers


As you may know, Google dorks and SHODAN are powerful tools for penetration testing and data collection. Google dorks, also known as “Google hacking”, use specialized search operators to locate very specific information. With operators such as “inurl”, “intext”, “intitle” and “site”, one can search within a specific website and locate sensitive information by matching keywords in the URL, the page title and the page content. SHODAN interrogates ports, grabs the resulting banners, and indexes the banners (rather than the web content) for searching, so one can locate sensitive information by searching for keywords in banners. While useful to web developers and security professionals, these tools also lower the barrier to data collection and make it more efficient for hackers.
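To make the operator syntax concrete, here is a minimal sketch of how such a query string is assembled, for example when an administrator wants to check what a search engine has indexed for their own site. The helper name `build_dork` and the domain `example.com` are hypothetical and used only for illustration.

```python
def build_dork(site=None, inurl=None, intitle=None, intext=()):
    """Compose a search query from the operators named in the text.

    Every argument is optional; phrases are wrapped in double quotes so the
    search engine treats them as exact matches.
    """
    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f'inurl:"{inurl}"')
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    for phrase in intext:
        parts.append(f'intext:"{phrase}"')
    return " ".join(parts)


if __name__ == "__main__":
    # example.com is a placeholder for a site you are authorized to audit.
    print(build_dork(site="example.com", intitle="Index of"))
```

The sketch only builds the string; pasting the result into a search engine is what actually runs the query.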

Inspecting Apache


Apache HTTP Server is one of the most popular web servers and plays a fundamental role in running web applications by handling requests and presenting web pages to website visitors. Its many built-in configuration options help users monitor the server and make use of its powerful feature set quickly. Some of these features also enable attackers to collect data through Google dorks and SHODAN, since the data can easily be indexed by search engines.

Five Examples

The five examples below show how this happens:

Example 1: Mod_info

The “mod_info” module in Apache HTTP Server provides information about the server configuration through the /server-info URL.


An overview of the web server, including the Apache version, the location of configuration files, detailed configuration settings, and the users and groups running Apache, is very helpful to the server administrator. However, if the page is publicly accessible, search engines such as Google may index it. Such pages share a common path, “/server-info”, and common page content, “Apache Server Information”, so the Google dork inurl:"/server-info" intext:"Apache Server Information" will find them, and your page may be among the results. A potential attacker would be interested in the Apache version number, which allows attacks to be targeted at known vulnerable versions. For example, Apache 2.4.10 has these known vulnerabilities that a hacker could exploit: CVE-2014-3583, CVE-2014-3581 and CVE-2013-5704.

Example 2: Autoindex

The Apache autoindex module automatically generates a web page listing the contents of a directory on the server, typically so that an index.html does not have to be created by hand.

With no “index.html” in a directory, the server simply lists every file under it. Many websites use this feature to offer files for download, but if you are not careful, sensitive information could also be exposed to the public. Once the page is indexed, the query site:<target website> intitle:"Index of" intext:"Index of" intext:"Parent Directory" will locate it on the specified website.

Search engines can index every file in such a directory. If a system admin leaves sensitive files there, for example a password.xls containing password data or an application.conf containing configuration data, anyone can access them easily. System admins should be careful about which files they put in listed directories or, better yet, disable this feature or set proper access control.
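As a rough illustration of how an administrator might review their own listings, the sketch below checks whether a page looks like an autoindex listing (using the “Index of” and “Parent Directory” markers from the query above) and flags links whose names hint at sensitive content. The function name, the hint list and the sample HTML are assumptions made for this example, not part of Apache.

```python
from html.parser import HTMLParser

# Hypothetical substrings that suggest a file deserves a closer look.
SENSITIVE_HINTS = ("password", "passwd", "secret", "backup", ".conf", ".key", ".sql")


class _LinkCollector(HTMLParser):
    """Collects the page title and every href found in anchor tags."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def inspect_listing(html):
    """Return (is_listing, risky_links) for one HTML page."""
    parser = _LinkCollector()
    parser.feed(html)
    title = "".join(parser.title_parts).strip()
    is_listing = title.startswith("Index of") and "Parent Directory" in html
    risky = [h for h in parser.hrefs
             if any(hint in h.lower() for hint in SENSITIVE_HINTS)]
    return is_listing, (risky if is_listing else [])


# Hand-written sample that imitates a listing page; not real server output.
SAMPLE = (
    "<html><head><title>Index of /files</title></head><body>"
    "<h1>Index of /files</h1>"
    '<a href="/">Parent Directory</a>'
    '<a href="report.pdf">report.pdf</a>'
    '<a href="password.xls">password.xls</a>'
    '<a href="application.conf">application.conf</a>'
    "</body></html>"
)

if __name__ == "__main__":
    print(inspect_listing(SAMPLE))
```

Name matching of this kind is only a first pass; it cannot tell whether a file's contents are actually sensitive.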

Example 3: Mod_status

The Apache mod_status module provides current server performance statistics, typically at the /server-status URL.


This is useful for monitoring server performance, but if the page is indexed, a simple query such as inurl:"/server-status" intitle:"Apache Status" gives attackers the opportunity to obtain version information for your server. Compared with Example 1, the difference is that attackers here get dynamic data about the server, so they could launch an attack (e.g. DoS) and watch its effect. They could also use the version data in the same way as in Example 1.

Example 4: Default CGI Content

Most web servers, including Apache installations, ship with default CGI content that is neither needed nor appropriate for production use. The primary function of these sample programs is to demonstrate the capabilities of the web server. A common piece of default CGI content in Apache installations is the printenv script, which prints back to the requester all of the CGI environment variables, including many server configuration details and system paths.

The script is usually reachable at the same path, “/cgi-bin/printenv”, so the Google query inurl:"/cgi-bin/printenv" shows who is exposing this information to the public.

In this case, the main exposure is telling hackers that CGI is enabled, so webmasters should disable CGI. Using CGI is risky in and of itself: CGI scripts can run essentially arbitrary commands on your system with the permissions of the web server user, and can therefore be extremely dangerous if they are not carefully checked. In addition, attackers can use “Shellshock”, the well-known and destructive vulnerability of 2014, to compromise web servers with CGI enabled.
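Examples 1, 3 and 4 share a pattern: a well-known path plus recognizable page text. The sketch below uses that pattern to let an administrator check a server they are responsible for. The paths and the first two markers come from the queries above; using `SERVER_SOFTWARE` (a standard CGI environment variable) as the printenv marker, and the names `is_exposed` and `check_host`, are assumptions made for this example.

```python
import urllib.request

# Well-known path -> text expected in the page when it is publicly served.
KNOWN_PAGES = {
    "/server-info": "Apache Server Information",
    "/server-status": "Apache Status",
    "/cgi-bin/printenv": "SERVER_SOFTWARE",  # assumed marker, see lead-in
}


def is_exposed(path, status, body):
    """True if a 200 response for a known path contains its marker text."""
    marker = KNOWN_PAGES.get(path)
    return marker is not None and status == 200 and marker in body


def check_host(base_url, timeout=5.0):
    """Request each known path on a host you administer; return exposed paths."""
    findings = []
    for path in KNOWN_PAGES:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                status = resp.status
                body = resp.read(65536).decode("utf-8", errors="replace")
        except OSError:
            # URLError and HTTPError are OSError subclasses: covers refused
            # connections, timeouts, and 403/404 answers alike.
            continue
        if is_exposed(path, status, body):
            findings.append(path)
    return findings
```

Calling `check_host("http://localhost")` on a test machine would return the list of paths that answered with a matching page. A 403 or 404 response counts as not exposed here, which is the outcome you want from a hardened server.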

Example 5: ServerTokens Full

Next, let’s take a look at information disclosure in the server banner. In Apache 2.4, the default setting “ServerTokens Full” means that the Server response header sent back to clients includes a description of the generic OS type of the server as well as information about compiled-in modules. You may see a “Server” field in an HTTP response like this:

HTTP/1.0 301 Moved Permanently
Date: Fri, 27 Sep 2013 00:55:04 GMT
Server: Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.8o mod_wsgi/2.4 Python/2.6.2
Content-Length: 399
Connection: close
Content-Type: text/html; charset=iso-8859-1

If the administrator is not careful, that banner, together with the server IP address, could be indexed by SHODAN and seen by anyone. This is another way for hackers to find version numbers and then exploit known vulnerabilities. In this case, known vulnerabilities in Apache 2.2.9, mod_ssl 2.2.9, OpenSSL 0.9.8o and mod_wsgi 2.4 could all be used against the server.
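To show how much a single banner gives away, here is a small sketch that splits a Server header value into product and version pairs. The function name `parse_server_banner` and the regular expression are illustrative assumptions; the banner string is the one from the response above.

```python
import re

# A product token followed by "/" and a version, e.g. "OpenSSL/0.9.8o".
TOKEN = re.compile(r"([A-Za-z][\w.-]*)/(\S+)")


def parse_server_banner(value):
    """Return {product: version} for every product/version token found."""
    return {name: version for name, version in TOKEN.findall(value)}


BANNER = "Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.8o mod_wsgi/2.4 Python/2.6.2"

if __name__ == "__main__":
    for product, version in parse_server_banner(BANNER).items():
        print(f"{product}: {version}")
```

With the “ServerTokens Prod” setting the header is reduced to `Server: Apache`, and the same function returns an empty mapping because no version tokens remain.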


To prevent the disclosure of configuration information by your Apache instance, the best general-purpose strategy is to implement the CIS Benchmark, which establishes a secure configuration posture for Apache HTTP Server. In the CIS Apache HTTP Server 2.4 Benchmark, more than 15% of the recommendations relate to prohibiting unnecessary disclosure of sensitive information, and they provide a good baseline for most environments. Of course, different organizations may configure their Apache HTTP Servers differently, so admins should perform a complete evaluation of their environments, including access control setup, to determine whether additional controls are required.
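As a simplified sketch of what an automated check for the settings discussed in this post could look like, the code below scans configuration text for a few disclosure-related directives. The checks are illustrative and are not taken from the text of the CIS Benchmark; the function name `audit_config` is hypothetical, and the scan ignores Include files, line continuations and section scoping that a real tool would need to handle.

```python
def audit_config(text):
    """Return a list of (line_number, message) findings; line 0 = whole file."""
    findings = []
    server_tokens = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        directive = words[0].lower()
        args = [w.lower().strip('"') for w in words[1:]]
        if not args:
            continue
        if directive == "servertokens":
            server_tokens = args[0]
        elif directive == "serversignature" and args[0] != "off":
            findings.append((lineno, "ServerSignature is not Off"))
        elif directive == "options" and any(a.lstrip("+") == "indexes" for a in args):
            findings.append((lineno, "Options enables Indexes (directory listings)"))
        elif directive == "sethandler" and args[0] in ("server-info", "server-status"):
            findings.append((lineno, f"{args[0]} handler enabled; verify access control"))
    # Apache defaults to Full when the directive is absent.
    if server_tokens not in ("prod", "productonly"):
        findings.append((0, "ServerTokens is not Prod"))
    return findings


# Hand-written sample configuration, used only to exercise the checks.
SAMPLE_CONF = "\n".join([
    "ServerTokens Full",
    "ServerSignature On",
    '<Directory "/var/www/html">',
    "    Options Indexes FollowSymLinks",
    "</Directory>",
    '<Location "/server-status">',
    "    SetHandler server-status",
    "</Location>",
])

if __name__ == "__main__":
    for lineno, message in audit_config(SAMPLE_CONF):
        print(lineno, message)
```

A finding for a status or info handler is a prompt to review access control, not proof of a problem, since those pages are legitimate when restricted to administrators.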

Of course, preventing information disclosure in Apache HTTP Server is just one measure that makes data collection and vulnerability detection harder. Administrators should also follow the other 85% of the recommendations in the CIS Benchmark and enforce other appropriate policies for their environments.

Automating Protection Policies

Qualys currently has a pre-defined policy for CIS Apache HTTP Server 2.2 Benchmark (i.e. for Apache HTTP Server v2.2), and is developing a policy for CIS Apache HTTP Server 2.4 Benchmark version 1.2.0 that will be released in 2015. I’ll update this blog post with details on how to use it when it becomes available.

In addition, the Qualys Policy Compliance library of built-in policies makes it easy to comply with many other commonly followed security standards and regulations. Qualys Policy Compliance has extensive coverage for web servers, with over 400 controls and numerous CIS Certified Benchmarks covering platforms such as Apache, Microsoft IIS, and IBM HTTP Server, including:

  • CIS Benchmark for IIS 8.x, v1.1.0
  • CIS Benchmark for Apache HTTP Server 2.2, v3.2.0
  • CIS Benchmark for IIS 7.x, v1.3.0
  • Mandate based policies for PCI-DSS 3.0 and MAS IBTRM

This set of libraries helps simplify the complex task of keeping large numbers of disparate systems in compliance with security policies over time.
