About two months ago, Qualys SSL Labs published the results of an Internet-wide SSL survey. We said that we would make the raw data available, and today we are following up on that promise. (We realize that two months is a long time, but we couldn’t complete the process any faster this time. We hope to make future releases almost as soon as we obtain the data. As you may remember, our plan is to make our survey a quarterly event starting in 2011.)
The raw data contains the SSL assessment results for about 850,000 domain names (out of about 120M we inspected). The main file (originally 1.2 GB compressed and 3.5 GB uncompressed, now 120 MB compressed and 800 MB uncompressed) is a dump of our PostgreSQL database in CSV format. The download includes a simple PHP script that iterates through all the rows, so you can consume the data directly. Alternatively, you can load the data back into a database and use SQL to run ad-hoc queries (we provide the schema along with the import instructions).
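If you prefer not to use PHP, the dump can also be streamed row by row from other languages. The following is a minimal Python sketch, not the bundled script. It assumes the file has been decompressed to plain CSV encoded in UTF-8; it makes no assumption about a header row (if the dump has one, it simply comes back as the first record). The helper names `iter_rows` and `count_rows` are hypothetical.

```python
import csv
from typing import Iterable, Iterator, List

# Some fields in the dump can be very long, which can exceed the csv module's
# default per-field limit of 131072 characters, so raise it. 2**31 - 1 is used
# instead of sys.maxsize because the limit is stored as a C long, which is
# only 32 bits wide on some platforms.
csv.field_size_limit(2**31 - 1)


def iter_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one parsed record at a time (hypothetical helper).

    Accepts an open file object or any other iterable of strings. Records are
    produced lazily, so a multi-gigabyte dump is never held in memory at once.
    """
    # csv.reader handles quoted fields with embedded commas and newlines,
    # as written by PostgreSQL's COPY ... CSV.
    yield from csv.reader(lines)


def count_rows(path: str) -> int:
    """Stream through the CSV file at `path` and count its records."""
    # newline="" lets the csv module handle line endings itself, which is
    # needed to parse quoted fields that contain newlines correctly.
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in iter_rows(f))
```

For example, `count_rows("survey.csv")` would return the number of records in a local copy of the dump (the file name is made up). Replacing the counting expression with your own per-row logic gives you the same kind of full-table pass the PHP script performs, without importing anything into a database.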
The database schema contains 63 fields that generally parallel the information you would obtain from the SSL Labs online test. The complete original certificate chain is included, which is handy if you want to look into the aspects we didn’t. We chose not to release certain sensitive data: the information on low-entropy private keys, renegotiation support, and HTTP server signatures was removed.
This is what you need to do to obtain the data:
- First, make sure that our terms and conditions are acceptable to you. At the core, we use the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported licence, but there are a few additional requirements. For example, we ask the obvious — that you don’t use the data for illegal activities. The other requirements are just common sense. (Please do read the entire file, however.)
- Second, send us an email (username "ivanr"; domain name "webkreator.com"), introduce yourself, and tell us how you intend to use the data. We will then send you the download instructions. We need this second step to get an idea of whether, and how, the data is used.
Update: We are removing the certificate chain data from the database until we confirm that we are legally allowed to redistribute it. If you need such data in the meantime, retrieve it directly from the servers.
Podcast: Ivan talks about the Qualys SSL Labs Internet-wide SSL survey and the recent release of the raw data from the survey.