When you’re looking to add automation to your vulnerability management and policy compliance program, a good starting point is the Host List, which is your scanned asset inventory. More precisely, it represents which assets have been scanned (and when) for Qualys Vulnerability Management (VM) or Policy Compliance (PC). After extracting this data, you can do further analysis of your scan coverage or schedule future scans on all or any subset of your assets.
Following the initial blog post about the KnowledgeBase API, this post shows how to extract and manipulate Host List data, including into Python Shelve Dictionary, JSON, CSV and SQLite database formats. The accompanying video presents these API best practices along with live code examples, so that you can effectively integrate the Host List with other data and automations.
Then, in part 3, we’ll cover the Host List Detection API, which takes a Host List as input and returns vulnerability data for that set of hosts. These parts taken together form the basis for automation of your vulnerability program. In a future part, we’ll cover the Compliance API, which performs similar functions for compliance.
At the end of this Qualys Host List API blog post and video, you will gain experience in the areas of development, design, and performance with the Qualys API including:
- Applying a simple ETL design pattern to the Host List API
- Live code examples demonstrating ETL of Host List API
- Transformation of Host List XML into Python Shelve Dictionary, JSON, CSV and SQLite database formats for further processing and integration outside the Qualys Cloud Platform.
Definition of Host List
The Host List is your subscription’s inventory of assets scanned via VM and PC, including on-premises, endpoint and cloud assets. The primary unique key of each host is ID, also referred to as HOSTID. The host list contains up-to-date information on when a host was last scanned for both Qualys Vulnerability Management and Policy Compliance.
- Each time a scan completes for Qualys Vulnerability Management (VM) or Policy Compliance (PC), the dates in the Host List are updated.
- Other information is updated as well, including Amazon, Azure or Google Cloud metadata.
The Qualys API provides customers a way to consume your Qualys data, pulling it from the Qualys Cloud Platform to your site for consumption. The Host List API is useful as a programmatic driver in automation as it helps determine the scope of which assets have been scanned for either Vulnerability Management (VM) or Policy Compliance (PC).
Customers can consume Qualys data for automated processing of:
- Vulnerability Analysis
- Operational Response
With any API, there are inherent automation challenges. Some of those automation challenges are:
- Identifying best practices programming methods to extract a list of hosts scanned from the Qualys Host List reliably and efficiently, including VM, PC, or Cloud Agent data.
- How to obtain some or all of Host List XML output which provides a rich asset information source for each host scanned.
- How to extract Qualys data and integrate it into a customer database for reuse in automation.
ETL Design Pattern
The ETL (Extract, Transform and Load) design pattern is a wonderful place to start when transforming Qualys API data into a form/format that is appropriate for your organization. In the video, we will walk through the ETL Activity Diagram and demonstrate live code examples to ETL your data into a local Python Shelve Dictionary, JSON, CSV and SQLite database.
Recommendation – Follow a basic ETL design pattern to prepare Host List data for application usage.
- Extract: Full or Incremental
- Transform: Prepare Data
- Load: Persistent Data Store
- Repeat: Update as Needed
Host List Activity Diagram
The key point of the Host List Activity Diagram is to show there are two types of extracts. The first is a full extract. That will be your first run or in some instances, you may want to ensure you have all the Host List data in your database after an outage or error. The second is the incremental update. That will be your primary way of pulling Qualys Host List Data. With the incremental update, you speed up processing by eliminating duplicate data from being both downloaded and further evaluated for storage. This streamlines your processing and ensures the Host List updates are ready for consumption by your applications as quickly as possible.
- Store Host List data local to your applications to improve performance.
- Improve performance and reduce duplicate data by performing full updates no more than once per month.
- Perform incremental updates in line with the Host List Detection download.
Code Example 1: Full Extract
This example code shows how to stream the full Host List to a file. The size of the full Host List varies depending on the options you pass to it, along with the number of hosts in your environment. In the example code we demonstrate how to extract all the Host List data including GCP, Microsoft Azure and Amazon EC2 data. To perform this type of extract you have to do a bit of prep work to set up tags. See line 10 for an example of how you can set up tags to identify the different types of provider you have in your subscription. This code example calls the host list API on line 41 and streams each providers data to an XML file prefixed with the provider’s name. See this code run in the accompanying video.
Three Key Elements of the Qualys Host List
This is a snippet of Host List XML highlighting three key elements. Note that the ID is the unique key for the Host List and it is unique to your subscription. The LAST_VULN_SCAN_DATETIME and LAST_COMPLIANCE_SCAN_DATETIME are updated each time the host is scanned for Vulnerability or Policy Compliance, respectively.
- ID – the unique key for the Host List, also referred to as HostID. It is unique to your subscription.
- LAST_VULN_SCAN_DATETIME – UTC date/time for last vulnerability scan.
- LAST_COMPLIANCE_SCAN_DATETIME – UTC date/time for last Policy Compliance Scan.
Code Example 2: Incremental Extract
Code Example 2 is the exact same code we ran earlier. The key difference is the date of vm_processed_after date. You can set this to the date you are interested in. For example, you may only want to extract data for systems that were scanned for vulnerabilities in the last hour.
Code Example 3: Transform / Load
Once we have the XML downloaded, you can easily transform the XML into a Python Shelve Dictionary. This is a persistent Python dictionary. In this case the key to the dictionary is the ID ( HostID ) which is unique. The object stored in the dictionary is the fully nested XML data structure that has been converted to a Python dictionary.
Code Example 4: Transform Python Shelve to JSON
In this example, we take the Python Shelve Dictionary that was created in the previous example and transform it to JSON. Line 5 is the Input Shelve previously created in Code Examples 1 or 2, and line 6 is the Output JSON File. On line 12, we iterate through the dictionary using a for loop transforming each Host List entry into JSON using json.dump.
Code Example 5: Transform Python Shelve to CSV
In this example, we take the Python Shelve Dictionary that was created in the previous example and transform it to output a CSV File. Lines 6 and 7 reference the output CSV to be created along with the Input Shelve of XML from the Host List. The columns we are going to focus on transforming are placed in a function called host_list_csv_columns for reuse. On lines 13-16 we initialize the new CSV File with the CSV Headers. On lines 18-43 we iterate through the Shelve Dictionary transforming it to CSV. Note on line 25-28 we alter DATE and ASSET_GROUP_IDS elements for ease of processing in downstream systems. Further we flatten the complex nested data on line 30 using the flatten_nest function, organizing it under its corresponding top level element from the XML. Line 42 then writes the constructed csv_row.
Additional Functions for Code Example 5
Code Example 6: Transform Python Shelve to SQLite Database
In code example 6, we take the previously constructed CSV file from Code example 5 and load it into an SQLite database for reuse. Lines 6-8 identify the input CSV file, output SQLite database and the table name that will be dropped and re-created. Lines 9-17 define the create table statement, 18-25 defines the create table execution and lines 25-45 read the csv file, populating the SQLite database in bulk inserts. On line 31, the bulk_insert_count = 2000 which represents 2000 rows held in the memory cursor_obj for write to DB. You can change bulk_insert_count to optimize your bulk load depending on your compute/memory/disk resources available.
Additional Function for Code Example 6
Put it into Practice
Going forward, here are some final key tips:
- Use ETL design pattern to consume Host List API data
- Optimize your Host List API calls by using incremental updates.
- For questions, schedule time through your TAM to meet with our solutions architects, we are here to help.
- To see these code examples demonstrated live, please see the accompanying video.
- Video: API Best Practices Part 2: Host List API
- Host List API Guide within VM/PC Guide
- Qualys API Training (Including Postman)
- Understanding Entity IDs in VM
About This Series
The API Best Practices Series is designed for stakeholders or programmers with general knowledge of programming who want to implement best practices to improve development, design, and performance of their programs that use the Qualys API.
In addition to Part 1 KnowledgeBase and Part 2 Host List, the series will expand over the coming months to cover other key aspects of the Qualys API, with each presentation building on the previous one and in aggregate providing an overall best practice view of the Qualys API. The next presentations in the series will focus on Host List Detection APIs.