In Part I of this three-part blog series, we discussed building a cyber risk metrics program from the ground up and explored effective strategies for holistically articulating your cyber risk posture across your organization.
In our second installment, we’ll delve deeper into how to elevate your cyber risk profile to drive support of your organization’s attack surface reduction strategy, and we’ll review some real-world use cases to help you achieve this mission effectively.
Strategically positioning security metrics to business leaders is a pathway to raise awareness and drive action to reduce your organization’s attack surface.
Let’s take these two use cases to understand how this works.
Use Case #1: Revenue Generating Ecosystem (RGE)
The RGE comprises the computing systems critical to the organization for fulfilling its mission, whether that mission is producing transportation parts or delivering lifesaving services. In a typical business enterprise, the RGE is usually seen as providing the product or service that generates revenue for the business and/or delivers value to the board of directors or investors.
On the other hand, a business mission supporting blood donation doesn’t generate revenue. Still, it provides life-saving services, and the computing systems that support this mission are considered “RGE” – even though they produce no direct monetary returns for the business.
Both examples show why metrics demonstrating the risk profile of these mission-critical systems are needed. Another essential reason to measure the technical and cyber hygiene of the RGE is that these systems consume the lion’s share of IT support cycles and require attention from security teams as well. Because both technical and business leaders are keenly aware of the health of these mission-critical systems, metrics centered around the RGE concept are warranted.
To measure your RGE environment, you must know all the assets comprising this landscape. (Qualys has straightforward, effective technical solutions to meet this foundational requirement.)
To understand more about these solutions, please see the following links: Network Scanners, Qualys Cloud Agent, Passive Sensors, CSAM, and EASM. For additional details on how Qualys can assist your company in completing its journey to 100% asset visibility, schedule a follow-up meeting with our experts.
Below are non-technical sample steps for discovering all your RGE assets:
1. Review these assets’ internal BCP/DR (Business Continuity Program/Disaster Recovery) application ratings.
2. Speak with Application Architecture teams, using the BCP/DR ratings as a starting point, to confirm you have captured all the assets and their correct importance to the business.
3. Speak with Infrastructure Architecture teams to determine whether your initial list from the BCP/DR document is complete and whether the ratings are current or need updating.
4. Speak with the Tier 1 production support team that fields initial calls on these RGE systems. This provides the most recent insight into whether your RGE list is accurate in both asset count and the importance the business assigns to each system.
5. Speak with the other support teams that interact with these RGE systems. This closes the loop on the accuracy of your RGE list in terms of asset count and business importance.
In short, steps 1-3 provide a solid starting point for discovering which assets make up your RGE and their associated business criticality; this approach will get you roughly 80% of the way there. Steps 4 and 5 fill in the gaps and complete the remaining 20% of the picture. Again, this is the non-technical approach to gaining a complete view of the assets that comprise your Revenue Generating Ecosystem; you will also need technical discovery mechanisms like those I noted above.
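As a complement to the interview-driven steps above, the consolidated RGE list can be tracked in a simple structure that records where each asset's criticality came from. The sketch below is purely illustrative; the field names, rating values, and merge rules are assumptions, not a prescribed Qualys workflow:

```python
# Hypothetical sketch: consolidating an RGE asset inventory from BCP/DR
# ratings (steps 1-3) and per-team confirmations (steps 4-5).

def build_rge_inventory(bcp_dr_ratings, team_confirmations):
    """Merge BCP/DR ratings with team confirmations into one RGE list."""
    inventory = {}
    for asset, rating in bcp_dr_ratings.items():
        inventory[asset] = {"criticality": rating, "confirmed_by": []}
    for team, assets in team_confirmations.items():
        for asset, rating in assets.items():
            entry = inventory.setdefault(
                asset, {"criticality": rating, "confirmed_by": []})
            entry["criticality"] = rating  # teams override stale BCP/DR ratings
            entry["confirmed_by"].append(team)
    return inventory

# Illustrative data: support teams surface an asset BCP/DR missed and
# downgrade a rating that was out of date.
bcp_dr = {"order-api": "critical", "billing-db": "critical", "legacy-ftp": "medium"}
teams = {
    "app-architecture": {"order-api": "critical"},
    "tier1-support": {"checkout-web": "critical", "legacy-ftp": "low"},
}
rge = build_rge_inventory(bcp_dr, teams)
```

Even a lightweight structure like this makes the 80/20 hand-off visible: assets with an empty `confirmed_by` list are exactly the ones still waiting on steps 4 and 5.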
To reduce the enterprise attack surface, here are three phases of metrics that are derived from this exercise:
- Phase 1: Cyber risk metrics focused on the primary systems/assets that generate revenue for the organization.
- Phase 2: Metrics focused on the secondary/ancillary assets that support the revenue generation computing ecosystem for the company.
- Phase 3: Cyber risk metrics focused on the technical debt assets among the computing systems covered in Phases 1 and 2. (See the next section on Technical Debt Reduction.)
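The three phases can then be rolled up programmatically. A minimal sketch, assuming each asset record carries an illustrative phase tag and a 0-100 risk score (higher is riskier):

```python
# Illustrative roll-up of cyber risk scores by RGE metric phase.
from collections import defaultdict

def risk_by_phase(assets):
    """Average risk score per phase (1=primary, 2=ancillary, 3=tech debt)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for asset in assets:
        totals[asset["phase"]] += asset["risk_score"]
        counts[asset["phase"]] += 1
    return {phase: totals[phase] / counts[phase] for phase in totals}

# Hypothetical asset records tagged with their phase.
assets = [
    {"name": "order-api", "phase": 1, "risk_score": 72},
    {"name": "billing-db", "phase": 1, "risk_score": 58},
    {"name": "report-batch", "phase": 2, "risk_score": 40},
    {"name": "legacy-ftp", "phase": 3, "risk_score": 91},
]
print(risk_by_phase(assets))  # {1: 65.0, 2: 40.0, 3: 91.0}
```

A per-phase average like this gives leadership one number per phase while operations teams can still drill into the underlying assets.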
Use Case #2: Technical Debt Reduction
Technical debt is a universal problem within an organization’s IT infrastructure. It is the accumulation of older systems, applications, or code that never gets replaced or upgraded but remains useful to the business and its operations. It typically builds up after upgrades are completed, during transitions to new systems or applications, or following a merger, acquisition, or other significant shift in IT or business operations.
The usual excuse is: ‘We’ll decommission those systems during the next change window.’ Then that change window turns into weeks, weeks turn into months, and months become years – or never.
Most organizations never eliminate all their technical debt, which means spending more CAPEX or OPEX to keep antiquated systems running: extending support contracts and maintenance, or hiring consultants to manage deprecated systems and applications – especially once the organization loses internal technical expertise on these older systems.
To quantify technical debt, consider both the principal (the one-time cost of removing the debt, including re-architecture and migration) and the ‘interest’ (the recurring, ever-increasing maintenance costs).
Shortcomings of old technology also add risk to your attack surface because vendors no longer produce security patches or upgrades for it. This behaves like compound interest in a savings account: just as compound interest generates “interest on interest” and grows a sum faster than simple interest, unaddressed technical debt compounds over time. Unfortunately, unlike interest on savings, this compounding works against the business.
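To make the compounding analogy concrete, here is a toy model comparing recurring “interest” against a one-time “principal” payoff. The annual cost, 15% growth rate, and payoff figure are purely hypothetical:

```python
# Toy model of technical-debt "interest": yearly maintenance that compounds,
# versus a one-time "principal" payoff (migration/re-architecture cost).

def compounded_maintenance(annual_cost, growth_rate, years):
    """Total maintenance spend if costs grow by growth_rate each year."""
    total = 0.0
    for year in range(years):
        total += annual_cost * (1 + growth_rate) ** year
    return total

five_year_interest = compounded_maintenance(100_000, 0.15, 5)
payoff_principal = 450_000  # hypothetical one-time migration cost

print(round(five_year_interest))  # 674238 -- interest already exceeds payoff
```

Running a comparison like this for your own cost figures is a quick way to show leadership when paying down the principal becomes cheaper than continuing to pay the interest.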
State the costs of maintaining and supporting this unruly computing system when framing a business case to reduce and remove your company’s technical debt. Include previous years’ costing numbers and projected costs for the next “x” years. The resulting metric will support the IT team’s goal of removing older systems and show which technical risks will be mitigated.
Consider enriching these business numbers with values from analysts such as the Ponemon Institute. For example, a recent Ponemon Institute report noted the average cost of a data breach in 2022 increased 12.7% from $3.86M (in 2020) to $4.35M. The per-record cost of a data breach hit a seven-year high of $164.00, a 1.9% increase from 2021.
Citing these third-party statistics from the Ponemon Institute emphasizes the value of reducing technical debt and grounds your projected cost savings in relatable, recent real-world examples.
Creating and using these tailored metrics for executives and other IT leaders reduces your cyber risk. These actions also serve as a catalyst to shrink your technical debt catalog, with the added value of a reduced attack surface.
Implementing the Risk Metrics
To close out the first blog in this series, I recommended starting here:
Metrics for Leadership (ATL)
Overall Risk Score: As part of the baseline approach I noted earlier, using an initial Cyber Risk Score to establish a baseline for your security program is a great place to start. Typically, your security solution toolsets will have native scoring and/or dashboard features that produce this initial score.
Modern tooling needs to have the flexibility to ingest disparate data sets (See Qualys Enterprise TruRisk Management ETM), and your security team needs the ability to make manual adjustments based on known exceptions and risk mitigation strategies that affect this risk score.
For example, you know things about your technical environment that your tooling will not. When a specific asset is behind three sets of Next Gen Firewalls with advanced logging enabled, the risk of exploiting this asset is significantly reduced.
Security analysts need the ability to adjust scores accordingly to accurately reflect each asset’s true risk within your enterprise. (See Qualys TruRisk to view how to accomplish this step.) Otherwise, your scoring system will be skewed and your teams will chase ghosts. The metric numbers will not reflect your true risk posture, and your metrics program will become a “tick-the-box” deliverable that adds no value to your organization and steals precious work cycles from your team while wasting time and money for your business.
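One simple way to model this kind of adjustment is to scale a tool’s raw score by documented compensating controls. The control names, scaling factors, and 0-100 scale below are assumptions for illustration, not the Qualys TruRisk formula:

```python
# Hypothetical sketch: adjusting a tool-generated risk score for known
# compensating controls (e.g., an asset behind layered NGFWs).

MITIGATION_FACTORS = {
    "layered_ngfw": 0.6,      # stacked next-gen firewalls in front of the asset
    "advanced_logging": 0.9,  # rich detection shortens attacker dwell time
    "network_isolated": 0.5,  # asset unreachable from general network segments
}

def adjusted_risk(base_score, controls):
    """Scale the raw 0-100 risk score down by each documented control."""
    score = base_score
    for control in controls:
        score *= MITIGATION_FACTORS.get(control, 1.0)
    return round(score, 1)

# The firewalled asset from the example above: raw 80 drops to 43.2.
print(adjusted_risk(80, ["layered_ngfw", "advanced_logging"]))  # 43.2
```

Whatever factors you choose, document them alongside the metric so auditors and leadership can see exactly why a score was adjusted.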
My advice: Only go one level down to break out these metrics for leadership using a summarized risk score of the perimeter and a summarized risk score of the inside computing environment, as I noted above.
Pro-Tip: Always define and clearly note on your metric dashboard what “perimeter” and “inside” mean to the audience. This pre-answers questions such as “What do you mean by perimeter?” or “What do you mean by the inside environment?”
Based on my 25+ years of experience working in different types of companies, these types of questions seeking clarity from their perspective are always asked. Always.
Pre-answering these questions and clearly defining the two computing environments solidifies your commitment to making the metrics program succeed for them and for your organization. It also signals to leadership that you think beyond the initial ask and paints you as a savvy business operator. Being proactive in this soft skill is an attribute you will be graded on in your annual performance evaluation.
Metrics for Operations (BTL)
BTL metrics cover the Internal and Perimeter environments (the Perimeter includes the DMZ, edge/internet-facing areas, and cloud providers).
Internal (Suggestions to get started)
- The number of all user assets (laptops/workstations) that have security agents installed AND working correctly.
- The number of all infrastructure components (servers, VMs, etc.) that have security agents installed AND working correctly.
Note: “Working” is measured by the last time the agent successfully checked in to the security console, Active Directory, etc. The previous 24 hours is an excellent starting point for your “agent last check-in” threshold.
This is the approach I took when establishing this baseline for the organizations I worked at previously. Having the binary installed but not working does not protect your organization’s data, reputation, or stock price, which is an important reason to measure this piece of IT for your security posture.
- Patch penetration metrics that demonstrate both patch deployment AND successful installation, for user assets and non-user assets (i.e., infrastructure components). As with the agents above, it does the organization no good to deploy patches if the code/update does not install successfully. This nuance is often overlooked in BTL metrics because it is assumed that a patched asset was patched successfully; you have to validate that assumption.
Pro-Tip: These are the same questions IT auditors will ask you to evidence during an IT audit: What is your security agent coverage? Are the agents working as designed and expected? “Show me how you demonstrate the success of your patching process, and how do you account for deviations or exceptions in this process?” Having this information documented and trended demonstrates to auditors that your cyber defense program is maturing in the right direction.
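The agent-health metric above can be sketched with the suggested 24-hour check-in threshold. The asset fields and data source here are hypothetical:

```python
# Illustrative sketch: counting assets whose security agent is both
# installed AND checking in within a 24-hour threshold.
from datetime import datetime, timedelta, timezone

CHECKIN_THRESHOLD = timedelta(hours=24)

def agent_health(assets, now):
    """Count assets whose agent is installed and checked in recently."""
    working = [a for a in assets
               if a["agent_installed"]
               and now - a["last_checkin"] <= CHECKIN_THRESHOLD]
    return {"working": len(working), "total": len(assets),
            "coverage_pct": round(100 * len(working) / len(assets), 1)}

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
fleet = [
    {"host": "ws-001", "agent_installed": True,
     "last_checkin": now - timedelta(hours=2)},   # healthy
    {"host": "srv-042", "agent_installed": True,
     "last_checkin": now - timedelta(days=3)},    # installed but stale
    {"host": "ws-007", "agent_installed": False,
     "last_checkin": now - timedelta(hours=1)},   # no agent
]
print(agent_health(fleet, now))  # {'working': 1, 'total': 3, 'coverage_pct': 33.3}
```

Note how the second host illustrates the point above: the binary is installed, but because it has not checked in within the threshold, it does not count toward coverage.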
Perimeter (recommendations to start with)
- After establishing a baseline of what “normal” looks like for your organization, present the monthly average of commoditized attacks thwarted by your NGFW on the external side of the firewall. This trend helps tell the story, to both leadership and operations teams, of the intensity and consistency of the “attack grit” adversaries exert against your organization.
Pro-Tip: Determine which modules on your NGFW are in use – L7 firewall rules, IDS/IPS, WAF, etc. This perspective shows which layers of the OSI model these attacks traverse and helps you bolster your cyber defenses accordingly.
- CNAPP Risk Score – a Cloud-Native Application Protection Platform provides a cyber risk posture score. After baselining your cloud environments, I have found the score generated by a CNAPP to be a solid indicator of your cloud risk posture. You can use this metric to inform your perimeter risk strength, which ultimately rolls up into the risk status reported to your leadership.
Note: Qualys offers CNAPP as part of a cloud-native security solution – called TotalCloud.
- Measure MTTR (Mean Time to Remediate) after baselining is complete. Align the measurement window with your scan frequency so the metric accurately reflects your patch and misconfiguration remediation efforts.
- Exceptions – after the baselining process is complete, ask: “Have anomalies for patching and misconfigurations been identified and documented – not only for timely follow-up but also as evidence for IT auditors during standard control testing?” What is the trend? Is it at a tolerable level for your organization’s risk appetite? Or does your organization’s risk tolerance need to be re-baselined?
Pro-Tip: This is an excellent time to re-evaluate your risk tolerance for your organization after launching and continuously using your metrics program.
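The MTTR metric above can be computed from detection and remediation timestamps; the record format below is an assumption, and only closed findings count toward the average:

```python
# Minimal sketch: Mean Time to Remediate (MTTR) in days, computed over
# findings that have actually been remediated.
from datetime import datetime

def mttr_days(findings):
    """Average days from detection to remediation across closed findings."""
    closed = [f for f in findings if f["remediated"] is not None]
    if not closed:
        return None  # nothing remediated yet; no MTTR to report
    total_days = sum((f["remediated"] - f["detected"]).total_seconds() / 86400
                     for f in closed)
    return round(total_days / len(closed), 1)

findings = [
    {"detected": datetime(2024, 1, 1), "remediated": datetime(2024, 1, 11)},  # 10 days
    {"detected": datetime(2024, 1, 5), "remediated": datetime(2024, 1, 9)},   # 4 days
    {"detected": datetime(2024, 1, 8), "remediated": None},                   # still open
]
print(mttr_days(findings))  # 7.0
```

If your scan cadence is, say, weekly, an MTTR computed from daily snapshots will look artificially precise; matching the measurement window to the scan frequency, as recommended above, keeps the number honest.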
As you develop a security metrics program, focus on the crucial outcomes. In particular, be aware of telling the right story to specific stakeholders: technical metrics for IT and security professionals and business-related metrics for executives.
The Qualys platform’s clean, actionable data will provide helpful insights to consistently and holistically reduce risk throughout your enterprise. As a result, your metrics program will demonstrate how the security team is winning the cyber battle.
By taking this data-driven approach and using the Qualys platform, your teams can confidently fine-tune the organization’s cyber defenses, reduce its risk profile, and strengthen its security posture. Thus, you can elevate the organization’s cyber risk profile to the business by driving the attack surface reduction mission toward the smallest footprint attainable.