A Look at Cyber Resilience
Explore how cyber resilience differs from traditional disaster recovery and learn practical frameworks for building organizational resilience. This guide covers essential concepts like RTO/RPO, maturity models, and proven standards from NIST and ISO to help you prepare for, respond to, and recover from cyber attacks while maintaining critical business operations.
Overview
In the early days of computing, having traditional Disaster Backup and Recovery (DBaR) and Business Continuity Planning (BCP) in place was a strong sign of IT maturity. These concepts focused on preparing for and recovering from significant disruptions, such as natural disasters or major system failures. However, as cyber threats have evolved and become more sophisticated, the need for a more comprehensive approach to resilience has emerged.
Today, a cyber attack is probably more likely to cause disruption than a natural disaster. Cyber attacks also pose different kinds of problems than traditional disasters. For example, a ransomware attack may require not just restoring data from backups, but also dealing with data integrity issues, legal implications, and communication strategies. Let's take a look at how you can approach this challenge using the concept of cyber resilience. Cyber resilience is the ability of an organization to prepare for, respond to, and recover from cyber attacks while maintaining essential functions. It encompasses not only technical measures but also organizational processes, people, and culture.
TL;DR
Cyber Resilience goes beyond traditional disaster recovery. Modern organizations face cyber threats that require specialized preparation, response, and recovery strategies. We'll cover seven key frameworks (NIST CSF, SP 800-160/184/34/61, ISO 22301, ISO/IEC 27031), explain critical concepts like RTO and RPO, present a 6-level maturity model, and share practical guidance for building a comprehensive cyber resilience program. Whether you're starting from scratch or enhancing existing capabilities, we'll look at how to work with stakeholders, organize your program, report meaningful metrics to leadership, and maintain resilience through iterative testing and improvement.
Key Cyber Resilience Frameworks and Standards
One of the great things about modern times is that you can typically ask: "Are there any frameworks or standards around this topic?" and most of the time there are! In the case of cyber resilience, there are several well-regarded frameworks and standards that organizations can leverage to enhance their cyber resilience posture. Here are some of the key ones:
1. NIST Cybersecurity Framework (CSF) 2.0
This covers high-level outcomes for Respond and Recover, tied to governance and risk. For example, it includes:
- Governance: Establishing policies and procedures for cyber resilience.
- Risk Management: Identifying and mitigating cyber risks.
- Respond: Developing and implementing response plans for cyber incidents.
- Recover: Establishing recovery plans to restore operations after a cyber incident.
2. NIST SP 800-160 Vol. 2
This covers resilient system engineering: segmentation, diversity, isolation, recovery patterns. For example, it includes:
- Segmentation: Dividing systems into segments to contain cyber threats.
- Diversity: Using diverse technologies and approaches to reduce vulnerabilities.
- Isolation: Implementing isolation techniques to protect critical systems.
- Recovery Patterns: Establishing recovery strategies to restore system functionality.
3. NIST SP 800-184
This covers cyber event recovery: playbooks, testing, sequencing, dependencies. For example, it includes:
- Playbooks: Developing detailed response plans for specific cyber events.
- Testing: Regularly testing recovery plans to ensure effectiveness.
- Sequencing: Establishing the order of recovery actions.
- Dependencies: Identifying and managing dependencies during recovery.
Link: https://csrc.nist.gov/publications/detail/sp/800-184/final
4. NIST SP 800-34
This covers contingency planning: business impact, RTO/RPO establishment, DR integration. For example, it includes:
- Business Impact Analysis: Assessing the impact of disruptions on business operations.
- Recovery Time Objectives (RTO): Defining acceptable downtime for systems.
- Recovery Point Objectives (RPO): Establishing acceptable data loss thresholds.
- Disaster Recovery Integration: Integrating contingency plans with disaster recovery strategies.
Link: https://csrc.nist.gov/publications/detail/sp/800-34/rev-1/final
5. NIST SP 800-61
This covers incident response lifecycle: prep > detect > contain > eradicate > recover > lessons learned. For example, it includes:
- Preparation: Establishing incident response capabilities.
- Detection and Analysis: Identifying and analyzing cyber incidents.
- Containment, Eradication, and Recovery: Implementing strategies to contain, eradicate, and recover from incidents.
- Post-Incident Activity: Conducting lessons learned to improve future response efforts.
6. ISO 22301
This covers business continuity for critical services: impact tolerances, continuity strategies. For example, it includes:
- Impact Tolerances: Defining acceptable levels of disruption for critical services.
- Continuity Strategies: Developing strategies to ensure the continuity of critical services during disruptions.
Link: https://www.iso.org/standard/75106.html and
Free Preview: https://www.iso.org/obp/ui/#iso:std:iso:22301:ed-2:v1:en
7. ISO/IEC 27031
This covers IT service continuity: practical guidance for ICT recovery and test cadence. For example, it includes:
- ICT Recovery: Establishing procedures for recovering IT services.
- Test Cadence: Implementing regular testing of IT service continuity plans.
Link: https://www.iso.org/standard/27031 and
Free Preview: https://www.iso.org/obp/ui/#iso:std:iso-iec:27031:ed-1:v1:en
Framework and Standards Summary
Put another way, these seven frameworks and standards cover the landscape of cyber resilience from different angles, giving organizations a pretty comprehensive toolkit to build and maintain their cyber resilience capabilities and cyber resilience program.
Here is another way to think of how these work together and complement each other to establish the domains of cyber resilience:
| Framework/Standard | Focus Area | Key Aspects |
|---|---|---|
| 1. NIST CSF 2.0 | Overall Cybersecurity | Governance, Risk Management, Respond, Recover |
| 2. NIST SP 800-160 Vol. 2 | System Resilience | Segmentation, Diversity, Isolation, Recovery |
| 3. NIST SP 800-184 | Cyber Event Recovery | Playbooks, Testing, Sequencing, Dependencies |
| 4. NIST SP 800-34 | Contingency Planning | Business Impact, RTO/RPO, DR Integration |
| 5. NIST SP 800-61 | Incident Response | Lifecycle Management, Lessons Learned |
| 6. ISO 22301 | Business Continuity | Impact Tolerances, Continuity Strategies |
| 7. ISO/IEC 27031 | IT Service Continuity | ICT Recovery, Test Cadence |
About RTO and RPO
Two important concepts in cyber resilience and disaster recovery are Recovery Time Objective (RTO) and Recovery Point Objective (RPO). In short, RTO is how long you can afford to be down and RPO is how much data you can afford to lose.
Recovery Time Objective (RTO)
RTO is the maximum acceptable amount of time that a system, application, or process can be down after a disruption before it must be restored to avoid unacceptable consequences. Defining it helps organizations prioritize recovery efforts and allocate resources effectively.
Again, RTO is about the amount of downtime that is acceptable.
Example: Technical Support Call Center
For example, take a technical support call center that has 300 employees. If the call center goes down, the organization risks losing customer trust, but it also might impact a Service Level Agreement (SLA) where the organization needs to pay a penalty. So, the downtime of the call center means lost trust, lost productivity, and potential financial penalties. Therefore, the organization might set an RTO of 4 hours for the call center to be back up and running.
Example: E-Commerce Site
Another example is an e-commerce website. If the website goes down, the organization risks losing sales and customer trust. So, the organization might set an RTO of 1 hour for the website to be back up and running. There may even be a financial figure attached, such as a known loss of $X per minute of downtime.
The idea is to work with stakeholders to figure this out ahead of time, do the math, and determine what the RTO should be.
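To make the "do the math" step concrete, here is a minimal sketch in Python. The cost model (a linear loss per minute plus a one-time SLA penalty), the function names, and all of the dollar figures are hypothetical assumptions for illustration, not numbers from any real organization.

```python
from typing import Optional


def downtime_cost(minutes_down: float,
                  loss_per_minute: float,
                  sla_penalty: float = 0.0,
                  sla_threshold_minutes: Optional[float] = None) -> float:
    """Estimate outage cost: linear loss per minute, plus a one-time SLA
    penalty if the outage runs longer than the SLA threshold (assumed model)."""
    cost = minutes_down * loss_per_minute
    if sla_threshold_minutes is not None and minutes_down > sla_threshold_minutes:
        cost += sla_penalty
    return cost


def max_tolerable_downtime(loss_budget: float, loss_per_minute: float) -> float:
    """Longest outage (in minutes) that stays within the loss budget,
    ignoring SLA penalties. A starting point for the RTO conversation."""
    if loss_per_minute <= 0:
        raise ValueError("loss_per_minute must be positive")
    return loss_budget / loss_per_minute


# Hypothetical figures: $500 lost per minute, $30,000 of tolerable loss.
print(max_tolerable_downtime(loss_budget=30_000, loss_per_minute=500))
# Hypothetical 90-minute outage that breaches a 60-minute SLA threshold.
print(downtime_cost(90, 500, sla_penalty=10_000, sla_threshold_minutes=60))
```

A real analysis would likely also account for softer costs such as lost trust, which do not reduce to a per-minute figure. The arithmetic above only gives stakeholders a first number to react to.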
Recovery Point Objective (RPO)
RPO is the maximum acceptable amount of data loss measured in time. It defines the point in time to which data must be restored after a disruption to avoid unacceptable consequences. By defining RPO, organizations can determine appropriate backup strategies and data protection measures.
Again, RPO is about the amount of data loss that is acceptable.
Example: Financial Transactions Database
For example, take a financial transactions database that processes thousands of transactions per minute. If the database goes down, the organization risks losing critical financial data. So, the organization might set an RPO of 15 minutes, meaning that in the event of a disruption, they can afford to lose up to 15 minutes worth of transaction data.
Example: Customer Relationship Management (CRM) System
Another example is a customer relationship management (CRM) system that stores customer data and interactions. If the CRM system goes down, the organization risks losing valuable customer information. So, the organization might set an RPO of 1 hour, meaning that in the event of a disruption, they can afford to lose up to 1 hour worth of customer data.
Similar to RTO, the idea is to work with stakeholders to figure this out before an incident happens, do all the math, and determine what the RPO should be. The organization could end up making significant infrastructure investments to meet very low RTOs and RPOs, so it's important to get this right.
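One way to sanity-check an RPO against a backup schedule is sketched below in Python. It assumes a simple periodic-backup model in which the worst case is a failure just before the next backup completes, so the potential data loss is the backup interval plus the time a backup takes to finish. The function names and the example schedules are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case data loss under the assumed periodic-backup model:
    a failure just before the next backup finishes loses one full
    interval plus the time that backup would have taken."""
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta,
              rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the backup schedule can satisfy the RPO in the worst case."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Hourly backups cannot satisfy a 15-minute RPO.
print(meets_rpo(timedelta(hours=1), rpo=timedelta(minutes=15)))
# Backups every 10 minutes that take 2 minutes each fit within 15 minutes.
print(meets_rpo(timedelta(minutes=10), rpo=timedelta(minutes=15),
                backup_duration=timedelta(minutes=2)))
```

Continuous replication or transaction log shipping changes this model, so treat the check as a first-pass filter rather than a guarantee.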
Cyber Resilience Maturity
Cyber Resilience maturity refers to the level of an organization's ability to effectively prepare for, respond to, and recover from cyber attacks while maintaining essential functions. It encompasses not only technical measures but also organizational processes, people, and culture. A mature cyber resilience posture enables an organization to minimize the impact of cyber incidents, quickly restore operations, and continuously improve its resilience capabilities.
Below is a rough idea of how cyber resilience maturity levels might be defined:
Level 0 – Absence
Symptoms:
- No dedicated cyber-resilience people.
- Backups exist but are untested.
- No recovery playbooks.
- No defined Recovery Time Objective (RTO) or Recovery Point Objective (RPO).
- No understanding of what’s critical.
- No alignment between incident response and disaster recovery.
This is “hope is our strategy.”
Level 1 – Basic DR / scattered ownership
Symptoms:
- Some disaster recovery plans exist but are system-by-system.
- RTO/RPO are informal or unrealistic.
- Cyber scenarios (ransomware, identity compromise, cloud outage) are not modeled.
- Restores may happen once per year, lightly tested.
- Business Continuity Planning is separate from security and IT.
- Very document-heavy; little automation.
This is “classic DR but not cyber-aware.”
Level 2 – Cyber-aware DR
Symptoms:
- Company acknowledges that cyber incidents have different characteristics.
- Annual or semi-annual cyber tabletop exercises begin.
- Some cyber-specific runbooks appear (e.g., ransomware recovery).
- Backups are hardened (immutable copies, offline vaulting).
- Identity (Active Directory / cloud identity) begins to be treated as a Tier-0 asset.
Still fragmented, but the gap is visible.
Level 3 – Formal Cyber Resilience Function
Symptoms:
- A dedicated team or program now exists.
- Clear mapping of “critical business services.”
- RTO/RPO defined and agreed upon for critical systems.
- Regular restore validations (quarterly or monthly for key applications).
- Cyber recovery playbooks exist and are tested.
- Incident Response and Cyber Resilience share processes and learnings.
- Metrics flow up to leadership.
This is where most Fortune-100 companies are aiming to be.
Level 4 – Integrated Cyber Resilience & BCP
Symptoms:
- Cyber recovery and Business Continuity Planning operate as one ecosystem.
- Dependency mapping is accurate: identity > DNS > network > storage > application > business process.
- Recovery is orchestrated and measurable.
- Every critical service undergoes periodic DR tests including cyber-attack simulations.
- Risk and audit teams have visibility into resilience posture.
- Cloud and third-party resilience are assessed systematically.
This reduces major-outage probability dramatically.
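The dependency chain in the list above (identity > DNS > network > storage > application > business process) can be turned into a recovery sequence with a topological sort. Here is a minimal sketch using Python's standard-library `graphlib` module (Python 3.9+). The `recovery_order` helper and the dependency map are hypothetical. A real map would come from the organization's own inventory and would usually be a wider graph than a single chain.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def recovery_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every component is recovered only after
    everything it depends on. Raises CycleError on circular dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical map: each key depends on the components in its set.
dependencies = {
    "dns": {"identity"},
    "network": {"dns"},
    "storage": {"network"},
    "application": {"storage"},
    "business_process": {"application"},
}

print(recovery_order(dependencies))

# A circular dependency is a finding in itself: it means no clean
# recovery sequence exists until the loop is broken.
try:
    recovery_order({"identity": {"dns"}, "dns": {"identity"}})
except CycleError as exc:
    print("circular dependency detected:", exc.args[1])
```

Running the same check whenever the inventory changes is one inexpensive way to keep the dependency map honest.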
Level 5 – Adaptive, Threat-Informed Cyber Resilience
Symptoms:
- Recovery testing is continuous and automated where possible.
- Detection systems feed into resilience patterns (MITRE ATT&CK mapping informs recovery engineering).
- Chaos engineering/chaos resilience testing is done on identity, cloud, and storage subsystems.
- Recovery strategies adapt based on threat landscape changes.
- Metrics are board-level and predictive, not descriptive.
- Company can withstand a full identity compromise, cloud region failure, or large-scale ransomware without existential risk.
This is extremely rare and very mature.
Working With Stakeholders
Within an organization, cyber resilience requires collaboration across multiple stakeholders, including IT, security, business units, and executive leadership. Hopefully, the organization already knows what its critical applications and services are. That is, if those apps and services go down, the organization is at risk of significant harm or of going out of business. In Disaster Recovery Planning (DRP) and Business Continuity Planning (BCP), the exercise that identifies them is often called a Business Impact Analysis (BIA), and those apps are typically Tier-1 or Tier-0 applications.
Once you have a list of critical applications and services, you can work with the stakeholders to determine the RTO and RPO for each. This will help you prioritize recovery efforts and allocate resources effectively. Doing the analysis upfront to determine the RTO and RPO is incredibly helpful for planning, budgeting, and also when an incident occurs.
Organizing the Cyber Resilience Program
One of the bigger problems organizations face is how to organize the cyber resilience program. There is a tremendous amount of data to gather, analyze, and report on. Remember that during an incident, the cyber resilience tools and data will be very helpful to the incident response team.
Although a lot of data may end up being stored in Word docs and spreadsheets, the core RTO/RPO data and app inventory details can be stored in platforms like RSA Archer, ServiceNow GRC, or other GRC platforms. These platforms can help manage the data, workflows, and reporting needs.
Reporting Metrics to Leadership
A key role of cyber resilience programs is to be able to report metrics to leadership to demonstrate the organization's resilience posture and progress. Key Performance Indicators (KPIs) and Key Risk Indicators (KRIs) are essential for this purpose. So, what should we report?
Here are some examples of KPIs and KRIs that can be reported to leadership:
| Metric | Description |
|---|---|
| Percentage of Critical Systems with Tested Recovery Plans | The percentage of critical systems that have recovery plans that have been tested within the last 6 months. |
| Average Recovery Time for Critical Systems | The average time taken to recover critical systems during tests or actual incidents. |
| Number of Cyber Resilience Exercises Conducted | The number of cyber resilience exercises (e.g., tabletop exercises, simulations) conducted in the last quarter. |
| Percentage of Backups Successfully Restored | The percentage of backups that have been successfully restored during tests. |
| RTO/RPO Compliance Rate | The percentage of critical systems that meet their defined RTO and RPO objectives. |
| Number of Identified Resilience Gaps | The number of gaps identified in the cyber resilience posture through assessments and audits. |
| Time to Detect and Respond to Cyber Incidents | The average time taken to detect and respond to cyber incidents. |
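Two of the metrics in the table can be computed directly from an application inventory. The sketch below is a minimal Python illustration: the `SystemRecord` fields, the 182-day approximation of "6 months", the RTO-only compliance check, and the sample data are all assumptions, not a schema from any particular GRC platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class SystemRecord:
    name: str
    rto_minutes: int
    last_tested: Optional[date] = None          # date of last recovery test
    measured_recovery_minutes: Optional[int] = None  # result of that test


def pct_tested_recently(records: List[SystemRecord], today: date,
                        window_days: int = 182) -> float:
    """Percentage of systems whose recovery plan was tested within the window."""
    if not records:
        return 0.0
    cutoff = today - timedelta(days=window_days)
    tested = sum(1 for r in records
                 if r.last_tested is not None and r.last_tested >= cutoff)
    return 100.0 * tested / len(records)


def rto_compliance_rate(records: List[SystemRecord]) -> float:
    """Percentage of systems whose measured recovery time met the RTO.
    Systems with no measurement count as non-compliant (an assumption)."""
    if not records:
        return 0.0
    ok = sum(1 for r in records
             if r.measured_recovery_minutes is not None
             and r.measured_recovery_minutes <= r.rto_minutes)
    return 100.0 * ok / len(records)


# Hypothetical inventory of four critical systems.
inventory = [
    SystemRecord("call-center", 240, date(2025, 3, 1), 45),
    SystemRecord("e-commerce", 60, date(2024, 6, 1), 50),
    SystemRecord("crm", 60),
    SystemRecord("payments-db", 120, date(2025, 5, 15), 120),
]
today = date(2025, 6, 1)
print(pct_tested_recently(inventory, today))
print(rto_compliance_rate(inventory))
```

Counting unmeasured systems as non-compliant is a deliberate choice here: it keeps an untested system from quietly improving the number reported to leadership.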
Iterative Improvement
With all of this in mind we need to keep doing two things:
- Keep the Data Current: Make sure that the data we have about applications stays current as things change.
- Test, Test, and Test Again: Regularly test and update recovery plans to ensure they remain effective.
If the cyber resilience program either gets out of date or is not properly tested, it can lead to a false sense of security and terrible outcomes during an actual incident. Conversely, if the program is kept current and regularly tested, it can significantly enhance the organization's ability to withstand and recover from cyber attacks.

