Backup and Recovery ensures data and configuration resilience for resources through systematic backup operations, protection of backup data, continuous monitoring, and regular recovery testing. Effective backup and recovery capabilities enable organizations to recover from data loss incidents, ransomware attacks, accidental deletions, and regional disasters while meeting regulatory requirements and business continuity objectives.
Without comprehensive backup and recovery capabilities:
- Permanent data loss: Unprotected resources face irreversible data loss from ransomware, accidental deletion, malicious insiders, or infrastructure failures compromising business operations.
- Extended downtime: Inability to recover critical workloads within acceptable timeframes disrupts business operations, revenue generation, and customer service delivery.
- Ransomware impact: Lack of protected backups forces organizations to pay ransom demands or accept permanent data loss when ransomware encrypts production data.
- Compliance violations: Failure to maintain recoverable backups creates regulatory audit failures, financial penalties, and potential sanctions.
- Recovery uncertainty: Untested backups may be incomplete, corrupted, or incompatible with recovery requirements when needed during actual disasters.
- Regional disaster vulnerability: Single-region backup storage creates complete data loss risk when primary and backup data reside in the same affected region.
Here are the three core pillars of the Backup and Recovery security domain.
Backup automation and coverage: Implement automated backup for all business-critical resources with appropriate frequency and retention ensuring comprehensive protection without manual intervention. Enforce backup policies at scale through governance frameworks preventing coverage gaps.
Related controls:
- BR-1: Ensure regular automated backups
Backup data protection: Secure backup data and operations against unauthorized access, malicious deletion, ransomware encryption, and data exfiltration. Implement access controls, encryption, immutability, and redundancy protecting backup integrity.
Related controls:
- BR-2: Protect backup and recovery data
Recovery readiness: Validate recovery capabilities through regular testing ensuring backup configurations, data availability, and recovery procedures meet defined Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) for business continuity.
Related controls:
- BR-3: Monitor backups
- BR-4: Regularly test backup
BR-1: Ensure regular automated backups
Azure Policy: See Azure built-in policy definitions: BR-1.
Security principle
Implement automated backup for all business-critical resources ensuring consistent protection without manual intervention. Configure appropriate backup frequency and retention periods aligned with Recovery Point Objectives (RPO) and data retention requirements. Enforce backup policies through governance frameworks ensuring comprehensive coverage across resources.
Risk to mitigate
Organizations operating without systematic automated backup face significant data loss risks from various threat scenarios and operational failures. Without regular automated backups:
- Ransomware data loss: Ransomware attacks encrypt production data with no recovery path when backup copies are absent, corrupted, or also encrypted.
- Accidental deletion impact: Human errors including accidental resource deletion, configuration changes, or data purges cause permanent data loss without backup protection.
- Infrastructure failure data loss: Hardware failures, storage corruption, or regional outages result in complete data loss when backup copies do not exist.
- Malicious insider threats: Intentional data deletion or corruption by malicious insiders creates irreversible damage without independent backup copies.
- Application error data corruption: Software bugs, failed updates, or database corruption propagate across production systems without point-in-time recovery capability.
- Compliance requirement failures: Regulatory frameworks mandate data retention and recovery capabilities with audit failures when backups are missing or incomplete.
Manual backup processes create coverage gaps, inconsistent protection, and human error risks, making data loss inevitable rather than preventable.
MITRE ATT&CK
- Impact (TA0040): data destruction (T1485) permanently deleting business-critical data, and data encrypted for impact (T1486) deploying ransomware without recovery options.
- Defense Evasion (TA0005): impair defenses (T1562) disabling backup services and deleting backup copies to prevent recovery.
- Persistence (TA0003): maintaining undetected access to systematically delete backups over time before executing destructive attacks.
BR-1.1: Enable automated backup for supported resources
Backup protection provides the ultimate recovery mechanism when all other security controls fail, enabling organizations to restore operations after ransomware attacks, data corruption, accidental deletion, or infrastructure failures that render primary data inaccessible. Automated backup configuration eliminates human error in protection deployment while ensuring consistent coverage as infrastructure scales dynamically in cloud environments. Backup frequency directly determines the recovery point that can actually be achieved, and therefore the maximum data loss after an incident, making backup configuration a critical business continuity decision rather than a purely technical implementation detail.
Establish comprehensive automated protection through these backup capabilities:
- Enable Azure Backup for supported resources including Azure Virtual Machines, SQL Server, SAP HANA databases, Azure Database for PostgreSQL, Azure Files, Azure Disks, and Azure Blobs configuring automated backup schedules aligned with business requirements.
Backup configuration best practices:
- Deploy Azure Backup across resources: Enable Azure Backup on all supported business-critical resources including VMs, databases, file shares, and storage accounts ensuring comprehensive protection without coverage gaps.
- Configure backup frequency: Define backup frequency based on data change rates and RPO requirements using hourly backups for high-transaction databases and daily backups for less frequently changing data.
- Define retention policies: Establish retention periods that meet regulatory requirements and business needs, balancing long-term retention against storage costs (typically 30-90 days for operational recovery and 7+ years for compliance).
- Implement instant restore: Enable instant restore capability for Azure VM backups allowing rapid recovery from snapshot-based restore points without waiting for full backup restoration.
- Configure backup windows: Schedule backup operations during low-activity periods minimizing performance impact on production workloads while ensuring backup completion before next cycle.
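The relationship between backup frequency and RPO in the practices above can be checked with simple arithmetic. The following is a minimal sketch, not an Azure Backup API: the function names and sample schedules are hypothetical, and it assumes the recovery point corresponds to the moment a backup starts, so the worst case is a failure just before the next backup finishes.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst case under the stated assumption: the failure happens just
    before the next backup completes, so recovery falls back to the
    previous backup's start time."""
    return interval + backup_duration


def meets_rpo(interval: timedelta, rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True when the worst-case loss stays within the RPO."""
    return worst_case_data_loss(interval, backup_duration) <= rpo


# Hypothetical workloads: (backup interval, required RPO)
schedules = {
    "high-transaction-db": (timedelta(hours=1), timedelta(hours=4)),
    "file-share": (timedelta(hours=24), timedelta(hours=12)),
}

for name, (interval, rpo) in schedules.items():
    status = "ok" if meets_rpo(interval, rpo) else "RPO at risk"
    print(f"{name}: {status}")
```

A check like this is most useful when run against the full inventory of schedules, since a daily backup silently attached to a workload with a shorter RPO is easy to miss by inspection.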
Backup coverage strategy:
- Virtual machine protection: Enable Azure Backup for all production Azure VMs with application-consistent backups for Windows and file-system consistent backups for Linux capturing complete system state.
- Database backup: Configure automated backup for Azure SQL Database, Azure Database for PostgreSQL, SQL Server on Azure VMs, and SAP HANA databases with transaction log backups for point-in-time recovery.
- Storage account protection: Enable Azure Backup for Azure Files, configure blob versioning and soft delete for Azure Blob Storage, and implement operational backup for blobs requiring frequent restore.
- Disk-level protection: Enable Azure Disk Backup for managed disks requiring independent backup from VM-level protection providing granular recovery options.
BR-1.2: Implement backup for unsupported resources
Relying exclusively on Azure Backup's supported resource list creates protection gaps for critical infrastructure components including Key Vault secrets, container images, Cosmos DB data, and custom application configurations that lack native backup integration. Native protection features embedded in Azure services (blob versioning, soft delete, point-in-time restore) often provide superior recovery capabilities tailored to specific workload characteristics compared to generic backup approaches. Custom backup automation ensures organizations maintain comprehensive protection across their entire technology stack rather than accepting data loss risk for unsupported components.
Extend protection to all critical resources through these approaches:
- Implement native backup capabilities or custom backup solutions for resources not supported by Azure Backup ensuring comprehensive protection across all business-critical services.
Native backup implementation:
- Enable Azure Key Vault backup: Implement Azure Key Vault native backup for secrets, keys, and certificates establishing automated export and secure storage of cryptographic materials.
- Configure storage account features: Enable blob versioning, soft delete, and point-in-time restore for Azure Storage accounts providing native data protection without separate backup infrastructure.
- Implement container registry backup: Enable geo-replication for Azure Container Registry and implement automated image export to secondary storage ensuring container image recovery capability.
- Configure Cosmos DB backup: Enable continuous backup mode for Azure Cosmos DB accounts providing point-in-time restore capability with 30-day retention or configure periodic backup with customizable intervals.
Custom backup solutions:
- Export configuration as code: Export Azure resource configurations to Azure Resource Manager templates, Terraform configurations, or Bicep files storing in version-controlled repositories enabling infrastructure recovery.
- Implement application-level backup: Design application-specific backup mechanisms for resources lacking native backup including data export scripts, configuration snapshots, and state preservation.
- Configure Azure Automation: Create Azure Automation runbooks for custom backup workflows automating resource configuration export, data snapshots, and backup validation for unsupported resources.
- Establish service-specific procedures: Document and automate backup procedures for each resource type without native support ensuring consistent protection and recovery capability.
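For the configuration export approach above, one possible shape is a snapshot record that pairs the exported configuration with a content hash, so a version-controlled repository can detect when a resource has changed since its last export. This is a sketch under assumptions: the record layout, `snapshot_config`, and `has_drifted` are hypothetical helpers, and the configuration dictionary stands in for whatever an actual export produces.

```python
import hashlib
import json
from datetime import datetime, timezone


def snapshot_config(resource_id: str, config: dict, taken_at: datetime) -> dict:
    """Build a snapshot record with a stable hash of the configuration."""
    # sort_keys makes the serialization independent of key order,
    # so identical configurations always produce the same hash.
    body = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return {
        "resource_id": resource_id,
        "taken_at": taken_at.isoformat(),
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        "config": config,
    }


def has_drifted(previous: dict, current: dict) -> bool:
    """True when the configuration content differs between two snapshots."""
    return previous["sha256"] != current["sha256"]


# Hypothetical firewall configuration exported on two different days.
day1 = snapshot_config("fw-prod", {"rules": ["allow-443"], "mode": "deny"},
                       datetime(2024, 1, 1, tzinfo=timezone.utc))
day2 = snapshot_config("fw-prod", {"mode": "deny", "rules": ["allow-443", "allow-22"]},
                       datetime(2024, 1, 2, tzinfo=timezone.utc))
print("drift detected:", has_drifted(day1, day2))
```

Storing the hash alongside the export lets a scheduled job commit a new snapshot only when something changed, which keeps the repository history meaningful.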
BR-1.3: Enforce backup policies through governance
Manual backup configuration creates persistent coverage gaps as new resources deploy continuously in dynamic cloud environments. Unprotected resources remain vulnerable until someone intervenes, and the gap is often discovered only after a data loss incident. Policy-driven enforcement transforms backup from reactive configuration into proactive governance that automatically protects new resources at creation time while continuously monitoring and remediating existing resources that drift from compliance. Centralized policy management ensures consistent protection standards across distributed teams and subscriptions where manual processes inevitably produce configuration inconsistencies.
Automate backup protection through policy-driven governance:
- Implement Azure Policy to enforce automated backup on new and existing resources ensuring consistent protection across subscriptions without manual configuration.
Policy-based backup enforcement:
- Deploy built-in backup policies: Assign Azure Policy definitions including "Configure backup on virtual machines" and "Azure Backup should be enabled for Virtual Machines" ensuring automatic compliance.
- Configure automatic remediation: Enable automatic remediation on backup policies ensuring non-compliant resources are automatically configured with appropriate backup protection.
- Define policy assignment scope: Apply backup policies at management group or subscription level providing centralized governance across multiple subscriptions and resource groups.
- Implement compliance monitoring: Configure Azure Policy compliance dashboards tracking backup coverage across resources identifying gaps requiring attention.
Backup governance framework:
- Establish backup standards: Define organizational backup standards specifying required frequency, retention, and protection levels for different resource classifications and criticality tiers.
- Create resource tagging strategy: Implement resource tags indicating backup requirements, retention periods, and recovery priorities enabling automated policy application based on metadata.
- Configure policy exemptions: Establish formal exception process for resources requiring non-standard backup configurations with documented business justification and compensating controls.
- Monitor policy effectiveness: Review policy compliance reports regularly identifying policy gaps, exemption abuse, and opportunities for governance improvement.
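The tagging strategy and backup standards above can be expressed as a lookup from a tag value to a required standard. The sketch below is illustrative only: it assumes a tag named `BackupTier` with Gold, Silver, and Bronze values (the names used in this section's implementation example), and the frequency and retention figures per tier are invented for the example rather than recommended values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupStandard:
    frequency_hours: int
    retention_days: int


# Assumed tier definitions; real values come from organizational standards.
STANDARDS = {
    "Gold": BackupStandard(frequency_hours=4, retention_days=365),
    "Silver": BackupStandard(frequency_hours=12, retention_days=90),
    "Bronze": BackupStandard(frequency_hours=24, retention_days=30),
}


def required_standard(tags: dict) -> Optional[BackupStandard]:
    """Return the standard for a resource, or None if the tag is missing or unknown."""
    return STANDARDS.get(tags.get("BackupTier", ""))


def find_untiered(resources: list) -> list:
    """Names of resources that cannot be mapped to any backup standard."""
    return [r["name"] for r in resources
            if required_standard(r.get("tags", {})) is None]


# Hypothetical inventory records.
inventory = [
    {"name": "vm-trading-01", "tags": {"BackupTier": "Gold"}},
    {"name": "vm-batch-07", "tags": {"BackupTier": "bronze"}},  # wrong case
    {"name": "vm-test-12", "tags": {}},
]
print(find_untiered(inventory))
```

Treating an unrecognized tag value the same as a missing tag is a deliberate choice here: a misspelled tier should surface as a gap rather than silently fall through to no protection.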
Implementation example
A financial services organization faced regulatory requirements for data retention and business continuity while managing rapid cloud expansion, with thousands of new resources deployed monthly. Manual backup processes created coverage gaps and compliance risks.
Challenge: Trading systems required an aggressive 12-hour RPO, regulatory data needed 7-year retention, and rapid resource provisioning outpaced manual backup configuration, leaving new VMs unprotected.
Solution approach:
- Automated protection for supported services: Deployed Azure Backup for 2,000+ VMs with twice-daily backups meeting 12-hour RPO. Configured Azure SQL Database with 35-day retention and Azure Cosmos DB continuous backup providing 5-minute granularity for trading data.
- Native protection for storage: Enabled blob versioning and soft delete for Azure Storage accounts, leveraging native capabilities instead of separate backup infrastructure.
- Policy-based enforcement: Implemented Azure Policy with automatic remediation ensuring production resources receive protection immediately upon creation. Established "BackupTier" tagging strategy (Gold/Silver/Bronze) automating policy assignment by criticality.
- Automated solutions for unsupported resources: Created Azure Automation runbooks for Azure Key Vault secrets and firewall configurations with 7-year retention.
Outcome: The organization achieved complete production coverage with automated backup protection deployed immediately upon resource creation, eliminating manual configuration delays. Policy-based enforcement ensured consistent compliance while automated remediation addressed gaps without manual intervention.
Criticality level
Must have.
Control mapping
- NIST SP 800-53 Rev.5: CP-9, CP-9(1), CP-9(3), CP-9(5), CP-10(2)
- PCI-DSS v4: 12.10.1, 12.10.4
- CIS Controls v8.1: 11.1, 11.2, 11.3
- NIST CSF v2.0: PR.IP-4, RC.RP-1
- ISO 27001:2022: A.8.13
- SOC 2: CC5.1, A1.2
BR-2: Protect backup and recovery data
Azure Policy: See Azure built-in policy definitions: BR-2.
Security principle
Protect backup data and operations through multi-layered security controls including access restrictions, encryption, immutability, and geographic redundancy. Implement defense-in-depth protecting backup infrastructure from ransomware, malicious deletion, unauthorized access, and regional disasters ensuring recovery capability when needed.
Risk to mitigate
Organizations failing to protect backup data face threats from ransomware, malicious insiders, accidental deletion, and unauthorized access compromising recovery capability. Without backup protection:
- Ransomware backup encryption: Advanced ransomware targets backup systems, encrypting or deleting backup copies to eliminate recovery options and force either ransom payment or permanent data loss.
- Malicious backup deletion: Attackers with compromised credentials delete backup copies before executing destructive attacks preventing incident recovery and maximizing damage.
- Insider threat data exfiltration: Malicious insiders with backup access exfiltrate sensitive data through backup systems bypassing production data access controls and monitoring.
- Accidental backup corruption: Unauthorized configuration changes, accidental deletion, or improper backup management corrupt backup data rendering recovery impossible during emergencies.
- Unauthorized backup access: Inadequate access controls allow unauthorized users to restore, modify, or delete backup data creating security and compliance violations.
- Regional disaster vulnerability: Backup data stored only in primary region becomes unavailable during regional disasters preventing recovery when most needed.
Unprotected backup data represents a single point of failure, eliminating the value of data protection when the backups themselves become targets of compromise.
MITRE ATT&CK
- Impact (TA0040): inhibit system recovery (T1490) deleting backup copies preventing restoration after ransomware attacks.
- Defense Evasion (TA0005): indicator removal (T1070) and impair defenses (T1562) disabling backup monitoring and deleting backup logs.
- Credential Access (TA0006): steal application access token (T1528) compromising backup service accounts to access and corrupt backup data.
- Collection (TA0009): data from cloud storage (T1530) exfiltrating sensitive data through backup systems bypassing production access controls.
BR-2.1: Secure backup access and operations
Backup infrastructure becomes a prime target for sophisticated adversaries who understand that destroyed backups eliminate recovery options following ransomware attacks or destructive malware deployment. Privileged access controls, multi-factor authentication, and soft delete capabilities transform backup systems from passive data storage into actively defended critical infrastructure that maintains availability even during compromise attempts. Audit logging and alerting enable security teams to detect backup tampering patterns before adversaries execute destructive attacks, providing critical early warning of advanced persistent threats.
Defend backup infrastructure through these security controls:
- Implement access controls, authentication, and audit logging for backup operations protecting against unauthorized access and malicious activity.
Access control configuration:
- Implement Azure RBAC for backup: Assign Azure role-based access control roles including Backup Contributor, Backup Reader, and Backup Operator segregating duties and enforcing least privilege access to backup operations.
- Require multi-factor authentication: Enforce multi-factor authentication for critical backup operations including restore, retention changes, backup deletion, and Recovery Services vault configuration preventing unauthorized access.
- Enable Azure Private Link: Configure private endpoints for Recovery Services vaults restricting backup traffic to private networks preventing backup data exfiltration over public internet.
- Implement just-in-time access: Use Microsoft Entra Privileged Identity Management for time-bound backup administrator access requiring approval workflows and business justification for elevated permissions.
Backup operation protection:
- Enable MFA for backup deletion: Configure security PIN requirements for backup deletion operations, requiring a PIN generated in the Azure portal to prevent automated malicious deletion.
- Implement soft delete: Enable soft delete for Recovery Services vaults, retaining deleted backup data for 14 days by default and allowing recovery from accidental or malicious deletion before permanent removal.
- Configure audit logging: Enable Azure Monitor logging for all backup operations tracking backup creation, deletion, restore, and configuration changes for security monitoring and compliance.
- Establish alert rules: Create Azure Monitor alerts for critical backup events including backup failures, unauthorized restore operations, retention policy changes, and soft delete disablement.
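The alert rules above amount to triaging backup audit events by how directly they threaten recovery capability. The following sketch shows one way to express that logic; it is not an Azure Monitor rule definition. The event fields (`operation`, `caller`) and operation names are hypothetical and would need to be mapped to the actual log schema in use.

```python
# Operations that weaken or destroy recovery capability (assumed names).
CRITICAL_OPERATIONS = {"DeleteBackupData", "DisableSoftDelete", "ReduceRetention"}


def triage(event: dict, approved_restorers: set) -> str:
    """Classify a backup audit event as 'critical', 'warning', or 'info'."""
    operation = event["operation"]
    if operation in CRITICAL_OPERATIONS:
        return "critical"
    # A restore by someone outside the approved list may indicate
    # data exfiltration through the backup system.
    if operation == "Restore" and event["caller"] not in approved_restorers:
        return "critical"
    if operation == "BackupFailed":
        return "warning"
    return "info"


# Hypothetical events.
approved = {"backup-operator@example.com"}
events = [
    {"operation": "DisableSoftDelete", "caller": "admin@example.com"},
    {"operation": "Restore", "caller": "intern@example.com"},
    {"operation": "Restore", "caller": "backup-operator@example.com"},
    {"operation": "BackupFailed", "caller": "system"},
]
for e in events:
    print(e["operation"], "->", triage(e, approved))
```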
BR-2.2: Encrypt backup data
Unencrypted backups expose sensitive organizational data to unauthorized access through compromised storage, lost backup media, or malicious insiders with infrastructure permissions who lack legitimate business access to production data. Encryption transforms backup data from readable information into cryptographically protected ciphertext, ensuring confidentiality even when storage controls fail. Customer-managed encryption keys provide additional protection against cloud provider compromise scenarios while meeting regulatory requirements for cryptographic control, though they introduce key management complexity requiring documented recovery procedures.
Protect backup data confidentiality through encryption:
- Implement encryption for backup data at rest and in transit protecting confidentiality and meeting regulatory requirements.
Encryption configuration:
- Enable platform-managed encryption: Azure Backup automatically encrypts backup data using AES-256 encryption with platform-managed keys requiring no additional configuration for baseline protection.
- Implement customer-managed keys: Configure customer-managed keys in Azure Key Vault for backup encryption providing organizational control over encryption keys and meeting specific compliance requirements.
- Protect encryption keys: Enable soft delete and purge protection for Azure Key Vault storing backup encryption keys preventing key deletion and ensuring backup recoverability.
- Encrypt on-premises backups: Configure passphrase-based encryption for on-premises backups using Azure Backup agent protecting data during transit and storage in Azure.
Key management best practices:
- Include keys in backup scope: Ensure customer-managed keys used for backup encryption are themselves protected through Azure Key Vault backup preventing key loss scenarios.
- Implement key rotation: Establish key rotation policies for customer-managed encryption keys balancing security requirements with operational complexity and backup compatibility.
- Monitor key access: Enable Azure Key Vault logging tracking encryption key access, usage, and administrative operations detecting unauthorized key access attempts.
- Document key recovery: Maintain documented procedures for encryption key recovery and backup decryption ensuring business continuity during key management incidents.
BR-2.3: Implement backup immutability and redundancy
Mutable backups that adversaries can delete or corrupt provide false confidence in recovery capabilities, with ransomware attackers specifically targeting backup systems before executing encryption to eliminate recovery options and force ransom payment. Immutability transforms backups from modifiable data stores into write-once storage that maintains recovery points regardless of administrative access or credential compromise. Geographic redundancy protects against regional disasters, datacenter failures, and localized security incidents that could destroy both production systems and co-located backups simultaneously.
Ensure backup data integrity and availability through immutability:
- Configure immutable backup storage and geographic redundancy protecting against ransomware, corruption, and regional disasters.
Immutability configuration:
- Enable immutable vault: Configure immutability for Recovery Services vaults, preventing backup deletion, retention reduction, and soft delete disablement for specified lock periods to protect against ransomware.
- Configure vault lock periods: Define minimum retention lock periods based on regulatory requirements, typically 180 days or longer, ensuring backup data cannot be prematurely deleted.
- Implement multi-user authorization: Require multi-user authorization for immutability configuration changes preventing single administrator from weakening backup protection.
- Monitor immutability status: Track immutability configuration across Recovery Services vaults alerting on attempts to disable protection or reduce retention periods.
Geographic redundancy:
- Enable cross-region restore: Configure geo-redundant storage (GRS) for Recovery Services vaults automatically replicating backup data to Azure paired regions enabling recovery during regional disasters.
- Implement zone-redundant storage: Enable zone-redundant storage (ZRS) for Recovery Services vaults protecting backup data from datacenter-level failures within regions supporting availability zones.
- Test cross-region recovery: Periodically validate cross-region restore capability ensuring backup data availability and restore procedures work correctly in disaster scenarios.
- Document failover procedures: Maintain documented procedures for cross-region restore including authentication, permissions, and recovery steps ensuring business continuity during regional outages.
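The immutability behavior described above can be summarized as a rule over proposed changes: deletions, soft delete disablement, and retention reductions are refused, while retention extensions pass. The sketch below models that rule only; it is not how Azure Backup implements immutable vaults, and the change record format (`kind`, `old_days`, `new_days`) is hypothetical.

```python
def is_allowed(change: dict, immutable: bool) -> bool:
    """Decide whether a proposed vault change is permitted.

    With immutability off, every change passes. With it on, only
    changes that keep or strengthen protection pass.
    """
    if not immutable:
        return True
    kind = change["kind"]
    if kind in {"delete_backup", "disable_soft_delete"}:
        return False
    if kind == "set_retention":
        # Extending or keeping retention is fine; reducing it is not.
        return change["new_days"] >= change["old_days"]
    return True


# Hypothetical change requests against an immutable vault.
requests = [
    {"kind": "set_retention", "old_days": 180, "new_days": 365},
    {"kind": "set_retention", "old_days": 180, "new_days": 30},
    {"kind": "delete_backup"},
    {"kind": "disable_soft_delete"},
]
for r in requests:
    print(r["kind"], "allowed" if is_allowed(r, immutable=True) else "blocked")
```

The same predicate can double as a monitoring aid: any blocked request against an immutable vault is worth alerting on, since it represents an attempt to weaken protection.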
Implementation example
A healthcare organization experienced ransomware attacks targeting backup systems and faced HIPAA compliance requirements for protecting electronic health records across geographically distributed medical facilities.
Challenge: Ransomware attackers were deleting backups before encryption, HIPAA required specific encryption controls, and regional disaster scenarios threatened data availability for critical patient care systems.
Solution approach:
- Access control and authentication: Implemented Azure RBAC segregating operations between Backup Operators (daily tasks) and Contributors (policy changes). Enabled MFA and security PIN for deletion operations preventing automated malicious actions.
- Encryption and compliance: Configured customer-managed keys in Azure Key Vault meeting HIPAA requirements for organizational cryptographic control.
- Immutability and ransomware defense: Enabled immutable vaults with 365-day retention lock preventing deletion even by administrators. Configured 90-day soft delete providing extended recovery window.
- Network isolation: Implemented Azure Private Link eliminating public internet exposure for backup traffic. Configured alerts detecting unauthorized restore attempts providing early ransomware attack indicators.
Outcome: The organization successfully defended against ransomware attempts, with backups remaining intact and recoverable despite compromised administrative credentials. Customer-managed encryption keys and multi-factor authentication prevented unauthorized access to backup data during security incidents.
Criticality level
Must have.
Control mapping
- NIST SP 800-53 Rev.5: CP-9(8), SC-12(1), SC-13, SC-28, SC-28(1)
- PCI-DSS v4: 3.5.1, 10.5.1, 12.3.4
- CIS Controls v8.1: 11.3, 11.5, 3.11
- NIST CSF v2.0: PR.DS-1, PR.DS-5, PR.IP-4
- ISO 27001:2022: A.8.13, A.8.24, A.5.14
- SOC 2: CC6.1, CC6.7, A1.2
BR-3: Monitor backups
Security principle
Implement continuous monitoring of backup operations, coverage, and compliance ensuring all business-critical resources maintain protection meeting defined standards. Monitor backup health, detect failures, and alert on anomalies enabling rapid response to backup issues before they impact recovery capability.
Risk to mitigate
Organizations failing to monitor backup operations and compliance lack visibility into backup failures, coverage gaps, and policy violations creating false security assumptions. Without backup monitoring:
- Silent backup failures: Backup jobs fail without detection leaving resources unprotected with outdated or missing backup copies discovered only during recovery attempts.
- Coverage gaps: New resources deployed without backup protection remain vulnerable to data loss while appearing in asset inventories suggesting comprehensive protection.
- Configuration drift: Backup policies and retention settings change through unauthorized modifications weakening protection without visibility or alerting.
- Compliance violations: Resources missing required backup protection create regulatory audit failures and penalties discovered only during compliance assessments.
- Capacity issues: Backup storage capacity exhaustion prevents new backups from succeeding causing silent protection degradation across resources.
- Security incidents: Unauthorized backup access, deletion, or configuration changes occur without detection indicating potential security compromises.
Lack of backup monitoring turns backup systems into a false sense of security where protection exists on paper but fails in reality.
MITRE ATT&CK
- Defense Evasion (TA0005): impair defenses (T1562) disabling backup monitoring to hide malicious activity targeting backup systems.
- Impact (TA0040): inhibit system recovery (T1490) silently corrupting or deleting backups over time before executing destructive attacks.
BR-3.1: Monitor backup health and operations
Backup systems failing silently create false confidence in recovery capabilities until disaster scenarios reveal months of unsuccessful backup attempts, making continuous health monitoring essential to validate protection effectiveness. Centralized backup monitoring aggregates status across distributed infrastructure enabling proactive failure remediation before data loss windows exceed recovery point objectives. Performance tracking identifies backup infrastructure scaling requirements and degradation patterns that indicate system stress or malicious interference before complete backup failures occur.
Monitor backup system reliability through centralized observability:
- Implement centralized monitoring of backup operations tracking job status, failures, and performance ensuring backup reliability.
Backup health monitoring:
- Enable Azure Backup reports: Configure Azure Backup Reports using Log Analytics workspace providing centralized visibility into backup jobs, storage consumption, and protected items across subscriptions.
- Implement Backup Center: Use Azure Backup Center for unified backup management and monitoring providing single interface for backup estate governance across Recovery Services vaults.
- Configure job monitoring: Track backup job completion status, duration, and failure rates identifying performance degradation and reliability issues requiring investigation.
- Monitor storage consumption: Track backup storage growth and capacity utilization forecasting storage requirements and preventing capacity exhaustion impacting backup success.
Alert configuration:
- Configure critical failure alerts: Create Azure Monitor alerts for backup job failures, snapshot failures, and replication errors ensuring immediate notification of protection issues.
- Implement health alerts: Configure alerts for Recovery Services vault health issues including connectivity problems, authentication failures, and service degradation.
- Define alert routing: Establish alert routing rules directing backup notifications to appropriate teams based on severity, resource type, and organizational structure.
- Set alert thresholds: Define acceptable failure rates and alert thresholds avoiding alert fatigue while ensuring critical issues receive immediate attention.
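Threshold-based alerting, as described above, can be reduced to two checks over recent job records: an aggregate failure rate and a per-job duration limit. This is a minimal sketch with assumed inputs: the job record fields are hypothetical, the 5% failure threshold is an arbitrary placeholder, and the 6-hour duration limit simply mirrors the figure used in this section's implementation example.

```python
from datetime import timedelta


def failure_rate(jobs: list) -> float:
    """Fraction of jobs whose status is 'Failed' (0.0 for an empty list)."""
    if not jobs:
        return 0.0
    failed = sum(1 for job in jobs if job["status"] == "Failed")
    return failed / len(jobs)


def evaluate(jobs: list,
             max_failure_rate: float = 0.05,
             max_duration: timedelta = timedelta(hours=6)) -> list:
    """Return alert messages for threshold breaches."""
    alerts = []
    rate = failure_rate(jobs)
    if rate > max_failure_rate:
        alerts.append(f"failure rate {rate:.0%} exceeds {max_failure_rate:.0%}")
    for job in jobs:
        # Slow but successful jobs are an early warning, not a failure.
        if job["status"] == "Completed" and job["duration"] > max_duration:
            alerts.append(f"{job['item']} took {job['duration']}")
    return alerts


# Hypothetical job history.
history = [
    {"item": "vm-01", "status": "Completed", "duration": timedelta(hours=1)},
    {"item": "vm-02", "status": "Completed", "duration": timedelta(hours=7)},
    {"item": "sql-01", "status": "Failed", "duration": timedelta(minutes=5)},
    {"item": "files-01", "status": "Completed", "duration": timedelta(hours=2)},
]
for message in evaluate(history):
    print(message)
```

Alerting on the aggregate rate rather than on every individual failure is one way to reduce alert fatigue, although critical workloads may still warrant per-job alerts.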
BR-3.2: Monitor backup compliance and coverage
Health monitoring detects backup system failures, but validating compliance requires tracking which resources actually have the backup coverage that organizational policies and regulatory requirements demand. Compliance monitoring identifies resources that bypass or lose backup protection, creating data loss risks that escalate until discovered through audit findings or disaster scenarios. Automated compliance reporting transforms manual auditing into continuous validation that catches coverage gaps immediately rather than discovering missing backups when recovery becomes necessary.
Validate protection compliance through continuous monitoring:
- Implement compliance monitoring ensuring all business-critical resources maintain required backup protection meeting organizational policies.
Compliance monitoring:
- Leverage Azure Policy compliance: Monitor Azure Policy compliance dashboards tracking resources with missing or misconfigured backup protection identifying coverage gaps.
- Implement backup coverage reports: Generate regular reports showing backup protection status across resource types, subscriptions, and resource groups quantifying coverage percentages.
- Track policy exemptions: Monitor backup policy exemptions to ensure each has a documented business justification and regular review, preventing exemption abuse from weakening protection.
- Audit configuration changes: Track backup configuration changes including retention policy modifications, backup schedule adjustments, and protection disablement identifying unauthorized changes.
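As a rough illustration of a coverage report, the sketch below computes protection percentages per resource group. It assumes a resource inventory has already been exported as a list of dictionaries. The `resource_group` and `protected` keys and the `coverage_by_group` function are hypothetical names chosen for this example.

```python
from collections import defaultdict


def coverage_by_group(resources):
    """Return {resource_group: percent_protected} rounded to one decimal.

    resources: iterable of dicts with 'resource_group' (str) and
    'protected' (bool) keys. Both key names are assumptions for this sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [protected count, total count]
    for res in resources:
        counts = totals[res["resource_group"]]
        counts[1] += 1
        if res["protected"]:
            counts[0] += 1
    return {
        group: round(100.0 * protected / total, 1)
        for group, (protected, total) in totals.items()
    }
```

The same grouping could be keyed on subscription or resource type instead, depending on how coverage needs to be reported.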
Operational monitoring:
- Monitor last backup age: Track time since last successful backup for each protected resource identifying stale backups indicating protection degradation or service issues.
- Review recovery point objectives: Compare actual backup frequency against defined RPO requirements identifying resources failing to meet business continuity objectives.
- Track backup consistency: Monitor application-consistent backup success rates for VMs and databases ensuring backup quality meets recovery requirements beyond file-level consistency.
- Identify unprotected resources: Regularly scan subscriptions for business-critical resources without backup protection using resource tagging and classification identifying coverage gaps.
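The last-backup-age and RPO checks above reduce to a simple comparison, sketched below. The sketch assumes each protected item is available as a tuple of name, last successful backup time, and RPO. The `find_rpo_violations` function and the tuple layout are hypothetical, and real data would come from whatever backup reporting source is in use.

```python
from datetime import datetime, timedelta, timezone


def find_rpo_violations(items, now=None):
    """Return (name, age) pairs for items whose last backup is older than the RPO.

    items: iterable of (name, last_success, rpo) where last_success is a
    timezone-aware datetime or None, and rpo is a timedelta.
    An item that has never been backed up is reported with age None.
    """
    now = now or datetime.now(timezone.utc)
    violations = []
    for name, last_success, rpo in items:
        if last_success is None:
            violations.append((name, None))
            continue
        age = now - last_success
        if age > rpo:
            violations.append((name, age))
    return violations
```

Passing `now` explicitly keeps the check deterministic, which is useful when the same logic runs in a scheduled report and in tests.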
Implementation example
A retail organization operating a global e-commerce platform discovered 50 unprotected production VMs during an audit and experienced silent backup failures that caused a 3-day data loss before detection.
Challenge: Rapid cloud expansion, with thousands of protected items across multiple Azure regions, created visibility challenges. Silent backup failures went undetected, and an audit revealed significant coverage gaps threatening business continuity and compliance.
Solution approach:
- Centralized visibility: Deployed Azure Backup Center with a unified view across numerous vaults in multiple regions. Implemented Azure Backup Reports tracking job success rates and storage trends.
- Proactive alerting: Configured Azure Monitor alerts routing failures to the on-call team and flagging jobs exceeding 6-hour duration as early warning signals.
- Compliance monitoring: Leveraged Azure Policy dashboards and automated weekly reports showing coverage by business unit.
- Configuration protection: Implemented alerts requiring approval for retention reductions or protection disablement on critical resources.
Outcome: The organization dramatically reduced backup failure detection time through proactive alerting and centralized monitoring. Comprehensive compliance monitoring eliminated audit findings related to unprotected business-critical resources while enabling storage optimization through identification of obsolete backups.
Criticality level
Should have.
Control mapping
- NIST SP 800-53 Rev.5: CP-9(1), SI-4, AU-6, AU-7
- PCI-DSS v4: 10.4.1, 10.6.2, 12.10.5
- CIS Controls v8.1: 8.2, 8.11, 11.2
- NIST CSF v2.0: DE.AE-3, DE.CM-1, RS.AN-1
- ISO 27001:2022: A.8.13, A.8.16
- SOC 2: CC7.2, A1.2
BR-4: Regularly test backup
Security principle
Periodically validate backup configurations and recovery procedures through structured testing ensuring backup data integrity and recovery capability meet defined Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). Test recovery procedures at appropriate frequency balancing operational impact with recovery confidence.
Risk to mitigate
Organizations neglecting regular backup testing discover backup inadequacies only during actual disasters when recovery fails. Without backup testing:
- Incomplete backup configurations: Backup jobs complete successfully but capture incomplete data sets. The missing critical components are discovered only during recovery attempts, causing extended downtime.
- Recovery procedure failures: Documented recovery procedures contain errors, missing steps, or incorrect commands failing during high-pressure disaster scenarios when mistakes are costly.
- RTO/RPO violations: Actual recovery time significantly exceeds defined objectives due to unexpected complications, infrastructure limitations, or procedural inefficiencies that testing would have exposed.
- Corrupted backup data: Backup data contains corruption, inconsistencies, or errors rendering recovery impossible despite successful backup job completion and monitoring.
- Skills and knowledge gaps: Staff lack practical recovery experience leading to errors, delays, and poor decisions during actual disaster recovery when expertise is critical.
- Dependency identification failures: Application dependencies and configuration requirements remain unknown until a recovery attempt, causing cascading failures and extended recovery time.
Untested backups represent theoretical protection with unknown reliability creating false confidence in recovery capability until disaster proves otherwise.
MITRE ATT&CK
- Impact (TA0040): data destruction (T1485) and inhibit system recovery (T1490) causing maximum damage when untested backup configurations fail during recovery.
BR-4.1: Implement backup recovery testing
Backup systems that validate data capture but never test recovery rest on unverified assumptions about restoration capability. Those assumptions fail catastrophically during actual disasters, when backup corruption, configuration errors, or procedure gaps prevent successful recovery. Regular recovery testing transforms theoretical backup protection into validated capability by identifying integrity issues, procedure deficiencies, and infrastructure limitations before critical business incidents. Measuring actual recovery time against business requirements ensures recovery objectives remain achievable as systems evolve rather than discovering missed targets during production outages.
Validate backup effectiveness through structured recovery testing:
- Establish a structured backup recovery testing program validating data integrity, recovery procedures, and time objectives.
Recovery testing strategy:
- Define testing scope: Establish recovery testing scope including full system recovery for Tier 1 applications, database recovery for Tier 2 applications, and file-level recovery for Tier 3 resources balancing thoroughness with operational impact.
- Schedule regular tests: Conduct recovery tests quarterly for critical systems, semi-annually for standard systems, and annually for less critical systems ensuring regular validation without excessive operational burden.
- Test different recovery scenarios: Validate multiple recovery scenarios including point-in-time restore, cross-region failover, individual file recovery, and complete system rebuild from backup.
- Document test results: Record recovery test outcomes including success status, actual recovery time, data validation results, and issues identified creating knowledge base for improvement.
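The testing cadence described above (quarterly, semi-annually, annually) can be tracked with a small scheduling helper. This is a minimal sketch. The criticality labels, the `TEST_INTERVAL_MONTHS` mapping, and the function names are hypothetical, and the record of last test dates is assumed to be maintained elsewhere.

```python
import calendar
from datetime import date

# Months between recovery tests per criticality label (labels are assumptions).
TEST_INTERVAL_MONTHS = {"critical": 3, "standard": 6, "low": 12}


def add_months(d, months):
    """Return d shifted by whole months, clamping the day to the month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_test_due(last_test, criticality):
    """Return the date by which the next recovery test should happen."""
    return add_months(last_test, TEST_INTERVAL_MONTHS[criticality])


def overdue_systems(systems, today):
    """Return names of systems whose next test date is before today.

    systems: iterable of (name, criticality, last_test_date).
    """
    return [
        name
        for name, criticality, last_test in systems
        if next_test_due(last_test, criticality) < today
    ]
```

A list of overdue systems like this could feed the documented test results, so that skipped tests are as visible as failed ones.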
Recovery procedure validation:
- Restore to non-production: Perform recovery tests in isolated non-production environments preventing production impact while validating backup integrity and recovery procedures.
- Validate data integrity: Verify restored data completeness and consistency including database integrity checks, file count validation, and application functionality testing confirming backup quality.
- Measure recovery time: Track actual recovery time for each test comparing against defined RTO requirements identifying procedure inefficiencies and infrastructure bottlenecks.
- Test recovery point: Validate the backup capture point by comparing restored data against the expected state, ensuring RPO requirements are met and no data is lost during recovery.
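One way to record the integrity and recovery-time checks above is sketched below. It assumes a manifest of file paths and SHA-256 digests was captured at backup time and another after the test restore. The manifest format and the `validate_restore` function are hypothetical. File hashing covers only file-level completeness, so database integrity checks and application testing would still be separate steps.

```python
import hashlib
from datetime import timedelta


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string, used to build manifests."""
    return hashlib.sha256(data).hexdigest()


def validate_restore(expected, restored, recovery_time, rto):
    """Compare a restored manifest with the expected one and check the RTO.

    expected, restored: dicts mapping relative path -> SHA-256 hex digest.
    recovery_time, rto: timedelta values (measured time versus objective).
    """
    missing = sorted(set(expected) - set(restored))
    mismatched = sorted(
        path for path in expected
        if path in restored and expected[path] != restored[path]
    )
    rto_met = recovery_time <= rto
    return {
        "missing": missing,
        "mismatched": mismatched,
        "rto_met": rto_met,
        "passed": not missing and not mismatched and rto_met,
    }
```

Returning the individual findings rather than a single pass or fail flag makes the result usable as a documented test record.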
BR-4.2: Validate disaster recovery capabilities
Individual backup recovery tests validate technical capability, but disaster scenarios require coordinated recovery of multiple interdependent systems, which single-system testing cannot validate. End-to-end disaster recovery testing reveals organizational readiness gaps including team coordination failures, communication breakdowns, and undocumented dependencies that prevent successful recovery despite technically sound backups. Tabletop exercises, failover drills, and business continuity validation ensure teams can execute coordinated recovery under stress rather than discovering procedural gaps during actual disasters when time pressures amplify errors.
Validate organizational disaster preparedness through comprehensive exercises:
- Test end-to-end disaster recovery scenarios validating organizational readiness for major incidents requiring complete system recovery.
Disaster recovery testing:
- Conduct tabletop exercises: Perform tabletop disaster recovery exercises simulating various disaster scenarios validating team coordination, decision-making processes, and communication procedures.
- Execute failover drills: Test cross-region failover capabilities activating backup infrastructure in secondary regions validating geo-redundancy effectiveness and recovery procedures.
- Validate business continuity: Ensure recovered systems support business operations testing application functionality, user access, integration points, and performance requirements.
- Test recovery orchestration: Validate recovery runbooks, automation scripts, and orchestration workflows ensuring smooth execution during actual disasters without manual intervention errors.
Continuous improvement:
- Document lessons learned: Capture issues, inefficiencies, and improvement opportunities identified during recovery testing creating action items for procedure enhancement.
- Update recovery procedures: Incorporate lessons learned from recovery tests into documented procedures ensuring continuous improvement of recovery capability.
- Train recovery teams: Use recovery testing as training opportunity ensuring team members gain practical experience with recovery procedures reducing errors during actual disasters.
- Refine RTO/RPO targets: Adjust RTO and RPO objectives based on actual recovery capabilities identified through testing ensuring business expectations align with technical reality.
Implementation example
A financial services organization assumed their Azure backup strategy was adequate until a ransomware incident affecting their Azure SQL databases and Azure App Services revealed critical gaps in recovery procedures and significantly longer restoration times than expected.
Challenge: Untested backup configurations, undocumented recovery procedures, and an operations team unfamiliar with the recovery tools resulted in extended downtime during the security incident. Business continuity plans proved unrealistic when actual recovery capabilities were tested under pressure.
Solution approach:
- Structured testing program: Established quarterly Azure SQL Database and Azure App Service recovery tests validating complete restoration. Documented actual recovery times, revealing gaps that prevented the target RTO from being achieved.
- Incremental recovery validation: Performed monthly Azure Files share-level restore tests confirming rapid recovery capability. Validated Azure Cosmos DB point-in-time restore granularity for transaction data.
- Disaster recovery scenarios: Executed Azure Site Recovery failover tests and full infrastructure restoration to isolated environments validating backup completeness and application dependencies.
- Team readiness: Trained operations team through quarterly hands-on recovery drills using Azure Backup and Azure Site Recovery, substantially reducing average recovery time through improved familiarity with Azure recovery tools.
- Continuous improvement: Documented numerous improvements from testing including Azure Automation runbook opportunities and documentation gaps. Updated runbooks with automated database refresh and application redeployment scripts.
Outcome: The organization significantly reduced Azure workload recovery time through automation developed during testing exercises. Regular testing revealed unrealistic recovery objectives which were adjusted to achievable targets, ensuring business continuity plans reflected operational reality.
Criticality level
Should have.
Control mapping
- NIST SP 800-53 Rev.5: CP-4, CP-4(1), CP-9(7), CP-10
- PCI-DSS v4: 12.10.6
- CIS Controls v8.1: 11.4, 11.5
- NIST CSF v2.0: PR.IP-9, RC.RP-1
- ISO 27001:2022: A.5.30, A.8.13
- SOC 2: A1.3, CC9.1