Infrastructure Resilience Briefing — April 1, 2021 Azure DNS Global Outage
A faulty Azure DNS change on April 1, 2021, caused global name-resolution failures for roughly two hours. Microsoft’s incident report stresses expedited rollback controls and reiterates the 2-month SLA credit claim window for affected services.
Executive briefing: Microsoft attributed the Azure DNS outage to a flawed system update that propagated globally and broke name resolution for a wide range of Azure and Microsoft 365 endpoints for roughly two hours.[1] The post-incident report emphasized rollback automation and confirmed that affected services remain eligible for service credits under their SLAs.
Regional impact
- Global reachability: DNS failures prevented clients from resolving application endpoints, so control-plane and data-plane operations were effectively unavailable across regions.
- Dependency chain: Even workloads with healthy compute resources experienced downtime because health probes and service discovery relied on Azure DNS.
- Mitigation path: Microsoft halted the change, rolled back the faulty update, and ramped up traffic filtering to stabilize recursors.
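The dependency-chain point above suggests that probes should report whether a failure comes from name resolution or from the endpoint itself. The following is a minimal sketch under stated assumptions, not a description of how Azure health probes work: `classify_probe` and its `resolve` and `connect` parameters are hypothetical names introduced here so the logic can be exercised without network access.

```python
import socket
from typing import Callable


def classify_probe(
    hostname: str,
    port: int = 443,
    timeout: float = 3.0,
    resolve: Callable = socket.getaddrinfo,      # injectable for testing (assumption)
    connect: Callable = socket.create_connection,  # injectable for testing (assumption)
) -> str:
    """Return 'dns_failure', 'connect_failure', or 'healthy'."""
    try:
        # Step 1: name resolution. A gaierror here means the endpoint may be
        # perfectly healthy but unreachable by name.
        infos = resolve(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: try each resolved address until one accepts a TCP connection.
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        try:
            conn = connect(sockaddr[:2], timeout=timeout)
        except OSError:
            continue
        conn.close()
        return "healthy"
    return "connect_failure"
```

Recording the `dns_failure` case separately gives later SLA evidence a clearer trail, since it shows that compute was reachable by address while resolution was failing.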
SLA and credit posture
- Map downtime windows to the availability commitments of dependent services (e.g., Azure Front Door, App Service, Virtual Machines) to determine credit eligibility.[2]
- File claims within the 2-month SLA window, and include evidence of the DNS-related outage and the affected subscription scope.[2]
- Implement conditional DNS failover or secondary DNS providers for critical zones where regulatory posture allows, capturing lessons from the April 2021 outage.
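The credit-eligibility and claim-window steps above come down to simple arithmetic, sketched below. This is an illustration under assumptions, not Microsoft's calculation: it treats the billing month as the calendar month, counts the 2-month window from the end of that month, and uses a generic uptime formula. Each service's SLA defines downtime and thresholds differently, so the `threshold` value in the usage example is a placeholder and the function names are hypothetical.

```python
import calendar
from datetime import date


def monthly_uptime_percent(downtime_minutes: float, year: int, month: int) -> float:
    """Generic uptime formula: (total minutes - downtime) / total minutes * 100."""
    total_minutes = calendar.monthrange(year, month)[1] * 24 * 60
    return (total_minutes - downtime_minutes) / total_minutes * 100.0


def claim_deadline(incident: date, window_months: int = 2) -> date:
    """Last day of the month that falls `window_months` after the incident month.

    Assumes the billing month equals the calendar month of the incident.
    """
    index = incident.month - 1 + window_months
    year = incident.year + index // 12
    month = index % 12 + 1
    return date(year, month, calendar.monthrange(year, month)[1])


if __name__ == "__main__":
    uptime = monthly_uptime_percent(120, 2021, 4)  # two hours of downtime in April 2021
    threshold = 99.9  # placeholder; take the real figure from the service's SLA
    print(f"uptime {uptime:.3f}%, below threshold: {uptime < threshold}")
    print("file by:", claim_deadline(date(2021, 4, 1)))
```

Under these assumptions, a 120-minute outage in a 30-day month yields about 99.722% uptime, and an April 1, 2021 incident gives a filing deadline of June 30, 2021. Confirm both against the SLA text for each affected service before filing.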