Infrastructure · Credibility 75/100 · 2 min read

Infrastructure Resilience Briefing — April 1, 2021 Azure DNS Global Outage

A faulty Azure DNS change on April 1, 2021 caused global resolution failures for roughly two hours; Microsoft’s incident report stresses expedited rollback controls and reiterates the 2‑month SLA credit claim window for affected services.

Executive briefing: Microsoft attributed the Azure DNS outage to a flawed system update that propagated globally and broke name resolution for a wide range of Azure and Microsoft 365 endpoints for nearly two hours.[1] The post-incident report emphasized rollback automation and confirmed that affected services remain eligible for service credits under their SLAs.

Regional impact

  • Global reachability: DNS failures prevented clients from resolving application endpoints, effectively creating a control-plane and data-plane outage across regions.
  • Dependency chain: Even workloads with healthy compute resources experienced downtime because health probes and service discovery relied on Azure DNS.
  • Mitigation path: Microsoft halted the change, rolled back the faulty update, and ramped up traffic filtering to stabilize its recursive resolvers.
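
The dependency-chain point above can be illustrated with a probe that separates "the name does not resolve" from "the endpoint is down". What follows is a minimal sketch, not Microsoft's tooling: `resolve_with_fallback`, the last-known-good cache, and the injectable resolver are hypothetical names introduced for illustration, and the only library call used is the standard `socket.getaddrinfo`.

```python
import socket
from typing import Callable, Dict, List, Tuple

# A resolver takes a hostname and returns a list of IP address strings.
Resolver = Callable[[str], List[str]]


def system_resolver(hostname: str) -> List[str]:
    """Resolve through the OS resolver; raises socket.gaierror on failure."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry ends with a sockaddr tuple whose first field is the address.
    return sorted({info[4][0] for info in infos})


def resolve_with_fallback(
    hostname: str,
    cache: Dict[str, List[str]],
    resolver: Resolver = system_resolver,
) -> Tuple[List[str], str]:
    """Return (addresses, source), where source is 'dns', 'cache', or 'unresolved'.

    Assumption: serving a last-known-good address during a DNS outage is
    acceptable for this workload. It is not when addresses rotate often.
    """
    try:
        addresses = resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        addresses = []

    if addresses:
        cache[hostname] = addresses  # refresh the last-known-good entry
        return addresses, "dns"
    if hostname in cache:
        return cache[hostname], "cache"
    return [], "unresolved"
```

A health probe built on this can report a "cache" or "unresolved" source as a DNS incident rather than marking healthy compute as failed. The trade-off is that cached addresses can go stale, so the fallback suits short outages better than long ones.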

SLA and credit posture

  1. Map downtime windows to the availability commitments of dependent services (e.g., Azure Front Door, App Service, Virtual Machines) to determine credit eligibility.[2]
  2. File claims within the 2-month SLA window, including evidence of DNS-related outages and the affected subscription scope.[2]
  3. Implement conditional DNS failover or secondary DNS providers for critical zones where regulatory posture allows, capturing lessons from the April 2021 outage.
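
Steps 1 and 2 above come down to simple arithmetic, sketched here under stated assumptions. The credit tiers in `ILLUSTRATIVE_TIERS` are placeholders, not the terms of any specific Azure SLA, so substitute the thresholds from the SLA of each affected service. The deadline helper assumes the 2-month window is counted in calendar months from the incident date, which the briefing does not specify.

```python
import calendar
from datetime import date
from typing import List, Tuple

# Hypothetical tiers: (uptime threshold in percent, credit in percent).
# Replace with the values published in the relevant service's SLA.
ILLUSTRATIVE_TIERS: List[Tuple[float, int]] = [(99.9, 10), (99.0, 25)]


def monthly_uptime_percent(days_in_month: int, downtime_minutes: float) -> float:
    """Uptime as a percentage of the total minutes in the billing month."""
    total_minutes = days_in_month * 24 * 60
    return (total_minutes - downtime_minutes) / total_minutes * 100


def credit_percent(uptime: float, tiers: List[Tuple[float, int]]) -> int:
    """Return the credit for the lowest threshold the uptime falls below."""
    for threshold, credit in sorted(tiers):
        if uptime < threshold:
            return credit
    return 0  # uptime met every threshold, so no credit applies


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def claim_deadline(incident_date: date, window_months: int = 2) -> date:
    """Assumed deadline: incident date plus the claim window in months."""
    return add_months(incident_date, window_months)


if __name__ == "__main__":
    # April 2021 has 30 days; 120 minutes reflects the briefing's "roughly two hours".
    uptime = monthly_uptime_percent(30, 120)
    print(f"uptime: {uptime:.3f}%")
    print(f"credit: {credit_percent(uptime, ILLUSTRATIVE_TIERS)}%")
    print(f"file by: {claim_deadline(date(2021, 4, 1))}")
```

For a 30-day month with 120 minutes of downtime, the uptime works out to about 99.72%, which falls below a 99.9% threshold but above 99%. Per-service downtime may differ from the DNS outage window, so each dependent service should be evaluated against its own measured impact.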

Tags: Microsoft Azure · Global DNS · Service credits · Change control