Infrastructure Resilience Briefing — August 30, 2023 Azure Australia East Power Sag
A utility voltage sag in Azure’s Australia East region tripped datacenter chillers and caused widespread storage and VM failures; Microsoft’s RCA details phased recovery and reiterates the 2‑month window for filing SLA credit claims.
Executive briefing: On August 30, 2023, a utility voltage sag in the Australia East region shut down some cooling units, leading to thermal alarms, infrastructure shutdowns, and loss of storage availability across several availability zones.1 Microsoft’s preliminary post-incident review outlined staged recovery and reminded customers to request service credits under the affected service SLAs.
Regional impact
- Storage dependency: Azure Storage and managed disks in impacted clusters entered fail-safe states, cascading to virtual machine outages and delayed boot cycles.
- Availability zone imbalance: Workloads concentrated in a single zone faced the longest recovery times while unaffected zones maintained capacity.
- Business continuity: Customers with cross-region disaster recovery could fail over, but returning to Australia East required rebalancing once power and cooling stabilized.
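The zone-imbalance point above can be checked against an inventory export. The following is a minimal sketch, not an Azure API: it assumes a hypothetical list of (workload, zone) pairs produced by whatever inventory tooling is in use, and the function names and the 0.67 threshold are illustrative assumptions rather than Microsoft guidance.

```python
from collections import Counter, defaultdict


def zone_concentration(instances):
    """Map each workload to (dominant_zone, share of its instances in that zone).

    `instances` is a hypothetical inventory: an iterable of (workload, zone) pairs.
    """
    by_workload = defaultdict(Counter)
    for workload, zone in instances:
        by_workload[workload][zone] += 1

    report = {}
    for workload, zones in by_workload.items():
        zone, count = zones.most_common(1)[0]
        report[workload] = (zone, count / sum(zones.values()))
    return report


def concentrated_workloads(instances, threshold=0.67):
    """List workloads whose dominant zone holds at least `threshold` of instances.

    The threshold is an assumed cut-off for illustration only.
    """
    report = zone_concentration(instances)
    return sorted(w for w, (_, share) in report.items() if share >= threshold)


# Hypothetical inventory: "web" is spread over three zones, "db" sits in one.
inventory = [
    ("web", "1"), ("web", "2"), ("web", "3"),
    ("db", "1"), ("db", "1"),
]
flagged = concentrated_workloads(inventory)
```

With this sample inventory, only the "db" workload is flagged, since all of its instances share a zone. Workloads flagged this way are the ones most exposed to a single-zone cooling or power event of the kind described here.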
SLA and credit posture
- Gather availability metrics for each impacted service (VMs, managed disks, storage) and compare against published SLA targets to quantify credit eligibility.2
- Submit service credit claims within 2 months of the outage, including timestamps, subscription IDs, and service diagnostics as required by Azure’s SLA policy.2
- Update continuity plans to prioritize cross-zone deployment and periodic controlled failovers out of Australia East to validate readiness.
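The credit-eligibility arithmetic in the first two bullets can be sketched as follows. This is an illustration under stated assumptions: the downtime figure and the credit tiers are hypothetical placeholders that must be replaced with measured values and the tiers published in each service's SLA, and the deadline helper assumes the 2-month window is counted in calendar months from the outage date. The SLA text itself governs how the window is measured.

```python
import calendar
from datetime import date


def uptime_percent(total_minutes, downtime_minutes):
    """Monthly uptime as a percentage of total minutes in the billing month."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def credit_percent(uptime, tiers):
    """Return the largest credit among tiers whose threshold the uptime fell below.

    `tiers` is a list of (uptime_threshold_percent, credit_percent) pairs.
    """
    eligible = [credit for threshold, credit in tiers if uptime < threshold]
    return max(eligible, default=0)


def claim_deadline(outage_day, months=2):
    """Outage date plus `months` calendar months, clamped to the month's last day."""
    month_index = outage_day.month - 1 + months
    year = outage_day.year + month_index // 12
    month = month_index % 12 + 1
    day = min(outage_day.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Hypothetical inputs: replace with measured downtime and the published SLA tiers.
AUGUST_MINUTES = 31 * 24 * 60           # 44,640 minutes in August
measured_downtime = 600                  # assumed downtime in minutes
example_tiers = [(99.99, 10), (99.0, 25), (95.0, 100)]  # placeholder tiers

uptime = uptime_percent(AUGUST_MINUTES, measured_downtime)
credit = credit_percent(uptime, example_tiers)
deadline = claim_deadline(date(2023, 8, 30))
```

Under these placeholder inputs, uptime comes out at roughly 98.66 percent, which falls into the 25 percent placeholder tier, and the computed deadline is October 30, 2023. Running the calculation per service and per subscription keeps the evidence aligned with how claims are submitted.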