Outage Analysis - May 2, 2019

Last updated 2019-07-08 17:23:48 UTC

At 4:00 PM EDT on Thursday, May 2, 2019, our alerting systems reported problems connecting to the databases used by our IVR and Quality Assurance systems, whose primary instances are located in the East US and East US 2 Azure data centers. We updated our status portal to indicate that these systems were experiencing issues, and we reported our inability to manage and migrate instances to Microsoft Azure support and to our escalation contacts at Microsoft. Additional services housed in the Azure West Europe, Canada Central, Central US, South Central US, and West US regions then also became unresponsive, and none of our standard mechanisms for addressing regional failure points could mitigate the situation. Forty-five minutes after our reports, Microsoft issued a status update indicating that all global regions were affected by a "Network Infrastructure" issue.

At 4:58 PM EDT, connectivity was restored and access to all systems was confirmed. By 5:00 PM EDT, electronic signing services, IVR systems, and all reporting and administrative sites were confirmed to be operational and serving customers.

During this outage, our systems remained intact, but DNS issues caused failures in routing to the instances. We have confirmed that no stored customer data was lost. We are sorry for the inconvenience caused by these events and are monitoring Microsoft's response.
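
For readers who want to tell this kind of failure apart from a host that is actually down, the following is a minimal sketch in Python and not the tooling we used during the incident. It assumes a hypothetical helper named classify_endpoint, and the host name and port in the usage note are placeholders. The helper reports whether a connection attempt failed at the name lookup step or at the TCP connection step.

```python
import socket


def classify_endpoint(host, port, timeout=3.0,
                      resolve=socket.getaddrinfo,
                      connect=socket.create_connection):
    """Return "dns_failure", "connect_failure", or "ok" for host:port.

    Hypothetical helper for illustration. The resolve and connect
    parameters default to the standard library calls and can be replaced
    with stand-ins for testing without network access.
    """
    # Step 1: name resolution. A failure here means the name could not be
    # resolved, which says nothing about whether the instance is healthy.
    try:
        candidates = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: try each resolved address until one accepts a TCP connection.
    for _family, _socktype, _proto, _canonname, sockaddr in candidates:
        try:
            conn = connect(sockaddr[:2], timeout=timeout)
        except OSError:
            continue
        conn.close()
        return "ok"

    # The name resolved, but no address accepted a connection.
    return "connect_failure"
```

Calling classify_endpoint("db.example.invalid", 1433) with a placeholder host such as this one would be expected to report a name lookup failure, while a result of "connect_failure" would point to the instance or the network path rather than to DNS.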

Microsoft provided the following description of what is being called a "DNS Resolution" issue:

Engineers identified the underlying root cause as a nameserver delegation change affecting DNS resolution and resulting in downstream impact to Compute, Storage, App Service, AAD, and SQL Database services. During the migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated. No customer DNS records were impacted during this incident, and the availability of Azure DNS remained at 100% throughout the incident. The problem impacted only records for Microsoft services.

Microsoft has promised a root cause analysis report with additional information within 72 hours of Friday at 12:00 AM EDT.