
Verbal Service Outage Analysis - July 22, 2019

Last updated 2019-08-19 20:56:36 UTC

Service Incident: AssureSign’s IVR provider experienced a connectivity failure at one of their data centers in Boston, resulting in a complete loss of IVR functionality for our customers.

Root Cause: Our IVR provider experienced sustained packet loss of 60-70%, which effectively brought their data center down until consistent external Internet connectivity was restored.

Discussion: On Monday night, July 22, at 7:32 PM EDT, our IVR provider’s main Boston data center began experiencing continuous packet loss of 60-70% on traffic to and from the external Internet. Attempts to roll traffic to their redundant data centers were ineffective because of a simultaneous disruption in the National Telecom Database (SOMOS), where a recent mainframe cutover had caused routing change problems. As a result, re-routing had to be done manually, which slowed the process and limited the provider’s ability to tune the telecom traffic configuration to match the infrastructure of the secondary data centers.

Fix: Ultimately, the vendor of Internet connectivity made a change that resolved the problem around 2 PM EDT on July 24. Packet loss ceased, normal Internet connectivity returned to the primary data center systems, and our provider restored network configurations to their normal operating settings. In addition, the National Telecom Database (SOMOS) conducted overnight maintenance (July 26 to July 27) to repair issues found after their recent mainframe cutover, which should allow faster telecom re-routing in the future.

Next Steps: Our IVR provider has contracted with an alternate provider of Internet connectivity in the Boston data center to assure further redundancy. They have also saved and templated the routing and configuration settings that proved most effective in handling traffic without their primary data center. As a result of the outage, they have hired a new VP of Support and have improved communication with their customers by reconfiguring their support lines so that calls can be routed to any of their data centers via multiple carriers. They have also made numerous adjustments to their Disaster Recovery Plan and added network redundancy.

Summary: We realize the impact this had on you, our customer, and we recognize that although this outage was not caused by a failure in our own infrastructure, it nonetheless reflects negatively on us. We deeply regret the loss of business continuity and pledge to keep striving for uninterrupted service to you. We appreciate your business and the faith you have placed in us.
