US Production environment currently experiencing a service disruption
Incident Report for Zuora
Postmortem

Timeframe: 5:55pm PST September 14, 2017 - 6:45pm PST September 14, 2017

Affected Systems: US Production environment

Symptoms: Internal synthetic tests detected Zuora API and UI failures

Root Cause: A virtual server failed due to a hardware failure, causing a disruption of service in a subset of our systems. One of the systems that experienced the failure was unable to failover to the redundant, standby system due to a system lock. While this is an initial analysis, we are continuing our investigation into why this service was unable to failover appropriately.

Resolution: The impacted services were migrated to a different virtual host and service was restored.

Future Preventative Measures: Decrease the affinity in our virtualization environment for key component services, and improve resiliency of our high availability components.

Posted Sep 15, 2017 - 10:09 PDT

Resolved
Service has been fully restored, and an RCA will be forthcoming.
Posted Sep 14, 2017 - 19:05 PDT
Monitoring
Services are now being brought back online to restore service.
Posted Sep 14, 2017 - 18:45 PDT
Identified
We believe we have identified the root cause - and are in the process of restoring service.
Posted Sep 14, 2017 - 18:18 PDT
Investigating
We are experiencing a service disruption in our US Production environment. On-call engineers are engaged and restoring service, and we will update our Trust site as soon as we have more information or root cause.
Posted Sep 14, 2017 - 18:05 PDT
This incident affected: AMERICAS - CLOUD 2 (NA2) - www|rest.zuora.com (Production UI, Production API, Production Batch Operations, Production Analytics).