Performance degradation impacting reports/exports/Z360 sync/notifications for a subset of NA Cloud tenants
Incident Report for Zuora
Postmortem

TIMEFRAME:
2021/04/28 01:48 AM PDT - 2021/04/28 08:44 AM PDT

SUMMARY AND IMPACT:
During the above timeframe, a subset of customers on our NA Cloud Datacenter production environment experienced delays in processing selected batch operations (AQuA, Data Source Exports, Callouts and Z360 Sync).

ROOT CAUSE:
Replication delays in the database topology, caused by a combination of factors:

1. Heavy transactional activity between 1:00 AM and 3:00 AM PDT.
2. Sub-optimal queries running in a subset of Data Source Exports and AQuA services.
3. High CPU utilization, which slowed down transaction processing on the instance.

RESOLUTION:
The sub-optimal queries were stopped, which reduced CPU utilization and eliminated the replication latency.

FUTURE PREVENTATIVE MEASURES:
• Upgrade the processing instance for the impacted database replica
• Improve retry mechanism for notification service
• Improve distribution of transaction load to other stand-by systems

Posted May 12, 2021 - 11:05 PDT

Resolved
This incident has been resolved.
Posted Apr 29, 2021 - 05:51 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Apr 28, 2021 - 08:48 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Apr 28, 2021 - 08:12 PDT
Update
We are continuing to investigate this issue.
Posted Apr 28, 2021 - 07:08 PDT
Update
We are continuing to investigate this issue.
Posted Apr 28, 2021 - 05:09 PDT
Update
We are continuing to investigate this issue.
Posted Apr 28, 2021 - 04:04 PDT
Investigating
We are currently investigating this issue.
Posted Apr 28, 2021 - 03:35 PDT
This incident affected: AMERICAS - CLOUD 1 (NA1) - *.na.zuora.com (Production Integrations, Production Batch Operations, Production Analytics).