TIMEFRAME:
2021/04/28 01:48 AM PDT - 2021/04/28 08:44 AM PDT
SUMMARY AND IMPACT:
During the above timeframe, a subset of customers on our NA Cloud Datacenter production environment experienced delays in processing certain batch operations (AQuA, Data Source Exports, Callouts, and Z360 Sync).
ROOT CAUSE:
Replication delays in the database topology, caused by a combination of factors:
1. Heavy transactional activity between 1:00 AM and 3:00 AM PDT.
2. Sub-optimal queries running in a subset of Data Source Exports and AQuA services.
3. High CPU utilization, which slowed down transaction processing on the instance.
RESOLUTION:
The sub-optimal queries were stopped, which reduced CPU utilization and eliminated the replication latency.
FUTURE PREVENTATIVE MEASURES:
• Upgrade the processing instance for the impacted database replica
• Improve retry mechanism for notification service
• Improve distribution of transaction load to other stand-by systems
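
For illustration only, one common way to make a retry mechanism more resilient is exponential backoff with jitter. The sketch below is a generic example, not the notification service's actual implementation; the function name retry_with_backoff and all of its parameters are hypothetical.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or max_attempts is exhausted.

    All names and defaults here are hypothetical, chosen for illustration.
    sleep and rng are injectable so the behavior can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Double the delay ceiling after each failure, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Scale by a random factor in [0, 1) so that many clients
            # retrying at once do not all hit the service at the same moment.
            sleep(ceiling * rng())
```

Spreading retries out in this way avoids adding further load to a system that is already slow to respond, which is the situation a delayed replica creates.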