Bill Run and Payment Run failures for a subset of customers on our EU Data Center
Incident Report for Zuora
Postmortem

DATE(S):
2020-01-15 1:55pm PST - 2020-01-15 5:05pm PST (2020-01-15 21:55 UTC - 2020-01-16 01:05 UTC)

SUMMARY:
Bill Run and Payment Run failures for a subset of customers on our EU Data Center.

ROOT CAUSE:
As part of a scheduled update to deliver operational enhancements to scheduled jobs (Bill Run, Payment Run, Subscription Renewal, etc.), a backward-incompatible change was inadvertently introduced. The change stopped a customer's scheduled jobs from being executed if the user who originally scheduled the job was no longer active in the tenant.
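
The failure mode described above can be illustrated with a minimal sketch. This is not Zuora's code: the `ScheduledJob` type, the `SYSTEM_USER` fallback, and both function names are hypothetical, and the fallback shown is only one possible way to keep such a check backward compatible.

```python
from dataclasses import dataclass
from typing import Optional, Set

SYSTEM_USER = "system"  # hypothetical fallback identity, assumed for illustration


@dataclass
class ScheduledJob:
    job_id: str
    job_type: str      # e.g. "BillRun" or "PaymentRun"
    scheduled_by: str  # user who originally scheduled the job


def run_as_strict(job: ScheduledJob, active_users: Set[str]) -> Optional[str]:
    """Regressed behaviour: no run-as user (the job is skipped) when the
    original scheduler is no longer active in the tenant."""
    return job.scheduled_by if job.scheduled_by in active_users else None


def run_as_compatible(job: ScheduledJob, active_users: Set[str]) -> str:
    """Backward-compatible behaviour: fall back to a system identity so the
    job still runs when the original scheduler is inactive."""
    return job.scheduled_by if job.scheduled_by in active_users else SYSTEM_USER


job = ScheduledJob("job-1", "BillRun", "former.employee")
active = {"current.admin"}
print(run_as_strict(job, active))      # None: the job would not be executed
print(run_as_compatible(job, active))  # system: the job still runs
```

Under these assumptions, a job scheduled by a user who has since been deactivated is silently dropped by the strict check, which matches the symptom of jobs failing to start.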

RESOLUTION:
We fixed the issue by rolling back the change, then deployed a permanent patch (with testing and validation) to prevent the issue from recurring.

FUTURE PREVENTATIVE MEASURES:
Although this was a one-time enhancement, we are also adjusting our monitoring and alerting thresholds so that we can better detect scheduled jobs getting stuck.
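
As a rough illustration of threshold-based detection of stuck jobs, the sketch below flags jobs that have remained in Processing status longer than a configurable limit. The job record layout, the `find_stuck_jobs` name, and the 30-minute default are assumptions made for this example, not details of Zuora's monitoring.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_jobs(jobs, now, threshold=timedelta(minutes=30)):
    """Return the ids of jobs that have been in 'Processing' status
    for longer than the given threshold."""
    return [
        job["id"]
        for job in jobs
        if job["status"] == "Processing" and now - job["started_at"] > threshold
    ]


now = datetime(2020, 1, 16, 1, 5, tzinfo=timezone.utc)
jobs = [
    # Started at the beginning of the impacted window and still processing.
    {"id": "bill-run-1", "status": "Processing",
     "started_at": datetime(2020, 1, 15, 21, 55, tzinfo=timezone.utc)},
    # Started ten minutes ago, so still within the threshold.
    {"id": "payment-run-2", "status": "Processing",
     "started_at": now - timedelta(minutes=10)},
    # Finished jobs are ignored regardless of age.
    {"id": "bill-run-3", "status": "Completed",
     "started_at": now - timedelta(hours=5)},
]
print(find_stuck_jobs(jobs, now))  # ['bill-run-1']
```

Lowering the threshold detects stuck jobs sooner at the cost of more false alarms for legitimately long runs, which is the trade-off that tuning such thresholds has to balance.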

Posted Feb 13, 2020 - 13:25 PST

Resolved
This incident has been resolved.
Posted Jan 17, 2020 - 20:47 PST
Monitoring
This issue has now been mitigated and we continue to monitor it.
The impacted window was 2020-01-15T21:55:00 UTC to 2020-01-16T01:05:00 UTC. During this window, some scheduled jobs failed to start.

We are currently monitoring to ensure that further failures do not occur.
Posted Jan 17, 2020 - 11:11 PST
Update
We are continuing to investigate this issue.
Posted Jan 17, 2020 - 10:13 PST
Investigating
We have identified that Bill Runs are stuck in Processing status for a subset of our customers.
Our internal teams are investigating this as a priority.
Posted Jan 17, 2020 - 09:57 PST
This incident affected: EUROPE - CLOUD 1 (EU1) - *.eu.zuora.com (Production Batch Operations).