LESSONS LEARNT:
What was the context?
- The update was one of the last steps in completing our worldwide multi-data-center deployment. Its objectives are continuous quality improvement and compliance with legal frameworks (GDPR, data privacy policies).
- The 1.49 patch brings new features and corrections. One of the updated elements was a new software module on a specific database, intended to improve mobile response times thanks to a new algorithm.
What happened?
- The update should have been seamless and transparent. However, one of the modules expected a database to be indexed, and it was not.
- Without indexed entries, the servers were slow to respond under the massive volume of user requests, which caused a kind of denial of service.
What did we learn and what did we solve?
- This kind of issue, which can occur when manipulating large numbers of database entries, has been fixed. It is now known and recorded in our microservice checklist. It will not happen again.
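As an illustration only, a checklist item such as "verify that the expected indexes exist before rolling out a module" could be automated along the following lines. This is a minimal sketch, not the check actually used for Rainbow: it uses SQLite as a stand-in because the database engine is not named here, and the function `missing_indexes`, the table `messages`, and the column `user_id` are hypothetical names chosen for the example.

```python
import sqlite3


def missing_indexes(conn, required):
    """Return the (table, column) pairs in `required` that have no index
    whose leading column is `column`."""
    missing = []
    for table, column in required:
        # Names of all indexes defined on the table.
        index_names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM pragma_index_list(?)", (table,)
            )
        ]
        # Collect the first (leading) column of each index.
        leading_columns = set()
        for index_name in index_names:
            row = conn.execute(
                "SELECT name FROM pragma_index_info(?) ORDER BY seqno LIMIT 1",
                (index_name,),
            ).fetchone()
            if row is not None:
                leading_columns.add(row[0])
        if column not in leading_columns:
            missing.append((table, column))
    return missing


if __name__ == "__main__":
    # Hypothetical schema, used only to demonstrate the check.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id TEXT, body TEXT)"
    )
    required = [("messages", "user_id")]

    # Before the index exists, the pair is reported as missing.
    print(missing_indexes(conn, required))

    conn.execute("CREATE INDEX idx_messages_user ON messages (user_id)")

    # After the index is created, nothing is reported.
    print(missing_indexes(conn, required))
```

A deployment script could refuse to proceed whenever the returned list is not empty, so that a module relying on an index is never started against an unindexed database.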
How did we manage the communication?
- Customer Care was ready to actively respond to escalations.
- We proactively updated the Help Center minute by minute until the service was fully restored.
Conclusions:
- We logged this kind of issue for next time, learned from it, and updated the process.
- The next updates will be seamless for users.
TIMELINE:
Monday, November 26, 2018 - 12:35 CET
Please reload your Rainbow applications (Web, Desktop and Mobile) to be able to use Rainbow properly.
Monday, November 26, 2018 - 12:25 CET
In addition, we recommend that you reload the Rainbow applications (Web, Desktop and Mobile) to fix some issues impacting your Bubbles (no content, white page, etc.).
Monday, November 26, 2018 - 12:10 CET
The following services are still degraded:
Monday, November 26, 2018 - 11:10 CET
Rainbow is stable and users are back. The following services are still degraded:
Monday, November 26, 2018 - 10:50 CET
Rainbow is back in service and users are able to connect again. The services are degraded and push is not working yet for mobile devices.
Monday, November 26, 2018 - 10:30 CET
During the Canada Data Center upgrade, we are facing an XMPP component disconnection: XMPP failed to connect to our database. We are working to roll back the release.
Monday, November 26, 2018 - 9:50 CET
We have discovered an outage during the planned Canada Data Center upgrade. We are investigating and will keep you updated.