Rainbow services experienced disruptions on Thursday, July 8, 2021, from 11:45AM CEST to 06:10PM CEST.
A database migration process, required for the launch of upcoming features, behaved differently in the production environment than on the testing platform because of the difference in traffic levels between the two.
This resulted in an unexpected increase in traffic in one of our datacenters, which degraded the experience of Rainbow users in some regions: increased latency, difficulties connecting, and loss of contacts and calls.
This incident resulted from a combination of factors that are normally unlikely to occur at the same time, which makes it highly unusual.
Incident time frame:
- From 11:45AM CEST to 11:50AM CEST: The operations team detected slowness in the datacenter serving the EMEA region.
- From 11:51AM CEST to 05:54PM CEST: Several operations were carried out to identify the source of the congestion. Some features were prioritized over others to find a balance and end the congestion.
These actions mitigated the disruptions, but some service access failures continued to affect a portion of the user base.
- From 05:55PM CEST to 06:10PM CEST: The configuration changes ended the congestion. All features are available again, and the infrastructure remains under active monitoring.
Note that the region that applies is the region of the user's Rainbow company, not the region where the user is located.
- From 11:45AM CEST to 11:00AM CEST:
  - Slowness in the application resulted in a degraded level of service for a limited number of users.
- From 11:01AM CEST to 05:54PM CEST:
  - All features could be affected by this slowness, at random, for a limited number of users.
Corrective actions:
- Improve the testing environment under high-traffic conditions to better detect and avoid such side effects.
- Continuously improve our communication process.
Communication about the incident was handled through the status page, status.openrainbow.com: