Rainbow Services experienced disruptions in NA & CALA on Wednesday, 1st July, from 11:49AM UTC to 05:55PM UTC. The disruption took place in two phases.
What Happened:
Incident Time frame:
- [Phase 1] From 11:49AM UTC to 12:30PM UTC: Backbone network issues in our provider's NA datacenter impacted Rainbow users in NA & CALA. Our provider published two incidents impacting all applications hosted in the datacenter.
- From 12:40PM UTC to 12:55PM UTC: New network errors were detected, with the same impact.
- From 12:55PM UTC to 03:58PM UTC: The entire Rainbow Operations team worked to restore Rainbow service while controlling possible overloads and traffic spikes as users reconnected.
- [Phase 2] From 05:08PM UTC to 05:50PM UTC: Our operations team again detected packet loss in our provider's network, resulting in degraded and slowed Rainbow service for NA & CALA Rainbow users.
- At 05:50PM UTC: The network was stable and no more errors were detected. We verified that all Rainbow services were available and stable.
- At 05:56PM UTC: The incident was closed. All Rainbow services were available and stable worldwide.
Incident impact:
- [NA, CALA] From 11:49AM UTC to 12:55PM UTC: Full Rainbow Services (Connection, Messaging, Bubbles, Conferences, Telephony Services ...) were down for all Rainbow users.
- [EMEA, NA, CALA] From 12:55PM UTC to 03:58PM UTC: Services (Web Call, Web Conference, Telephony services) were restored in phases. Users in EMEA may have been impacted while services were being restored: some may have seen empty bubbles, or may have been unable to create, join, or delete a bubble or make a conference call.
- [EMEA, NA, CALA] From 05:08PM UTC to 05:50PM UTC: Full Rainbow Services (Connection, Messaging, Bubbles, Conferences, Telephony Services ...) were degraded and slowed for NA & CALA users. Users in EMEA may have been impacted while services were being restored: some may have seen empty bubbles, or may have been unable to create, join, or delete a bubble or make a conference call.
Incident description:
Our datacenter provider faced a network issue in an NA datacenter, which also caused network issues in the CALA datacenter. As a result, servers intermittently lost internal network access. Although high availability is in place, the secondary datacenter did not take over the traffic.
Once access to the datacenter was restored, our teams had to control possible overloads and traffic spikes as users reconnected. Bubble traffic was handled, although some users may have seen empty bubbles for a short period of time. Rainbow's teams did everything possible to restore services and shorten the disruption after the network outage.
Corrective Measures:
- Define an improvement plan with our datacenter provider.
- Improve CALA backbone network resilience (already ongoing).
- Make sure that Rainbow does not load all Bubbles at start-up, to reduce the load on the infrastructure during a massive restart (improvement already planned).
- Add statistics on the components linked to this outage to better understand their activity and optimize the restart times.
Communication History:
Wednesday, July 01, 2020 - 18:20 CEST: Rainbow Services have been fully restored since 18:00 CEST. We monitored the Rainbow infrastructure and we confirm that:
The Root Cause Analysis will be published in this article soon.
Wednesday, July 01, 2020 - 16:20 CEST: Restoration of services is still in progress. The following services are still degraded, more specifically for EMEA users:
We will inform you as soon as Rainbow Services are fully restored.
Wednesday, July 01, 2020 - 15:30 CEST: We are restarting the services, and users should be able to connect and use Rainbow soon. Please note that several services may be provided in degraded mode (Telephony services, Conferences, etc.). We continue to work to restore full service.
Wednesday, July 01, 2020 - 15:15 CEST: Due to massive reconnections caused by this outage, users of other datacenters (EMEA, APAC) may encounter some trouble and see empty Bubbles.
Wednesday, July 01, 2020 - 15:10 CEST: Our provider's network issue seems to be solved. Our entire team is focused on restoring services as soon as possible.
Wednesday, July 01, 2020 - 14:45 CEST: Our provider has confirmed an ongoing outage affecting our datacenters in the NA & CALA regions. We are working with them to restore services as soon as possible.
Wednesday, July 01, 2020 - 14:15 CEST: It seems that the NA & CALA datacenters are suffering from network errors on our provider's side. Analysis is in progress.
Wednesday, July 01, 2020 - 13:50 CEST: Rainbow Services are currently unavailable for users connected to the CALA & NA datacenters. Not sure which region applies to you? Remember that the region of the Rainbow Company prevails, not the Rainbow user's region.