Rainbow Services experienced disruptions from Monday, March 30, 2020, 17:05 CEST to Monday, March 30, 2020, 18:45 CEST.
What Happened:
Incident Time frame:
- From 05:05PM CEST to 06:05PM CEST: Global outage in one of our provider's datacenters, impacting Rainbow users worldwide (WW).
- From 06:05PM CEST to 06:45PM CEST: Restoration of Rainbow Services by our team.
- At 07:45PM CEST: Stability confirmed after one hour of monitoring.
Incident impact:
- From 05:05PM CEST to 06:05PM CEST: All Rainbow Services (Connection, Messaging, Bubbles, Conferences, Telephony Services, etc.) were down for EMEA users, and Rainbow Services were degraded for the rest of the world.
- From 06:05PM CEST to 06:45PM CEST: Services were restored in phases. Bubbles and Web Conferences were back at 06:45PM CEST.
Incident description:
Our datacenter provider experienced a defect in core network equipment in an EMEA datacenter. Although high availability is in place, the secondary hardware did not take over the traffic. Our provider is working with its network equipment vendor to determine the exact root cause.
Our secondary active EMEA datacenter could not fully handle the traffic, for the following reasons:
- The network was flapping, and at some point during the event the outage also affected our secondary active EMEA datacenter (reported by our provider as a global network outage).
- Some components did not handle the disconnection and their nodes ended up in a hung state.
Why the impact was worldwide:
- The current design makes some local (regional) components dependent on EMEA components, so a service disruption in the EMEA datacenters can degrade service worldwide.
- The same applies to PBX telephony services, which by design are currently all located in EMEA datacenters.
Why worldwide service was back at 06:15PM CEST while an issue persisted in EMEA (Bubbles and Web Conferences):
- Some Bubbles and Web Conference components were restarted, and it took the servers 25 to 30 minutes to flush the incoming traffic on the event queue used by Rainbow “Bubbles”.
Corrective Measures:
- Actions to strengthen resilience to network outages are ongoing.
- The time needed to restore services once the network is back will be reduced (currently 30 to 45 minutes).
Communication History:
Monday, March 30, 2020 - 19:45 CEST: Rainbow Services have been fully restored since 18:50 CEST. We monitored the Rainbow infrastructure and confirm that all services have been operational since 18:50 CEST. RCA to come.
Monday, March 30, 2020 - 18:45 CEST: All Rainbow Services are now operational. We continue to monitor the infrastructure and verify its stability. We will inform you as soon as the incident is closed.
Monday, March 30, 2020 - 18:40 CEST: Users are currently recovering their telephony features. These features are already available for most Rainbow users. The following services are still degraded:
We will inform you as soon as Rainbow Services are fully restored.
Monday, March 30, 2020 - 18:30 CEST: We are restarting the services, and users should soon be able to connect and use Rainbow services. Please note that some services may be provided in degraded mode (Telephony services, Conferences, etc.). We continue to work to restore full service.
Monday, March 30, 2020 - 18:20 CEST: We confirm that the failure is caused by a network problem at our provider. We are working with them to restore services as soon as possible.
Monday, March 30, 2020 - 17:50 CEST: [Update] Rainbow services are currently unavailable or operating in degraded mode for all users, in every region. Our entire team is focused on restoring services as soon as possible.
Monday, March 30, 2020 - 17:30 CEST: The EMEA datacenter appears to be suffering from network errors at our provider. Analysis is in progress.
Monday, March 30, 2020 - 17:10 CEST: Rainbow Services are currently unavailable for users connected to the EMEA datacenter. Not sure which region concerns you? Remember that the region of the Rainbow Company prevails, not the Rainbow user's region.
Comments
Can you confirm the expected time for a fix? I have numerous resellers who use this solution, which is critical to their business.
Unfortunately, we cannot yet announce any deadlines for a return to normal.
We understand what is at stake for our customers and are doing everything we can to resolve this incident as quickly as possible.
Thanks for your understanding,
Many thanks for the timely updates.
Let's hope that it continues to run smoothly.
Thanks for your good job on this issue :-)