Rainbow services experienced disruptions on Wednesday, October 13, 2020, from 09:20AM CEST to 02:15PM CEST.
What Happened:
Incident description:
An incident occurred during a maintenance operation carried out by one of our infrastructure (IaaS) providers. This incident caused a problem across the provider's whole backbone.
This outage resulted in a global unavailability of Rainbow services as data centers hosted by other infrastructure (IaaS) providers were unable to operate independently.
Incident Time frame:
- From 09:20AM CEST to 09:59AM CEST: All Rainbow services are unavailable. A network incident at an infrastructure provider (IaaS) is suspected by the operations team, which focuses on this incident.
- From 10:00AM CEST to 10:20AM CEST: One of our infrastructure (IaaS) providers confirms a network incident impacting all of its data centers.
- From 10:21AM CEST to 10:40AM CEST: The network incident impacting the infrastructure provider (IaaS) is over and the data centers start to reconnect. The first Rainbow services are back online. However, several monitoring tools or management consoles are still unavailable due to the network outage. This increases the restart time of the services.
- From 10:41AM CEST to 11:25AM CEST: Almost all Rainbow services restart and are available. A general slowness remains perceptible during the use of the solution.
- From 11:26AM CEST to 01:30PM CEST: All PBXs (OXO/OXE) reconnect to the Rainbow infrastructure. Hybrid Telephony services are available again.
- From 01:31PM CEST to 02:20PM CEST: The last third-party PBX reconnects and a monitoring period confirms the end of the incident.
Incident impact:
- From 09:20AM CEST to 10:21AM CEST:
  - All Rainbow services are unavailable.
- From 10:22AM CEST to 10:40AM CEST:
  - The first services are back, but it is still impossible to connect to the application.
- From 10:41AM CEST to 11:25AM CEST:
  - All services are back with the exception of Hybrid Telephony services. Some slowness is noticeable when using the solution.
- From 11:26AM CEST to 01:30PM CEST:
  - Hybrid Telephony services (OXE & OXO PBX) progressively become available.
- At 02:15PM CEST:
  - All Rainbow services are back.
- From 09:20AM CEST to 10:21AM CEST:
  - All Rainbow services are unavailable.
- At 10:21AM CEST:
  - All Rainbow services are back, including Telephony services.
Corrective Measures:
- Optimize the restoration time of the Rainbow solution, in particular that of the Hybrid Telephony services.
- Study the opportunity to further increase the redundancy of the infrastructure by adding multi-provider redundancy to the spatial redundancy already in operation.
COMMUNICATION HISTORY:
Wednesday, October 13, 2020 - 14:15 CEST: The last PBXs reconnected at 2:15 pm and our monitoring has not revealed any anomalies since then. We consider the incident closed and will publish a root cause analysis soon. Thank you to everyone who was patient today while our teams worked diligently to restore Rainbow.
Wednesday, October 13, 2020 - 13:30 CEST: The reconnection of the PBXs has taken longer than expected but is nearing completion. The vast majority of them are now operational. We continue to monitor all Rainbow services and can say that the incident is nearing its end.
Wednesday, October 13, 2020 - 12:10 CEST: The slowness is disappearing. The last service still significantly impacted is Hybrid Telephony. All PBXs are reconnecting at once, which creates a "queue". We are doing everything necessary to accelerate these reconnections and allow you to recover all your services.
Wednesday, October 13, 2020 - 11:35 CEST: All users have been able to log in and use Rainbow services since 11:31 am. All services are available with the exception of Hybrid Telephony, which requires a little more time for each PBX to reconnect. Some slowness may occur when using the application because some operations are still in progress.
Wednesday, October 13, 2020 - 11:20 CEST: We are doing our best to restore all the servers as soon as possible. Here is the latest information:
Wednesday, October 13, 2020 - 11:00 CEST: Some basic services are back. Users connected on Android can use instant messaging and Bubbles. We will provide you with the list of available services as they become available. All services on the HDS environment have been available since 10:22 am.
Wednesday, October 13, 2020 - 10:45 CEST: Good news! We confirm that Rainbow services are being restored. We are working hard to get you back to using all Rainbow features as soon as possible.
Wednesday, October 13, 2020 - 10:30 CEST: The situation is starting to recover at our infrastructure provider (IaaS) and we have started to restore Rainbow services. The restoration of services is a gradual process and you may experience disruptions until the restoration is complete.
Wednesday, October 13, 2020 - 10:20 CEST: The situation is still recovering at our infrastructure provider (IaaS). We will be able to start restoring Rainbow services as soon as the situation is stable on their side. Rainbow services on the HDS environment are still available but may operate in a degraded manner.
Wednesday, October 13, 2020 - 10:00 CEST: We confirm that the incident is related to our infrastructure provider (IaaS). This global outage has an impact on a large number of sites and applications. Rainbow services on the HDS environment have just been restored but may operate in a degraded manner.
Wednesday, October 13, 2020 - 09:45 CEST: A first analysis seems to indicate an issue with our infrastructure provider (IaaS). We are working to identify the root cause in order to restore Rainbow services as soon as possible.
Wednesday, October 13, 2020 - 09:30 CEST: Rainbow services are currently experiencing disruptions. We are analyzing the situation and will get back to you shortly.
Comments
OVH is down.
Do we need to restart the WebRTC gateway?
The link with the hybrid PBX is offline.
Hello Jeroen,
Many PBXs are reconnecting simultaneously. This is probably why your services are not yet fully back online.
You can indeed restart your WebRTC Gateway to help it reconnect faster.
The problem seems to be that the PBX agent is not connected, as I can see WebRTC connected for all companies in the equipment section,
but restarting the PBX agent doesn't do anything yet.
After a reboot it is still the same: clients cannot call.
mpcheck is all OK/green.
What else can we do to force a reconnect?
effect in rainbow app:
same on mobile phones
Yes, the same is happening for us with one of our customers, ACA (Asia Pacific). Telephony services have yet to be restored even after restarting the WebRTC gateway, and restarting the Rainbow agent has no effect, as Dallan said.
The WebRTC status is "not connected".
Restarting the agent may help as it sends a new request to the Rainbow server.
However, as noted in the 12:10 update, the large number of systems attempting to reconnect simultaneously creates a queue.
We are working to resolve this queue as quickly as possible.
Quentin,
Any update? Still 0 of our customers can use the Rainbow telephony service.
Rebooting the Rainbow agent in the PABX doesn't help.
Hi,
We see PBXs connecting slowly, and there is latency with a large queue. We are currently working to optimize this and decrease reconnection times.
Still none of our teleworkers have been able to call over Rainbow since this morning, around 9h.
Pascal,
The OXEs are connected again, but 0 OXOs so far.
When will you start the queue for the OXO & OCE?
It has started. I will check whether any other OXOs are connected.
The OXO & OCE are now starting to connect to the services one by one.
Was there just a new hiccup in the connection between WebRTC and the Rainbow service?
After 5 minutes it was back online.
Me too, it was offline for a few minutes.
Any update on what happened last week?
How can this be prevented in the future?
Is there an official detailed communication from Alcatel for such a serious outage?
Hello Jeroen,
The root cause analysis has been available since the last update of the article late Friday morning. If you are subscribed to the section, you should have received an email notification as soon as it was published.
I hope the above is useful to you.
Dear Quentin,
Do you mean only this block?
Is this what you mean by enough communication for a serious outage?
The corrective measures are based on the complete analysis available above. We are sorry if this information is not sufficient for you.
We recommend that you contact your Rainbow reseller or Alcatel sales representative for more information.