On Thursday 16th of July 2020, our infrastructure was impacted by what was eventually diagnosed as a Distributed Denial of Service (DDoS) attack, causing partial or total unavailability of our osc-fr1 and osc-secnum-fr1 regions. The attack impacted both the Scalingo platform itself and the applications hosted by our customers. This post details the course of events, analyses the team's reaction, and describes the actions that will be taken to improve the situation in the future.
All timestamps are in Central European Summer Time (CEST).
osc-secnum-fr1 regions are not responding to any type of request made by our operators. Our infrastructure provider, Outscale, is contacted.
osc-fr1: network access is unavailable for the second time.
*.osc-fr1.scalingo.io or domains having configured a CNAME DNS record. Users using an A record are encouraged to contact us on the support chat to get the new IP to use.
Cumulatively, this incident caused a total interruption of 1h53 and a partial interruption of 1h34.
This incident was the first massive DDoS attack endured by Scalingo. We had been attacked in the past, but not to that extent. A DDoS attack completely fills the network pipes with forged traffic, preventing legitimate traffic from reaching its destination.
Thousands of IPs from all around the world targeted our infrastructure with a flood of requests: the attacker generated 8 Gbps of traffic and attempted approximately 1,000,000 connections per second. The attack targeted port 443 (HTTPS) and generated HTTPS requests with extremely large headers to amplify the volume of data sent to our platform. We assume it came from a rented botnet, that is, a large set of infected devices controlled remotely. According to Cloudflare statistics, 92% of DDoS attacks in 2019 were under 10 Gbps, which places this attack in the upper range of common DDoS attacks.
The target of the attack was a specific website hosted on our services. It has not been determined why this website was targeted. It does not store any sensitive information, nor anything valuable that would usually motivate this type of attack.
Having multiple IP addresses to serve our infrastructure did not help at first, since the data center's whole Internet bandwidth was saturated by the attacker's requests, blocking access to the entire network whatever the target IP. Our services were impacted, as were other entities sharing the same infrastructure. Having multiple IP addresses did, however, help us mitigate the attack by modifying the way each address was routed individually.
As listed in the timeline of the event, four waves of attack occurred. During these waves, the following symptoms appeared:
Our status page https://scalingostatus.com was updated regularly throughout the day.
We answered all messages coming through Intercom, either via the in-app chat or through our support email email@example.com.
Our Twitter account @ScalingoHQ posted about the major parts of the incident.
Specific information was sent directly to some customers and to people who asked.
During the incidents, several actions were attempted to mitigate the impact of the attack on Outscale infrastructure and on Scalingo customers.
First, we want to thank our customers, who have been very understanding and encouraging during the incident.
We're fully aware that such an incident has a significant impact on all of you. That's why our team maintains a 24/7 on-call rotation to manage the infrastructure so you don't have to.
Incidents happen, and it's our role to handle them as well as possible and to be prepared for them.
DDoS attacks are always difficult to handle due to their nature: a blunt attack that simply fills any available capacity. We had written procedures to handle them, but the scale of this attack was huge.
We had great contact with the Outscale team. We feel confident in our choice of partner: they helped us get back on our feet thanks to their network of partnerships.
We're closely monitoring the implementation of the anti-DDoS feature proposed by Outscale to protect your applications from this kind of attack in the future.
We're fully aware that the downtime which occurred on July 16th has heavily impacted this commitment.
Therefore, all Business customers will automatically receive a financial compensation of 10% on their invoice for the month of July (5% per hour of downtime).
To qualify as a Business customer, you must own at least one application with a database on a Business plan and at least 2 containers serving web or TCP traffic to your app.
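As a rough illustration, the 10% figure is consistent with the 1h53 of total interruption reported above if partial hours are rounded up to the next whole hour. The rounding rule and the choice to count only the total interruption are assumptions made for this sketch (the post states only the 5% per hour rate and the resulting 10%), and the function name is hypothetical.

```python
import math

# Rate stated in the post: 5% of the monthly invoice per hour of downtime.
RATE_PERCENT_PER_HOUR = 5


def compensation_percent(downtime_minutes: int) -> int:
    """Return the compensation as a percentage of the monthly invoice.

    Assumption (not stated in the post): any started hour of downtime
    counts as a full hour.
    """
    billed_hours = math.ceil(downtime_minutes / 60)
    return billed_hours * RATE_PERCENT_PER_HOUR


# Total interruption reported for the incident: 1h53.
total_interruption_minutes = 1 * 60 + 53
print(compensation_percent(total_interruption_minutes))  # 10
```

Under these assumptions, 113 minutes rounds up to 2 billed hours, which gives the announced 10%.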