A massive internet outage that made swathes of websites inaccessible for an hour has been traced back to cloud computing services provider Fastly. But what caused the Fastly outage, and why did so many websites, including Reddit, Amazon and GitHub, go down?
Users attempting to visit affected sites from approximately 09:58 UTC on 8 June were met with a 503 (Service Unavailable) error or a connection failure message.
At 10:57am Fastly applied a fix and news websites including Verdict, the Financial Times and the Guardian, along with the UK government’s website, became accessible once again.
Fastly said the outage stemmed from a problem with its content delivery network (CDN), a distributed network of data centres that speeds up page loading by serving copies of web content from locations close to the user, reducing the distance data packets have to travel between the server storing a web page and the person opening it.
But because Fastly sits between its clients' websites and their users, a major outage to its services prevents data packets from reaching a person’s device, rendering the website inaccessible.
The San Francisco-headquartered company provides an “edge cloud platform” that aims to speed this process up even further. During the outage, however, Fastly’s status page showed “degraded performance” across all its servers globally, from North America to South Africa.
“We identified a service configuration that triggered disruptions across our POPs [points of presence] globally and have disabled that configuration. Our global network is coming back online,” Fastly wrote in an update.
In networking, a POP is an access point where two or more networks share a connection; it is typically located inside a data centre. Fastly’s statement suggests that a software configuration applied across all of its data centres inadvertently stopped web content from being delivered from its clients to their users.
“Fastly edge platform is having problems, which means a big part of the internet is having problems,” wrote F-Secure’s Mikko Hypponen. “This includes Twitter. Even fastly.com itself is unavailable in many locations. Basically, internet is down.”
Move Fastly and break things
The outage highlights how the internet, though decentralised by design, now depends on a handful of companies that can take large parts of it offline when they run into technical problems.
Amazon Web Services, the world’s biggest cloud company, experienced an outage in its US East region in February 2017 that knocked a number of major websites offline.
“It is remarkable that within ten minutes, one outage can send the world into chaos,” said Mark Rodbert, CEO of Idax. “This demonstrates the extent to which the move to the cloud has changed the things that companies need to protect.”
Jake Moore, cybersecurity specialist at ESET, said the Fastly outage underscores “the importance and significance of these vast hosting companies”.
Whether a given site could be reached varied by location, which Mark Hendry, director of data protection and cybersecurity at DWF, attributed to the way algorithms direct requests for content.
“For instance, the algorithm might direct the traffic so that it routes through the most available or highest performing node, or so that the traffic takes the fastest network route to the requestor,” he explained.
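As a rough illustration of the kind of rule Hendry describes, the Python sketch below picks an edge node for a request: it skips nodes that are down or overloaded, then takes the one with the fastest route to the requester. It is not Fastly’s routing logic. The EdgeNode fields, the 90% load cut-off and the node names are all hypothetical, and real CDN request routing is considerably more involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeNode:
    """A hypothetical CDN edge node (POP) as seen by a routing algorithm."""
    name: str
    healthy: bool   # is the node currently able to serve traffic?
    rtt_ms: float   # measured round-trip time to the requester, in milliseconds
    load: float     # fraction of capacity in use, from 0.0 to 1.0


def pick_node(nodes: List[EdgeNode], max_load: float = 0.9) -> Optional[EdgeNode]:
    """Return the healthy, not-overloaded node with the lowest round-trip time.

    Returns None when no node qualifies; a user in that position would
    typically see an error instead of the page.
    """
    candidates = [n for n in nodes if n.healthy and n.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.rtt_ms)


# Hypothetical nodes: the nearest one is down and the next is overloaded.
nodes = [
    EdgeNode("london", healthy=False, rtt_ms=8.0, load=0.3),
    EdgeNode("amsterdam", healthy=True, rtt_ms=14.0, load=0.95),
    EdgeNode("frankfurt", healthy=True, rtt_ms=19.0, load=0.4),
]
print(pick_node(nodes).name)  # frankfurt
```

Because the result depends on which nodes are up and close to a particular user, two people in different places can be sent to different nodes, which is one way the same site can load in one city and fail in another.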
At 12:41pm Fastly said it “observed recovery of all services and has resolved this incident. Customers could continue to experience a period of increased origin load and lower Cache Hit Ratio.”
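Cache hit ratio is the share of requests a CDN answers from its own stored copies; the remaining requests (misses) are passed back to the customer’s own “origin” server. The sketch below uses made-up numbers, not Fastly data, and assumes purely for illustration that caches answer fewer requests while they refill after an outage, to show why a lower hit ratio translates into more load on the origin.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of requests served directly from the edge cache."""
    total = hits + misses
    return hits / total if total else 0.0


def origin_requests(total_requests: int, hit_ratio: float) -> int:
    """Number of requests that miss the cache and must be fetched from the origin."""
    return round(total_requests * (1 - hit_ratio))


# Illustrative figures only: a "warm" cache versus one still refilling.
warm = cache_hit_ratio(hits=950, misses=50)   # 0.95
cold = cache_hit_ratio(hits=600, misses=400)  # 0.6
print(origin_requests(100_000, warm))  # 5000
print(origin_requests(100_000, cold))  # 40000
```

In this example, a drop in hit ratio from 95% to 60% means the origin server handles eight times as many requests for the same traffic.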