An AWS service outage caused widespread digital disruption for millions of users and more than one thousand companies, starting in the early hours of Monday, 20th October. Although the problem was resolved within 24 hours, the incident once again highlights the fragility of the many apps and businesses that rely on Big Tech infrastructure providers.
The outage originated from a DNS issue in AWS’s US-East-1 region, which had a knock-on effect across global networks. In the UK, it disrupted many institutions, including HMRC, leading banks such as Barclays and Lloyds, and many retailers, as well as apps including Snapchat and some gaming apps.
Calls for more planning and investment in UK sovereign digital infrastructure, as a hedge against further disruption, appear to run counter to current government policy. The US-UK prosperity deal, announced during President Trump’s UK state visit last month, heralded a raft of huge US AI infrastructure investments into the UK.
As dependence on this US-owned digital infrastructure deepens, large-scale outages are likely to become more frequent. Only a little over a year ago, in July 2024, a faulty CrowdStrike update crashed more than 8.5 million Microsoft Windows devices, causing an estimated $10bn in damages. So how can businesses mitigate the risk of an outage such as the most recent AWS incident?
Disaster recovery planning is key
Steven Schuchart, principal analyst for enterprise networking at GlobalData, warns that businesses need to get serious about their business continuity and disaster recovery (BC/DR) plans. “Amazon has done an amazing job of staying available over the years, but they are not infallible, no organisation is. A big part of those BC/DR plans needs to be actual failover testing, and regular review of how the organisation responds to an event,” he says.
Dai Vaughan, CTO at Public Digital, says that resilience must be treated as a whole-organisation challenge, not just an IT problem: “This requires collaboration across departments and partners, especially as supply chains become more interconnected and complex.”
“One thing all organisations should do to prepare is to create a designated crisis response team. This should be fewer than 12 people and include those with expertise in IT, data management, communications and stakeholder management, as well as senior leadership,” says Vaughan.
“Ultimately, resilience isn’t about eliminating risk entirely, but about understanding it, planning for it, and cultivating a culture that can absorb shocks and recover quickly,” he adds.
Tim Wright, tech partner at law firm Fladgate, agrees that resiliency is not purely a technical parameter but also a regulatory and contractual one: “Firms must reassess their agreements with specific focus on cloud exit, redundancy and incident notification contractual clauses through that lens.”
Wright notes that for regulated entities, especially in financial services, the UK’s Critical Third Parties (CTP) regime, introduced by the Financial Services and Markets Act 2023 and applied through the PRA and FCA’s operational resilience framework, will inevitably come into sharper focus after the AWS incident.
“Supervisors may require stress testing and post-incident audits to ensure that firms maintain visibility and contractual leverage over their cloud dependencies,” he adds.
A concentration of risk
AWS accounts for about a third of the global cloud infrastructure market, and the growing number of AI-driven applications hosted on it further increases the concentration of risk. Rob van Lubek, EMEA vice president at Dynatrace, says: “As our reliance on technology grows and AI continues to reshape how we operate, maintaining that visibility across complex digital ecosystems will be essential. The organisations best prepared for the future will be those that can see across their entire environment, anticipate risks, and adapt quickly when the unexpected happens.”
“For large enterprises especially, the difference between disruption and recovery often comes down to visibility and speed – how fast an organisation can pinpoint what’s gone wrong, understand why, and act to restore service continuity. That level of digital resilience requires deep insight into how systems connect and where vulnerabilities might emerge, so teams can focus on what truly matters in a crisis,” says van Lubek.
