An outage at Amazon Web Services (AWS), the cloud services provider that underpins many of the world's online platforms, knocked a wide range of websites and apps offline for several hours on Monday. The disruption underscored the fragility of businesses that rely on cloud-based servers to host their data, and how suddenly an unplanned outage can ripple across companies around the globe.
AWS is by far the largest cloud services provider by revenue, ahead of rival offerings from Microsoft and Google. More than 1,000 services were affected by the outage. These included popular platforms that rely on AWS, such as WhatsApp, Snapchat and Reddit, along with financial institutions, government services such as the British government's tax services, and entertainment services. The outage has prompted experts to call for diversification in cloud computing.
Last year, a major disruption to Microsoft Corp's cloud services hit a number of businesses around the world, including in India.
AWS provides cloud computing services, including storage, compute power and databases, that allow companies to rent IT infrastructure from Amazon's global network of data centres instead of buying and managing their own physical servers. In the first half of the year, AWS accounted for nearly 20% of Amazon's sales but about 60% of its operating profit.
What caused the AWS outage
The problems appear to have begun on Monday morning, when users started reporting trouble accessing a number of online platforms.
While Amazon has not yet fully detailed what went wrong, the company said on the AWS health page that it "experienced increased error rates and latencies" for its services in its key data centre region in Northern Virginia, United States (US-EAST-1). "… we identified the trigger of the event as DNS resolution issues for the regional DynamoDB service endpoints," it added.
At the heart of the outage, it appears, was a DNS (Domain Name System) problem affecting the endpoints of AWS's DynamoDB database service. DNS failures are among the more common problems in Internet infrastructure. DNS translates human-readable domain names (like example.com) into the numerical IP addresses that computers use to reach one another. When that translation fails, a client cannot find the server it is trying to reach, even if the server itself is still running. For users, this shows up as slow loading times, sites that will not open, or error messages; for applications that depend on a service such as DynamoDB, it can mean their requests never reach it.
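To make the name-to-address step concrete, here is a minimal Python sketch of what a client does before it can send a request. The `lookup` and `make_fake_resolver` helpers are hypothetical illustrations, not AWS code. The hostname follows the format of DynamoDB's regional endpoint for US-EAST-1, but the address 192.0.2.10 comes from a range reserved for documentation and is not a real AWS IP. The fake resolver stands in for DNS so the example runs offline.

```python
import socket
from typing import Callable, Dict, List, Optional

# A resolver maps a hostname to a list of IP address strings and raises
# OSError when the lookup fails (socket.gaierror is a subclass of OSError).
Resolver = Callable[[str], List[str]]


def system_resolver(host: str) -> List[str]:
    """Ask the operating system's resolver (normally backed by DNS)."""
    infos = socket.getaddrinfo(host, 443, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def lookup(host: str, resolver: Resolver = system_resolver) -> Optional[List[str]]:
    """Translate a hostname into IP addresses, or return None if resolution fails.

    Without an address the client has nowhere to send its request,
    even if the server behind the name is healthy.
    """
    try:
        addresses = resolver(host)
    except OSError:
        return None
    return addresses or None


def make_fake_resolver(records: Dict[str, List[str]]) -> Resolver:
    """Hypothetical offline stand-in for DNS, backed by a fixed table."""
    def resolve(host: str) -> List[str]:
        if host not in records:
            raise socket.gaierror(socket.EAI_NONAME, "Name or service not known")
        return records[host]
    return resolve


endpoint = "dynamodb.us-east-1.amazonaws.com"
healthy = make_fake_resolver({endpoint: ["192.0.2.10"]})
broken = make_fake_resolver({})  # simulates the name no longer resolving
print(lookup(endpoint, healthy))  # ['192.0.2.10']
print(lookup(endpoint, broken))   # None
```

Calling `lookup(endpoint)` without a resolver argument would query real DNS through the operating system instead of the offline table.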
DynamoDB is a fully managed, serverless NoSQL database service used to build scalable, high-performance applications. A NoSQL database, often expanded as "not only SQL," is a non-relational database designed to handle diverse and flexible data structures. Unlike traditional relational (SQL) databases, which store data in rigid tables with predefined schemas, NoSQL databases offer more adaptable storage models.
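DynamoDB organises data as items identified by a primary key (a partition key and, optionally, a sort key); beyond those keys, items in the same table can carry different attributes. The toy in-memory table below is a hypothetical sketch of that idea only. `ToyTable` and its methods are illustrative names, not the DynamoDB API, and for simplicity the sketch always requires a sort key.

```python
from typing import Any, Dict, List, Optional, Tuple

# An item is just a dictionary of attributes; no fixed set of columns.
Item = Dict[str, Any]


class ToyTable:
    """Hypothetical in-memory model of a key-value/document table."""

    def __init__(self, partition_key: str, sort_key: str) -> None:
        # Only the key attributes are mandatory; everything else is free-form.
        self.partition_key = partition_key
        self.sort_key = sort_key
        self._items: Dict[Tuple[Any, Any], Item] = {}

    def put_item(self, item: Item) -> None:
        missing = {self.partition_key, self.sort_key} - set(item)
        if missing:
            raise ValueError(f"item is missing key attribute(s): {sorted(missing)}")
        key = (item[self.partition_key], item[self.sort_key])
        self._items[key] = dict(item)  # store a copy; overwrites any existing item

    def get_item(self, pk: Any, sk: Any) -> Optional[Item]:
        item = self._items.get((pk, sk))
        return dict(item) if item is not None else None

    def query(self, pk: Any) -> List[Item]:
        """Return all items sharing a partition key, ordered by sort key."""
        matches = [(sk, it) for (p, sk), it in self._items.items() if p == pk]
        return [dict(it) for _, it in sorted(matches, key=lambda m: m[0])]


orders = ToyTable(partition_key="customer_id", sort_key="order_id")
# Two items in the same table with different sets of attributes.
orders.put_item({"customer_id": "c1", "order_id": "o2", "total": 40,
                 "gift_note": "Happy birthday"})
orders.put_item({"customer_id": "c1", "order_id": "o1", "total": 15})
print([i["order_id"] for i in orders.query("c1")])  # ['o1', 'o2']
```

In a relational table, adding `gift_note` to one order would typically mean adding a column to the schema for every row; here only the item that needs the attribute carries it.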
The fragility of the Internet infrastructure
Even as the Internet hosts billions of applications and services, a large share of them run on cloud infrastructure from AWS, Microsoft or Google. For years, experts around the world have warned about companies' over-reliance on these providers, since a small problem at their end can affect vast swathes of the Internet, as Monday's outage showed.
This is a relatively recent phenomenon: some years ago, businesses typically hosted and managed their own servers. Outsourcing that work to one of the big three providers, however, is cheaper and more efficient. And while outages at the big three are relatively rare, so many services depend on their platforms that when a disruption does happen, it takes down a large part of the web.
For instance, last year a faulty software update from cybersecurity firm CrowdStrike, whose software runs deep inside Microsoft Windows systems, caused severe service outages for thousands of businesses across multiple geographies, in sectors such as aviation, banking and broadcasting.
In India, the impact of that outage was most pronounced in the aviation sector: hundreds of flights were delayed and several cancelled as airlines found their systems inoperable and had to switch to manual processes. At least ten banks and NBFCs (non-banking financial companies) faced "minor disruptions", which had either been resolved or were being resolved, the Reserve Bank of India said at the time.