AWS US‑EAST‑1 Outage: DNS/DynamoDB Failure Causes Widespread Downtime
On a crisp October morning, a significant outage in Amazon Web Services’ US‑EAST‑1 region (Northern Virginia) caused widespread disruptions across websites, apps, games and services that depend on AWS. The issue — traced to DNS resolution problems with the DynamoDB API — led to many services losing access to stored data for several hours.
Key timeline highlights:
- ~3:11 AM ET: AWS reported increased error rates and latencies in US‑EAST‑1.
- ~5:01 AM ET: AWS identified a DNS resolution issue affecting the DynamoDB API.
- ~6:35 AM ET: AWS said the underlying DNS issue had been fully mitigated and most service operations were succeeding normally.
- Later updates: EC2 instance launches remained affected; AWS recommended avoiding tying new deployments to specific Availability Zones to give EC2 flexibility while recovery continued.
Because so many companies rely on US‑EAST‑1, the outage produced knock‑on effects across numerous services. Users reported problems with banks, airlines, streaming platforms, social apps and games — including Disney+, Snapchat, Reddit, Lyft, Apple Music, Pinterest, Fortnite, Roblox and The New York Times. Outage reports spiked on monitoring sites as services showed errors or sluggish performance.
As University of Notre Dame professor Mike Chapple put it, “Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data.” The incident underlines a broader risk: heavy reliance on a few cloud providers — AWS held roughly 30% of the global cloud infrastructure market as of mid‑2025 — can make large parts of the internet vulnerable when those providers experience faults.
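To make that quote concrete: applications reach DynamoDB through a regional API hostname, and if DNS cannot resolve that hostname, requests never reach the service even though the stored data itself is untouched. The snippet below is a simple illustration of that failure mode using Python's standard library and the standard regional endpoint name; it is not taken from AWS's incident report.

```python
import socket

# Clients locate the DynamoDB API by resolving its regional hostname via DNS.
endpoint = "dynamodb.us-east-1.amazonaws.com"

try:
    addresses = {info[4][0] for info in socket.getaddrinfo(endpoint, 443)}
    print(f"{endpoint} resolves to: {addresses}")
except socket.gaierror as exc:
    # Roughly what affected applications saw during the outage:
    # the data was intact, but the lookup needed to reach it failed.
    print(f"DNS resolution failed for {endpoint}: {exc}")
```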
What AWS did and recommended:
- AWS applied multiple mitigations across Availability Zones in US‑EAST‑1 and implemented rate limiting on new EC2 launches to aid recovery.
- It advised customers to avoid binding new deployments to a specific Availability Zone so that EC2 could place instances in zones with available capacity (a minimal sketch of that pattern follows this list).
- AWS warned that even after service restoration, a backlog of requests and other factors would prolong full recovery.
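As a rough sketch of the Availability Zone recommendation, the boto3 call below launches an instance without pinning it to a zone; the AMI ID and instance type are placeholders, and this illustrates the general pattern rather than AWS's exact guidance.

```python
import boto3

# Launch an EC2 instance without pinning it to a specific Availability Zone,
# so EC2 can place it in whichever zone in the region has capacity.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t3.micro",          # placeholder instance type
    MinCount=1,
    MaxCount=1,
    # Deliberately no Placement={"AvailabilityZone": ...} and no SubnetId,
    # leaving the zone choice to EC2.
)

instance = response["Instances"][0]
print(instance["InstanceId"], instance["Placement"]["AvailabilityZone"])
```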
For the latest official updates, check the AWS Service Health Dashboard at status.aws.amazon.com. A contemporaneous news summary is available from Engadget.
Why this matters: Many businesses choose large cloud providers for scalability and global reach, but incidents like this highlight the importance of cross‑region redundancy, multi‑cloud strategies or robust failover plans — especially for critical services.
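For teams weighing those options, a minimal cross-region failover sketch might look like the following. It assumes the table is replicated to a second region (for example via DynamoDB global tables); the table name, regions and key are illustrative, not drawn from the incident.

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

PRIMARY_REGION = "us-east-1"   # assumed primary region
FALLBACK_REGION = "us-west-2"  # assumed replica region
TABLE_NAME = "orders"          # hypothetical table name


def get_item_with_failover(key):
    """Read an item from the primary region, falling back to a replica
    region if the primary call fails (e.g. due to DNS or endpoint errors)."""
    for region in (PRIMARY_REGION, FALLBACK_REGION):
        try:
            table = boto3.resource("dynamodb", region_name=region).Table(TABLE_NAME)
            return table.get_item(Key=key).get("Item")
        except (ClientError, EndpointConnectionError) as exc:
            print(f"Read from {region} failed: {exc}")
    raise RuntimeError("All configured regions failed")


item = get_item_with_failover({"order_id": "12345"})  # hypothetical key
```

Client-side failover like this only reduces the blast radius if the data is already replicated and the application can tolerate eventually consistent reads from the replica region.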
Discussion: Did you notice any downtime during this outage? How do you think companies should architect systems to reduce the blast radius of cloud provider failures?
