AWS us-east-1 Outage Disrupted Major Apps; Amazon Cites DNS Issue With DynamoDB (Now Resolved)

A widespread incident in Amazon Web Services’ us-east-1 region on October 20 (ET) caused increased error rates and latency across multiple services, temporarily disrupting major apps including Snapchat, Venmo, Lyft, Fortnite—and even Amazon’s own Alexa. AWS attributed the trigger to DNS resolution issues affecting DynamoDB endpoints. Amazon later said services had returned to normal operations, with backlogs clearing through the afternoon and evening.

Timeline highlights

  • 3:11 AM ET: AWS reports elevated errors/latencies for multiple services in us-east-1.
  • ~5:01 AM ET: Root cause identified as a DNS resolution issue affecting the DynamoDB API endpoints; mitigations begin.
  • 6:35 AM ET: DNS issue mitigated; residual impacts persist (notably with new EC2 instance launches).
  • 8:48–10:14 AM ET: Continued progress; AWS rate-limits new EC2 instance launches to aid recovery.
  • 3:01 PM ET: AWS reports services back to normal operations; request backlogs processing.
  • Evening: Amazon confirms resolution of widespread errors and latencies.

What happened and why it mattered

us-east-1 is one of AWS’s most heavily used regions. DNS failures prevented clients from reliably reaching DynamoDB endpoints, effectively separating many applications from their data/control planes and causing cascading issues across dependent services. Outages in a single hyperscale region can ripple across large portions of the internet.
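To make that failure mode concrete, here is a minimal, illustrative Python sketch (not taken from AWS's incident updates) that checks whether the regional DynamoDB endpoint resolves before sending traffic, so a client can fail fast into a degraded mode rather than hang behind resolver timeouts. The endpoint hostname is the standard regional one; the degraded-mode behavior is a placeholder assumption.

```python
import socket

# Regional DynamoDB endpoint for us-east-1 (the endpoint family named in AWS's updates).
DYNAMODB_ENDPOINT = "dynamodb.us-east-1.amazonaws.com"


def endpoint_resolves(hostname: str) -> bool:
    """Return True if the system resolver can find at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, 443)) > 0
    except socket.gaierror:
        return False


if not endpoint_resolves(DYNAMODB_ENDPOINT):
    # Fail fast and serve cached or degraded responses instead of letting
    # request threads stack up behind DNS timeouts.
    print("DynamoDB endpoint is not resolving; entering degraded mode")
```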

Services impacted (reported by users and status pages)

  • Alexa voice requests
  • Snapchat, Venmo, Lyft
  • Fortnite, Roblox
  • Various media and streaming apps (e.g., Disney+), and numerous websites

Builder takeaways

  • Evaluate multi-region architectures where recovery-time and recovery-point objectives (RTO/RPO) demand regional resilience; avoid single-region dependencies for critical paths.
  • Avoid hard-coding deployments to specific Availability Zones to maximize failover flexibility.
  • Harden clients with exponential backoff, circuit breakers, and graceful degradation patterns (see the sketch after this list).
  • Review DNS/resolver caching strategies and practice DR/chaos drills regularly.
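As a rough illustration of the backoff-and-degrade pattern above, the following Python sketch uses boto3 to read from DynamoDB with exponential backoff plus full jitter, then falls back to a replica region. The table name, regions, and retry limits are assumptions for the example, and the fallback presumes a DynamoDB global table replica already exists in the second region.

```python
import random
import time

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError, EndpointConnectionError

# Assumed settings for the example; tune regions and limits to your own RTO/RPO targets.
PRIMARY_REGION = "us-east-1"
FAILOVER_REGION = "us-west-2"   # assumes a global-table replica exists here
MAX_ATTEMPTS = 5


def get_item_with_backoff(table_name: str, key: dict):
    """Read one item with exponential backoff plus full jitter, then fall back
    to a replica region if the primary keeps failing (graceful degradation)."""
    for region in (PRIMARY_REGION, FAILOVER_REGION):
        client = boto3.client(
            "dynamodb",
            region_name=region,
            config=Config(retries={"max_attempts": 1, "mode": "standard"}),  # retries handled below
        )
        for attempt in range(MAX_ATTEMPTS):
            try:
                return client.get_item(TableName=table_name, Key=key).get("Item")
            except (ClientError, EndpointConnectionError):
                # Sleep a random interval up to an exponentially growing cap (full jitter).
                time.sleep(random.uniform(0, min(8.0, 0.2 * 2 ** attempt)))
        # Primary region exhausted; the loop continues with the failover region.
    return None  # caller should degrade gracefully (cached data, friendly error page)


# Hypothetical usage with a made-up table and key shape:
# item = get_item_with_backoff("orders", {"order_id": {"S": "12345"}})
```

Full jitter keeps a fleet of retrying clients from synchronizing their retries, which matters most during a regional event when every dependent service is retrying at once.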

Additional context

  • AWS held roughly 30% of global cloud infrastructure market share as of mid-2025.
  • Knock-on effects included issues launching new EC2 instances during recovery, which AWS rate-limited to stabilize the region.

References: Check the AWS Service Health Dashboard and AWS Health Status for official updates.

Discussion: Will this outage spur wider adoption of multi-region or multi-cloud strategies, or do cost and complexity still outweigh the risk for most teams?
