How to protect against AWS IAM outages
Oct 24, 2025 • Stijn De Haes & Jonny Daenen
AWS outage explained: what went wrong, why IAM failed, and how to protect your infrastructure next time.
In this special episode on the AWS outage, Stijn De Haes explains what happened during the October 2025 AWS outage. He then zooms in on the limited effect it had on Dataminded and its product Conveyor, and closes with four tips for protecting yourself against this kind of outage.
👉 Link to the blog post: https://hubs.li/Q03PQKrR0   
💡 Key Takeaways:
- AWS “global” services can still have single-region dependencies. 
- Regionalizing and replicating your resources drastically improves resilience. 
- External dependencies multiply risk - replicate what matters most. 
- Preparation beats reaction - practice your outage response before it happens. 
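As a rough illustration of the first and third takeaways, the sketch below sorts the hostnames a platform depends on by how they relate to the region it runs in. It is only a heuristic under stated assumptions: the function name `classify_dependency` and the four labels are hypothetical, it only understands standard-partition hostnames under `amazonaws.com`, and everything else (including public registries) is lumped together as "external".

```python
import re

# Matches a standard-partition region label directly before ".amazonaws.com",
# e.g. the "eu-west-1" in "sts.eu-west-1.amazonaws.com".
_REGION_IN_HOST = re.compile(r"(?:^|\.)([a-z]{2}-[a-z]+-\d)\.amazonaws\.com$")


def classify_dependency(host: str, home_region: str) -> str:
    """Classify a hostname by how it relates to the region you run in.

    Returns one of (hypothetical labels, for illustration only):
      "regional"     - an AWS endpoint in home_region
      "other-region" - an AWS endpoint pinned to a different region
      "global"       - an AWS endpoint with no region in its name
      "external"     - anything that is not under amazonaws.com
    """
    host = host.lower().rstrip(".")
    if not host.endswith(".amazonaws.com"):
        return "external"
    match = _REGION_IN_HOST.search(host)
    if match is None:
        return "global"
    return "regional" if match.group(1) == home_region else "other-region"


def audit(hosts: list[str], home_region: str) -> dict[str, list[str]]:
    """Group hostnames by classification so risky ones stand out."""
    report: dict[str, list[str]] = {}
    for host in hosts:
        report.setdefault(classify_dependency(host, home_region), []).append(host)
    return report
```

Running `audit` over the endpoints and registries found in your deployment configuration gives a first list of candidates to regionalize or replicate; anything outside the "regional" group deserves a closer look.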
🔥 What Happened:
- A major AWS outage in the us-east-1 (Northern Virginia) region affected services worldwide, including Slack, Outlook, Strava, Steam, and many others. 
- The root cause was a misconfiguration in AWS DNS affecting IAM and DynamoDB. 
- Over 140 AWS services were impacted, including IAM, EC2, and DynamoDB. 
- IAM (Identity and Access Management) configuration was unavailable, preventing updates to roles and policies, but cached credentials allowed most running workloads to continue. 
- Although AWS presents some services as “global,” many of them are still physically centralized in Northern Virginia, which makes that region a single point of failure. 
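One common way to reduce this kind of single-region dependency for credentials is to request temporary credentials from a regional STS endpoint instead of the global one. The sketch below assumes that setup and only builds the relevant settings; the helper names are hypothetical, the region check covers standard-partition regions only, and `AWS_STS_REGIONAL_ENDPOINTS` is the documented AWS SDK setting (recent SDK versions may already default to regional behaviour).

```python
import re

# Standard-partition region names such as "eu-west-1". Assumption: the China
# and GovCloud partitions, which use different names or suffixes, are out of scope.
_REGION_RE = re.compile(r"[a-z]{2}-[a-z]+-\d")


def regional_sts_endpoint(region: str) -> str:
    """Return the regional STS endpoint URL for a standard-partition region."""
    if not _REGION_RE.fullmatch(region):
        raise ValueError(f"not a standard AWS region name: {region!r}")
    return f"https://sts.{region}.amazonaws.com"


def regional_sts_env(region: str) -> dict[str, str]:
    """Environment variables that ask AWS SDKs to prefer the regional STS endpoint."""
    regional_sts_endpoint(region)  # reuse the region-name validation
    return {
        # Different SDKs read different variables for the region, so set both.
        "AWS_REGION": region,
        "AWS_DEFAULT_REGION": region,
        # "regional" instead of "legacy" (the global endpoint).
        "AWS_STS_REGIONAL_ENDPOINTS": "regional",
    }
```

The returned dictionary could be merged into a container or job environment; whether a given workload actually honours these variables depends on the SDK and version it uses, so it is worth verifying per workload.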
🧠 How It Affected Dataminded & Conveyor:
- Conveyor (Dataminded’s data job scheduler and runner) detected the outage early via alerting systems. 
- Customer workloads mostly kept running due to prior resilience improvements: 
  - IAM configured to use regional endpoints. 
  - Container images replicated locally to regional ECR registries instead of public ECR. 
- Minor issues occurred due to one outdated dependency still pointing to public ECR. 
- Azure clusters were indirectly affected because Red Hat’s quay.io registry is hosted on AWS. 
- No downtime for Conveyor’s control plane; the impact was limited to new deployments and image pulls. 
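The image replication mentioned above, and the one dependency that slipped through, can be illustrated with a small rewrite step that points image references at a regional mirror. This is a sketch under assumptions, not Conveyor's actual implementation: the `MIRROR_PREFIXES` mapping, the repository prefixes, and the function names are hypothetical, and the images are assumed to have been copied into the private registry already.

```python
# Hypothetical mapping from upstream registries to repository prefixes inside
# a private, regional ECR registry. The prefixes are illustrative only.
MIRROR_PREFIXES = {
    "public.ecr.aws": "mirror/ecr-public",
    "quay.io": "mirror/quay",
}


def regional_registry(account_id: str, region: str) -> str:
    """Hostname of the private ECR registry for an account in a region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def to_regional_mirror(image: str, account_id: str, region: str) -> str:
    """Rewrite an image reference to its regional mirror, if one is configured.

    References without a registry part, or to registries that have no
    configured mirror, are returned unchanged.
    """
    registry, sep, rest = image.partition("/")
    prefix = MIRROR_PREFIXES.get(registry)
    if not sep or prefix is None:
        return image
    return f"{regional_registry(account_id, region)}/{prefix}/{rest}"


def unmirrored(images: list[str], account_id: str, region: str) -> list[str]:
    """List image references that would still be pulled from outside the region."""
    local = regional_registry(account_id, region) + "/"
    return [
        image
        for image in images
        if not to_regional_mirror(image, account_id, region).startswith(local)
    ]
```

A check like `unmirrored` run in CI is one way to catch a leftover reference to a public registry before an outage does.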