How to protect against AWS IAM outages

Oct 24, 2025

Stijn De Haes & Jonny Daenen

AWS outage explained: what went wrong, why IAM failed, and how to protect your infrastructure next time.

In this special episode on the AWS outage, Stijn De Haes explains what happened during the October 2025 AWS outage. He then zooms in on its limited impact on Dataminded and its product Conveyor, and finally shares four tips for protecting yourself against this kind of outage.


👉 Link to the blog post: https://hubs.li/Q03PQKrR0

💡 Key Takeaways:

  • AWS “global” services can still have single-region dependencies.

  • Regionalizing and replicating your resources drastically improves resilience.

  • External dependencies multiply risk - replicate what matters most.

  • Preparation beats reaction - practice your outage response before it happens.
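The last two takeaways can be practiced with very little tooling. Keep an inventory of what a workload depends on and in which regions a usable copy of each dependency lives, then rehearse a regional outage against that inventory. The Python sketch below is one minimal way to do that. The dependency names, their region assignments, and the function names are hypothetical examples, not Dataminded's actual setup.

```python
from dataclasses import dataclass, field


@dataclass
class Dependency:
    """One external dependency of a workload (all names here are hypothetical)."""
    name: str
    regions: set[str] = field(default_factory=set)  # regions holding a usable copy


def unavailable_during_outage(deps: list[Dependency], failed_region: str) -> list[str]:
    """Return the dependencies that have no copy outside the failed region."""
    return sorted(d.name for d in deps if not (d.regions - {failed_region}))


# Hypothetical inventory for a workload running in eu-west-1.
inventory = [
    Dependency("sts-credentials", {"eu-west-1"}),          # regional STS endpoint
    Dependency("base-image", {"eu-west-1", "us-east-1"}),  # replicated to a regional registry
    Dependency("helper-image", {"us-east-1"}),             # still pulled from a public registry
]

if __name__ == "__main__":
    # Drill: pretend us-east-1 is down and list what would stop working.
    print(unavailable_during_outage(inventory, "us-east-1"))
```

Running the drill for us-east-1 flags only `helper-image`, the one dependency with no copy elsewhere. Running it for the workload's own region is just as useful, because it shows what a local outage would take down.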

🔥 What Happened:

  • A major AWS outage in the US-East-1 (North Virginia) region disrupted widely used services such as Slack, Outlook, Strava, Steam, and many others.

  • The root cause was a misconfiguration in AWS DNS affecting IAM and DynamoDB.

  • Over 140 AWS services were impacted, including IAM, EC2, and DynamoDB.

  • The IAM (Identity and Access Management) control plane was unavailable, which blocked updates to roles and policies, but cached credentials allowed most running workloads to continue.

  • Although AWS presents some services as “global,” many of them still run physically in North Virginia, which makes that region a single point of failure.
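The point about cached credentials deserves a closer look. A workload that already holds valid temporary credentials does not need IAM or STS again until those credentials approach expiry. The sketch below shows that pattern in Python: refresh early, and if the refresh fails, keep using the cached credentials until they actually expire. The class and parameter names are hypothetical, and this is not how any AWS SDK implements it. In a real setup the `fetch` callable would request credentials from AWS STS, ideally through a regional endpoint such as `https://sts.eu-west-1.amazonaws.com`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Credentials:
    """Hypothetical short-lived credentials, e.g. the result of an STS call."""
    access_key: str
    expires_at: float  # Unix timestamp in seconds


class CachedCredentialProvider:
    """Refresh credentials ahead of expiry, but keep serving the cached ones
    while they are still valid if the refresh call fails."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        refresh_margin: float = 900.0,  # assumption: start refreshing 15 min early
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._cached: Optional[Credentials] = None

    def get(self) -> Credentials:
        now = self._clock()
        needs_refresh = (
            self._cached is None or now >= self._cached.expires_at - self._margin
        )
        if needs_refresh:
            try:
                self._cached = self._fetch()
            except Exception:
                # Credential endpoint unreachable: reuse the cached credentials
                # as long as they have not expired; otherwise surface the error.
                if self._cached is None or now >= self._cached.expires_at:
                    raise
        return self._cached


if __name__ == "__main__":
    fake_now = [0.0]
    attempts = [0]

    def flaky_fetch() -> Credentials:
        # Simulated outage: only the very first fetch succeeds.
        attempts[0] += 1
        if attempts[0] > 1:
            raise ConnectionError("credential endpoint unavailable")
        return Credentials("EXAMPLE-KEY", expires_at=3600.0)

    provider = CachedCredentialProvider(flaky_fetch, clock=lambda: fake_now[0])
    print(provider.get().access_key)  # fresh credentials
    fake_now[0] = 3000.0              # inside the refresh window, endpoint is down
    print(provider.get().access_key)  # still served from the cache
```

The refresh margin is the main design trade-off. A larger margin means the cached credentials always have at least that much validity left when an outage begins, provided `get` is called regularly. The cost is that credentials are refreshed more often.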

🧠 How It Affected Dataminded & Conveyor:

  • Conveyor (Dataminded’s data job scheduler and runner) detected the outage early via alerting systems.

  • Customer workloads mostly kept running thanks to earlier resilience improvements:

      • IAM credentials requested from regional AWS STS endpoints rather than the global endpoint.

      • Container images replicated to private, regional ECR registries instead of being pulled from public ECR.

  • Minor issues occurred due to one outdated dependency still pointing to public ECR.

  • Azure clusters were indirectly affected because Red Hat’s quay.io container registry is hosted on AWS.

  • No downtime for Conveyor’s control plane; the impact was limited to new deployments and image pulls.
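The public ECR and quay.io issues above suggest a simple guard. Audit every image reference a cluster uses, and rewrite any reference that points at an external registry so it targets a private mirror in the workload's own region. The Python sketch below assumes such a mirror already exists, for example an ECR pull-through cache rule per upstream registry. The account ID, region, repository prefixes, and function names are all hypothetical. Registry detection follows the Docker reference convention: the first path component is a registry host only if it contains a `.` or `:` or is `localhost`. Anything else is a Docker Hub image.

```python
# Hypothetical private mirror in the workload's region (placeholder account ID).
MIRROR_HOST = "123456789012.dkr.ecr.eu-west-1.amazonaws.com"

# Hypothetical mapping from upstream registry host to the repository prefix
# under which the regional mirror exposes its cached copies.
MIRROR_PREFIXES = {
    "public.ecr.aws": "ecr-public",
    "quay.io": "quay",
}


def split_registry(image: str) -> tuple[str, str]:
    """Split 'host/path:tag' into (host, 'path:tag'); no host means Docker Hub."""
    first, _, rest = image.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return "docker.io", image


def to_mirror(image: str, mirror_host: str = MIRROR_HOST) -> str:
    """Rewrite an image reference to the regional mirror if a prefix is configured."""
    host, path = split_registry(image)
    prefix = MIRROR_PREFIXES.get(host)
    if prefix is None:
        return image  # unmapped or already-private registry: leave it untouched
    return f"{mirror_host}/{prefix}/{path}"


def external_images(images: list[str], mirror_host: str = MIRROR_HOST) -> list[str]:
    """List image references that do not resolve to the regional mirror."""
    return [img for img in images if split_registry(img)[0] != mirror_host]


if __name__ == "__main__":
    images = [
        f"{MIRROR_HOST}/conveyor/app:1.0",
        "public.ecr.aws/docker/library/nginx:1.27",
        "quay.io/prometheus/node-exporter:v1.8.2",
    ]
    for img in external_images(images):
        print(img, "->", to_mirror(img))
```

Running a check like `external_images` in CI would catch a leftover reference such as the outdated public ECR dependency before an outage does. References to unmapped registries are deliberately left unchanged, so they still show up in the audit instead of being silently rewritten to a path that does not exist.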

Want to elevate your Data Engineering skills?
Join our upcoming Data Engineering Winter School 2026 in Leuven, running February 2–6, 2026.
A 5-day, hands-on program designed and taught by seasoned Dataminded data engineers, combining practical workshops with online preparation videos.
✅ Taught in English
✅ 15% Early Bird Discount until December 30th
✅ KMO-portefeuille eligible
👉 Learn more and register: https://www.dataminded.com/academy


Belgium

Vismarkt 17, 3000 Leuven - HQ
Borsbeeksebrug 34, 2600 Antwerpen


Vat. BE.0667.976.246

Germany

Spaces Tower One,
Brüsseler Strasse 1-3, Frankfurt 60327, Germany

© 2025 Dataminded. All rights reserved.