chaoslab.eknathalabs.com · built in public

Kubernetes Chaos Simulator

Inject real failures into your cluster. Learn resilience engineering by intentionally breaking things in a safe, controlled environment — powered entirely by GitHub Actions.

4 fault types
100% GitHub native
Auto rollback
Free, always
First-time setup required. Add KUBECONFIG_DATA and CLUSTER_CONTEXT as GitHub Secrets, then set CONFIG.REPO and CONFIG.TOKEN inside chaos.js. See the Setup section of the README.
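As a rough illustration of how CONFIG.REPO and CONFIG.TOKEN could be used, the sketch below builds (without sending) a workflow_dispatch request for the GitHub REST API. Only the names CONFIG.REPO and CONFIG.TOKEN come from this page. The function name, the workflow file name pod-kill.yml, the dry_run input, and the main branch are assumptions made for the example, not the project's actual code.

```javascript
// Hypothetical shape of CONFIG in chaos.js. Only REPO and TOKEN are named
// on this page; the placeholder values are assumptions.
const CONFIG = {
  REPO: "your-user/your-repo", // "owner/name" of the repo that holds the workflows
  TOKEN: "<personal-access-token>", // placeholder; never commit a real token
};

// Builds, but does not send, a workflow_dispatch request.
function buildDispatchRequest(config, workflowFile, inputs, ref = "main") {
  if (!/^[^/\s]+\/[^/\s]+$/.test(config.REPO)) {
    throw new Error('CONFIG.REPO must look like "owner/name"');
  }
  if (!config.TOKEN) {
    throw new Error("CONFIG.TOKEN is not set");
  }
  return {
    url: `https://api.github.com/repos/${config.REPO}/actions/workflows/${workflowFile}/dispatches`,
    options: {
      method: "POST",
      headers: {
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${config.TOKEN}`,
      },
      // workflow_dispatch inputs are passed as strings.
      body: JSON.stringify({ ref, inputs }),
    },
  };
}

// "pod-kill.yml" and the "dry_run" input are hypothetical names.
const request = buildDispatchRequest(CONFIG, "pod-kill.yml", { dry_run: "true" });
// fetch(request.url, request.options) would then trigger the run.
```

Keeping the request builder separate from the network call makes it easy to check the URL and payload before anything reaches GitHub.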
01 — choose fault
💀 Pod Kill (live)
Randomly deletes matching pods. Tests ReplicaSet self-healing and restart policies.
⏱ 10–600 s · 🎯 label selector
🔥 CPU Stress (live)
Saturates CPU with stress pods. Tests HPA autoscaling and resource limit enforcement.
⏱ 10–600 s · ⚙️ workers
🌐 Network Delay (live)
Injects latency via tc netem. Tests timeouts and circuit breakers.
⏱ 10–600 s · 📡 ms delay
🖥️ Node Drain (live)
Cordons and drains a worker node. Tests PodDisruptionBudgets and autoscaler response.
⏱ 30–600 s · 🔄 auto uncordon
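To make two of the faults above more concrete, here is a minimal sketch of the shell commands a workflow step might run for Network Delay and Node Drain. The tc and kubectl commands are standard, but the function names, the eth0 interface, and the choice of drain flags are assumptions. The page does not show the commands ChaosLab actually runs.

```javascript
// Sketch only: builds command strings, it does not execute anything.

// Adds a fixed delay to egress traffic on one interface using tc netem.
function netemAddCommand(delayMs, iface = "eth0") {
  if (!Number.isInteger(delayMs) || delayMs <= 0) {
    throw new Error("delayMs must be a positive integer");
  }
  return `tc qdisc add dev ${iface} root netem delay ${delayMs}ms`;
}

// Removes the root qdisc again, which undoes the delay.
function netemRemoveCommand(iface = "eth0") {
  return `tc qdisc del dev ${iface} root`;
}

// Cordon and drain for injection, uncordon for rollback.
function nodeDrainCommands(node, timeoutSeconds) {
  if (!Number.isInteger(timeoutSeconds) || timeoutSeconds <= 0) {
    throw new Error("timeoutSeconds must be a positive integer");
  }
  return {
    inject: [
      `kubectl cordon ${node}`,
      `kubectl drain ${node} --ignore-daemonsets --delete-emptydir-data --timeout=${timeoutSeconds}s`,
    ],
    rollback: [`kubectl uncordon ${node}`],
  };
}

console.log(netemAddCommand(200));
console.log(nodeDrainCommands("worker-1", 120).inject.join("\n"));
```

Pairing every inject command with its rollback command in one place keeps the cleanup path obvious, which matters for faults that leave state behind such as a cordoned node.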
02 — configure experiment
$ chaos run --fault pod-kill --dry-run
🔵 Dry run: ON (safe)
Namespace: kube-system is always blocked
Label selector: Kubernetes label selector syntax
Duration: time before the auto-rollback check
Blast radius: maximum pods affected
Actions ↗
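The form's rules can be sketched as a small validation step that runs before anything is dispatched. The duration ranges and the kube-system block come from this page, and pod-kill appears in the command line above. The other fault identifiers, the field names, and the function name are assumptions for illustration.

```javascript
// Duration ranges in seconds, as listed on the fault cards.
// Fault identifiers other than "pod-kill" are assumed by analogy.
const DURATION_RANGE = {
  "pod-kill": [10, 600],
  "cpu-stress": [10, 600],
  "network-delay": [10, 600],
  "node-drain": [30, 600],
};

const BLOCKED_NAMESPACES = new Set(["kube-system"]);

// Validates user input and fills in safe defaults.
function normalizeExperiment(input) {
  const { fault, namespace, selector, duration, dryRun } = input;
  const range = DURATION_RANGE[fault];
  if (!range) {
    throw new Error(`unknown fault: ${fault}`);
  }
  if (BLOCKED_NAMESPACES.has(namespace)) {
    throw new Error(`namespace ${namespace} is blocked`);
  }
  const [min, max] = range;
  if (!Number.isInteger(duration) || duration < min || duration > max) {
    throw new Error(`duration must be an integer between ${min} and ${max} seconds`);
  }
  // Dry run stays on unless it is explicitly set to false.
  return { fault, namespace, selector, duration, dryRun: dryRun !== false };
}

console.log(
  normalizeExperiment({ fault: "pod-kill", namespace: "default", selector: "app=web", duration: 60 })
);
```

Treating anything other than an explicit false as "dry run on" means a missing or malformed field cannot accidentally start a live experiment.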
03 — experiment log
chaoslab stdout (idle)
--:--:--  ChaosLab ready. Select a fault, configure your experiment, then click Run.
--:--:--  Dry run is ON by default, so it is safe to click without affecting your cluster.
04 — recent runs
No experiments yet. Run your first chaos experiment above.
05 — safety guardrails
🛡️ kube-system blocked
Hardcoded protection — no experiment can target the kube-system namespace under any condition.
🎯 Blast radius cap
Maximum 30% of matching pods by default. Configurable, but never more than you explicitly allow.
🔁 Auto rollback
Every workflow ends with a cleanup block that always runs (GitHub Actions' if: always() condition), so chaos artefacts are removed on success or failure.
🔵 Dry run default
Dry run is ON by default. Every experiment shows exactly what it will do before doing anything.
📋 Full audit trail
Every run is logged in GitHub Actions history. JSON reports saved as artifacts for 30 days.
Emergency rollback
One-click rollback cleans all chaos artefacts, uncordons nodes, and restarts deployments instantly.
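The blast radius cap described above can be sketched as follows. The 30% default comes from this page. Rounding down, the optional absolute maximum, and the function names are assumptions chosen as the most conservative reading, so the real implementation may round differently.

```javascript
// Number of pods an experiment may touch: a percentage of the matching
// pods, rounded down, further limited by an optional absolute maximum.
function blastRadiusLimit(matchingCount, percent = 30, maxPods = Infinity) {
  if (!Number.isInteger(matchingCount) || matchingCount < 0) {
    throw new Error("matchingCount must be a non-negative integer");
  }
  if (!Number.isInteger(percent) || percent < 1 || percent > 100) {
    throw new Error("percent must be an integer between 1 and 100");
  }
  const byPercent = Math.floor((matchingCount * percent) / 100);
  return Math.min(byPercent, maxPods);
}

// Picks a random subset of pod names no larger than the limit.
// rng is injectable so that a selection can be reproduced in tests.
function pickTargets(pods, percent = 30, maxPods = Infinity, rng = Math.random) {
  const limit = blastRadiusLimit(pods.length, percent, maxPods);
  const shuffled = [...pods];
  // Fisher-Yates shuffle on a copy, leaving the input untouched.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, limit);
}

const pods = Array.from({ length: 10 }, (_, i) => `web-${i}`);
console.log(pickTargets(pods)); // three randomly chosen pod names
```

With rounding down, a selector that matches only three pods yields a limit of zero at the 30% default, so a small deployment is left untouched unless the cap is raised explicitly.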