This content originally appeared on DEV Community and was authored by Shardul Srivastava
In the cloud, distributed architectures have grown even more complex, and with that complexity comes uncertainty about how a system could fail.
Chaos Engineering tests system resiliency by injecting faults to identify weaknesses before they cause massive outages. Typical weaknesses include improper fallback settings for a service, cascading failures due to a single point of failure, and retry storms due to misconfigured timeouts.
History
Chaos Engineering started at Netflix back in 2010, when Netflix moved from on-prem servers to AWS and needed to test the resiliency of its infrastructure.
In 2012, Netflix open-sourced Chaos Monkey under the Apache 2.0 license as a tool to test the resilience of your application infrastructure.
Cloud Native Chaos Engineering in CNCF Landscape
The CNCF focuses on Cloud Native Chaos Engineering, defined as engineering practices focused on (and built on) Kubernetes environments, applications, microservices, and infrastructure.
Cloud Native Chaos Engineering has four core principles:
- Open source
- CRDs for Chaos Management
- Extensible and pluggable
- Broad Community adoption
The CNCF has two sandbox projects for Cloud Native Chaos Engineering: Chaos Mesh and Litmus.
Chaos Mesh
Chaos Mesh is a cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments. It is based on the Kubernetes Operator pattern and provides a Chaos Operator that injects faults into applications and Kubernetes infrastructure in a manageable way.
The Chaos Operator uses Custom Resource Definitions (CRDs) to define chaos objects. It provides a variety of these CRDs for fault injection, such as:
- PodChaos
- NetworkChaos
- DNSChaos
- HTTPChaos
- StressChaos
- IOChaos
- TimeChaos
- KernelChaos
- AWSChaos
- GCPChaos
- JVMChaos
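To make the CRD idea concrete, here is a minimal sketch of what a PodChaos object could look like, written to a local file. The resource name nginx-pod-kill, the file name pod-kill.yaml, and the app=nginx label are illustrative assumptions, not something Chaos Mesh requires.

```shell
# Write a hypothetical PodChaos manifest to a local file.
# The name, namespace, and label selector are illustrative assumptions.
cat > pod-kill.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: nginx-pod-kill
  namespace: default
spec:
  # Kill one randomly chosen pod that matches the selector below.
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: nginx
EOF

echo "wrote pod-kill.yaml"
```

On a cluster where Chaos Mesh is already installed, a file like this could be applied with kubectl apply -f pod-kill.yaml. The other CRDs follow the same overall shape, with fields specific to each fault type.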
Chaos Mesh Installation
Chaos Mesh can be installed quickly using the installation script. However, it's recommended to use the Helm 3 chart in production environments.
To install Chaos Mesh using Helm:
- Add the Chaos Mesh repository to your Helm repositories.
helm repo add chaos-mesh https://charts.chaos-mesh.org
- It's recommended to install Chaos Mesh in a separate namespace. You can either create the namespace chaos-testing manually or let Helm create it automatically if it doesn't exist:
helm upgrade \
--install \
chaos-mesh \
chaos-mesh/chaos-mesh \
-n chaos-testing \
--create-namespace \
--version v2.0.0 \
--wait
Note: If you're using GKE or EKS with containerd, use:
helm upgrade \
--install \
chaos-mesh \
chaos-mesh/chaos-mesh \
-n chaos-testing \
--create-namespace \
--set chaosDaemon.runtime=containerd \
--set chaosDaemon.socketPath=/run/containerd/containerd.sock \
--version v2.0.0 \
--wait
- Verify that the pods are running:
kubectl get pods -n chaos-testing
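Instead of polling kubectl get pods by hand, the readiness check can be scripted. The sketch below wraps kubectl wait in a small function. The function name wait_for_chaos_mesh and the 120s default timeout are assumptions chosen for illustration.

```shell
# Hypothetical helper: block until every pod in the given namespace
# reports the Ready condition, or until the timeout expires.
#   $1 = namespace (defaults to chaos-testing, as used in this article)
#   $2 = timeout   (defaults to 120s, an arbitrary illustrative value)
wait_for_chaos_mesh() {
  kubectl wait --for=condition=Ready pods --all \
    -n "${1:-chaos-testing}" \
    --timeout="${2:-120s}"
}
```

Calling wait_for_chaos_mesh with no arguments waits on the chaos-testing namespace. A non-zero exit status indicates that at least one pod did not become Ready in time.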
Run First Chaos Mesh Experiment
A Chaos Experiment describes what type of fault is injected and how.
- Set up an Nginx pod and expose it as a service on port 80.
kubectl run nginx --image=nginx --labels="app=nginx" --port=80 --expose
- Open another terminal and set up a test pod to check connectivity to the nginx service:
kubectl run -it test-connection --image=radial/busyboxplus:curl -- sh
curl nginx
This should return the default Nginx welcome page.
- Create your first Chaos Experiment by running :
kubectl apply -f - <<EOF
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: nginx-network-delay
spec:
  action: delay
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      'app': 'nginx'
  delay:
    latency: '1s'
  duration: '12s'
EOF
This introduces a delay of 1 second in responses from the nginx service for 12 seconds.
- Test the response of your nginx service now to see the 1-second delay.
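One way to observe the injected delay is to time the request from inside the test pod. The sketch below assumes it runs in the test-connection shell started earlier, where curl is available. The helper name measure_latency is hypothetical, and the default URL http://nginx assumes the service name used above.

```shell
# Hypothetical helper: print the total time, in seconds, that curl
# needs to fetch a URL. Defaults to the nginx service from this article.
measure_latency() {
  # -s silences progress output, -o /dev/null discards the body,
  # -w prints curl's time_total variable after the transfer completes.
  curl -s -o /dev/null -w '%{time_total}\n' "${1:-http://nginx}"
}
```

Calling measure_latency while the experiment is active, and again after the 12 seconds have passed, should show a clearly higher time for the first call. The exact figure depends on how many delayed packets the request involves, so treat it as a rough indicator rather than exactly one second. From the first terminal, kubectl describe networkchaos nginx-network-delay shows the state of the experiment object.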
Shardul Srivastava | Sciencx (2021-08-09T19:51:35+00:00) Cloud Native Chaos Engineering with Chaos Mesh. Retrieved from https://www.scien.cx/2021/08/09/cloud-native-chaos-engineering-with-chaos-mesh/