Lab Setup: Chaos Mesh, Scaling, and Pod Affinity
This guide outlines steps to make the UI service more resilient by applying high-availability practices. We'll cover installing Chaos Mesh with Helm, scaling the UI service, adding topology spread constraints, and using a helper script to visualize pod distribution across availability zones.
Installing Chaos Mesh
To enhance our cluster's resilience testing capabilities, we'll install Chaos Mesh. Chaos Mesh is a powerful chaos engineering tool for Kubernetes environments. It allows us to simulate various failure scenarios and test how our applications respond.
Let's install Chaos Mesh in our cluster using Helm:
Release "chaos-mesh" does not exist. Installing it now.
NAME: chaos-mesh
LAST DEPLOYED: Tue Aug 20 04:44:31 2024
NAMESPACE: chaos-mesh
STATUS: deployed
REVISION: 1
TEST SUITE: None
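The command that produced this output is not reproduced in this guide. The first output line is what helm upgrade --install prints when the release does not exist yet, so a minimal sketch of the step might look like the following. The release name and namespace come from the output above; the chart repository URL, the repo update step, and the --create-namespace and --wait flags are assumptions, not necessarily the lab's exact invocation.

```shell
# Sketch only: flags and repository URL are assumptions (see the note above).
install_chaos_mesh() {
  # Register the Chaos Mesh chart repository and refresh the local index.
  helm repo add chaos-mesh https://charts.chaos-mesh.org || return 1
  helm repo update || return 1

  # Install the release if it is missing, otherwise upgrade it in place.
  helm upgrade --install chaos-mesh chaos-mesh/chaos-mesh \
    --namespace chaos-mesh \
    --create-namespace \
    --wait
}
```

Calling install_chaos_mesh from a shell that has helm and cluster credentials configured should produce output similar to the listing above.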
Scaling and Topology Spread Constraints
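We use a Kustomize patch to modify the UI deployment, scaling it to 5 replicas and adding topology spread constraints. The constraints ask the scheduler to spread UI pods evenly across availability zones and across nodes, which reduces the impact of a node or zone failure. Because they use whenUnsatisfiable: ScheduleAnyway, the spread is a scheduling preference rather than a hard requirement.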
Here's the content of our patch file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ui
  namespace: ui
spec:
  replicas: 5
  selector:
    matchLabels:
      app: ui
  template:
    metadata:
      labels:
        app: ui
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: ui
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: ui
The resulting Deployment/ui manifest after the patch is applied:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/type: app
  name: ui
  namespace: ui
spec:
  replicas: 5
  selector:
    matchLabels:
      app: ui
      app.kubernetes.io/component: service
      app.kubernetes.io/instance: ui
      app.kubernetes.io/name: ui
  template:
    metadata:
      annotations:
        prometheus.io/path: /actuator/prometheus
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app: ui
        app.kubernetes.io/component: service
        app.kubernetes.io/created-by: eks-workshop
        app.kubernetes.io/instance: ui
        app.kubernetes.io/name: ui
    spec:
      containers:
      - env:
        - name: JAVA_OPTS
          value: -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/urandom
        envFrom:
        - configMapRef:
            name: ui
        image: public.ecr.aws/aws-containers/retail-store-sample-ui:0.4.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 45
          periodSeconds: 20
        name: ui
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            memory: 1.5Gi
          requests:
            cpu: 250m
            memory: 1.5Gi
        securityContext:
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-volume
      securityContext:
        fsGroup: 1000
      serviceAccountName: ui
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: ui
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      - labelSelector:
          matchLabels:
            app: ui
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - emptyDir:
          medium: Memory
        name: tmp-volume
Diff against the original Deployment:
     app.kubernetes.io/type: app
   name: ui
   namespace: ui
 spec:
-  replicas: 1
+  replicas: 5
   selector:
     matchLabels:
+      app: ui
       app.kubernetes.io/component: service
       app.kubernetes.io/instance: ui
       app.kubernetes.io/name: ui
   template:
 [...]
         prometheus.io/path: /actuator/prometheus
         prometheus.io/port: "8080"
         prometheus.io/scrape: "true"
       labels:
+        app: ui
         app.kubernetes.io/component: service
         app.kubernetes.io/created-by: eks-workshop
         app.kubernetes.io/instance: ui
         app.kubernetes.io/name: ui
 [...]
           name: tmp-volume
       securityContext:
         fsGroup: 1000
       serviceAccountName: ui
+      topologySpreadConstraints:
+      - labelSelector:
+          matchLabels:
+            app: ui
+        maxSkew: 1
+        topologyKey: topology.kubernetes.io/zone
+        whenUnsatisfiable: ScheduleAnyway
+      - labelSelector:
+          matchLabels:
+            app: ui
+        maxSkew: 1
+        topologyKey: kubernetes.io/hostname
+        whenUnsatisfiable: ScheduleAnyway
       volumes:
       - emptyDir:
           medium: Memory
         name: tmp-volume
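To make maxSkew: 1 concrete: the skew of a placement is the difference between the largest and smallest number of matching pods in any topology domain (zone or node). The small sketch below computes it for a list of per-domain pod counts. The three-zone layout in the examples is an assumption for illustration; the skew function is a hypothetical helper, not part of the lab.

```shell
# Print the skew (largest count minus smallest count) for a list of
# per-domain pod counts, for example the number of UI pods in each zone.
skew() {
  min=$1
  max=$1
  for n in "$@"; do
    if [ "$n" -lt "$min" ]; then min=$n; fi
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  echo $((max - min))
}

skew 2 2 1   # 5 replicas spread evenly over 3 zones: prints 1
skew 3 1 1   # an uneven placement: prints 2, which exceeds maxSkew: 1
```

With whenUnsatisfiable: ScheduleAnyway, a placement like the second one is still allowed when nothing better is possible; the scheduler only gives preference to nodes that keep the skew low.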
Apply the changes using the Kustomize patch and its Kustomization file:
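The exact command and directory are not shown here, so the following is only a sketch of how such a patch is typically applied with kubectl's built-in Kustomize support. The directory argument is a placeholder for wherever the Kustomization file and patch live, and the rollout timeout is an assumption. Note that the patch adds app: ui to spec.selector, which is immutable on an existing apps/v1 Deployment, so the sketch deletes the old Deployment before applying.

```shell
# Sketch only: pass the directory that contains kustomization.yaml and the
# patch file. The lab's real path is not shown in this guide.
apply_ui_patch() {
  config_dir=$1

  # The selector of an existing Deployment cannot be changed in place,
  # so remove the old object first (no error if it is already gone).
  kubectl delete deployment ui -n ui --ignore-not-found || return 1

  # Build the Kustomization in config_dir and apply the result.
  kubectl apply -k "$config_dir" || return 1

  # Wait until the scaled-out Deployment has finished rolling out.
  kubectl rollout status deployment/ui -n ui --timeout=180s
}
```

Deleting the Deployment briefly takes the UI offline, which is acceptable in a lab but would need a different approach in production.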
Verify Retail Store Accessibility
After applying these changes, it's important to verify that your retail store is accessible:
Waiting for k8s-ui-ui-5ddc3ba496-721427594.us-west-2.elb.amazonaws.com...
You can now access http://k8s-ui-ui-5ddc3ba496-721427594.us-west-2.elb.amazonaws.com
Once this command completes, it will output a URL. Open this URL in a new browser tab to verify that your retail store is accessible and functioning correctly.
The retail store URL may take 5-10 minutes to become operational.
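The command behind the "Waiting for ..." output is not shown in this guide. As a rough stand-in, the sketch below polls a URL with curl until it answers successfully or a retry budget runs out. The function name wait_for_url, the retry defaults, and the ingress name in the usage comment are all assumptions for illustration.

```shell
# Poll a URL until it returns a successful HTTP response.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url=$1
  attempts=${2:-60}   # assumed default: 60 tries
  delay=${3:-10}      # assumed default: 10 seconds apart, about 10 minutes total
  i=0
  echo "Waiting for $url..." >&2
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors such as 503.
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "You can now access $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for $url" >&2
  return 1
}

# Example (the ingress name "ui" is an assumption):
#   host=$(kubectl get ingress ui -n ui \
#     -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
#   wait_for_url "http://$host"
```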
Helper Script: Get Pods by AZ
The get-pods-by-az.sh script helps visualize the distribution of Kubernetes pods across availability zones in the terminal. You can view the script file on GitHub.
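The script itself is not reproduced here. As an illustration of the idea only, and not the actual get-pods-by-az.sh, the sketch below joins each node's zone label with each pod's node assignment and prints one "zone pod" pair per line. The function names are hypothetical; the ui namespace and app=ui label are taken from the patch above.

```shell
# Join "node zone" lines (first file) with "pod node" lines (second file)
# and print "zone pod" pairs sorted by zone. Pods on unlisted nodes, or
# pods not yet scheduled, are reported under "unknown".
group_pods_by_zone() {
  awk 'FILENAME == ARGV[1] { zone[$1] = $2; next }
       { z = ($2 in zone) ? zone[$2] : "unknown"; print z, $1 }' "$1" "$2" |
    LC_ALL=C sort
}

# Collect the two inputs from the cluster and print the grouping.
pods_by_az() {
  nodes=$(mktemp)
  pods=$(mktemp)
  kubectl get nodes --no-headers \
    -o 'custom-columns=NAME:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone' \
    > "$nodes" || return 1
  kubectl get pods -n ui -l app=ui --no-headers \
    -o 'custom-columns=NAME:.metadata.name,NODE:.spec.nodeName' \
    > "$pods" || return 1
  group_pods_by_zone "$nodes" "$pods"
  rm -f "$nodes" "$pods"
}
```

Running pods_by_az after the patch is applied should show the five UI pods spread across the cluster's zones, with no zone holding more than one pod above any other when the constraints can be satisfied.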