Deploy the AWS Node Termination Handler

We need Helm to deploy the AWS Node Termination Handler, see installing Helm for instructions.

In this section, we will prepare our cluster to handle Spot interruptions.

If the available On-Demand capacity of a particular instance type is deleted, the Spot Instance is sent an interruption notice two minutes ahead to gracefully wrap up things. We will deploy a pod on each spot instance to detect and redeploy applications elsewhere in the cluster.

The first thing that we need to do is deploy the AWS Node Termination Handler on each Spot Instance. This will monitor the EC2 metadata service on the instance for an interruption notice. The termination handler consists of a ServiceAccount, ClusterRole, ClusterRoleBinding, and a DaemonSet.

The workflow can be summarized as:

  • Identify that a Spot Instance is being reclaimed.
  • Use the 2-minute notification window to gracefully prepare the node for termination.
  • Taint the node and cordon it off to prevent new pods from being placed.
  • Drain connections on the running pods.
  • Replace the pods on remaining nodes to maintain the desired capacity.

By default, the aws-node-termination-handler will run on all of your nodes (on-demand and spot). If your spot instances are labeled, you can configure aws-node-termination-handler to only run on your labeled spot nodes. If you’re using the tag lifecycle=Ec2Spot, you can run the following to apply our spot-node-selector overlay.

helm repo add eks https://aws.github.io/eks-charts

helm install --name aws-node-termination-handler \
             --namespace kube-system \
             --set nodeSelector.lifecycle=Ec2Spot \
             eks/aws-node-termination-handler

Verify that the pods are only running on node with label lifecycle=Ec2Spot

kubectl --namespace=kube-system get daemonsets 

Spot Daemonset