
Provisioning compute

In this lab, we'll use Karpenter to provision AWS Neuron nodes specifically designed for accelerated machine learning inference. Inferentia and Trainium are AWS's purpose-built ML accelerators that provide high performance and cost-effectiveness for running inference workloads like our Mistral-7B model.

tip

To learn more about Karpenter, check out the Karpenter module in this workshop.

Karpenter has already been installed in our EKS cluster and runs as a Deployment:

~$kubectl get deployment karpenter -n kube-system
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
karpenter   2/2     2            2           11m

Let's review the configuration for the Karpenter NodePool that we'll be using to provision Neuron instances:

~/environment/eks-workshop/modules/aiml/chatbot/nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: neuron
spec:
  template:
    metadata:
      labels:
        neuron.amazonaws.com/neuron-device: "true"
        vpc.amazonaws.com/has-trunk-attached: "true" # Required for Pod ENI
    spec:
      taints:
        - key: aws.amazon.com/neuron
          value: "true"
          effect: "NoSchedule"
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["trn1.2xlarge", "inf2.xlarge"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      expireAfter: 720h
      terminationGracePeriod: 24h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: neuron
  limits:
    aws.amazon.com/neuron: 2
    cpu: 16
    memory: 64Gi
  disruption:
    consolidateAfter: 300s
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: neuron
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 256Gi
        iops: 16000
        throughput: 1000
        volumeType: gp3
  role: ${KARPENTER_NODE_ROLE}
  userData: |
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      featureGates:
        FastImagePull: true
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    - tags:
        kubernetes.io/cluster/eks-workshop: owned
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
        kubernetes.io/role/internal-elb: "1"
  tags:
    app.kubernetes.io/created-by: eks-workshop
    karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    aws-neuron: "true"

We're configuring the NodePool to use either inf2.xlarge or trn1.2xlarge instance types, depending on what is available in the region we're running in.


The NodePool CRD supports defining node properties like instance type and zone. In this example, we set karpenter.sh/capacity-type to initially limit Karpenter to provisioning On-Demand instances, as well as node.kubernetes.io/instance-type to limit provisioning to a subset of specific instance types. You can learn which other properties are available here.
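Requirements can be broadened later without recreating the NodePool. As a hypothetical example (not part of this lab), allowing Spot capacity alongside On-Demand would only require widening the capacity-type requirement:

```yaml
# Sketch only: widening the capacity-type requirement to include Spot.
# Karpenter would then prefer Spot when it satisfies the pending pods.
- key: "karpenter.sh/capacity-type"
  operator: In
  values: ["on-demand", "spot"]
```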


A Taint defines a set of properties that allow a node to repel a set of Pods. Taints work together with their counterpart, Tolerations: a Pod carrying a toleration that matches a node's taint is allowed to schedule there, while all other Pods are repelled. Together they ensure that Pods are scheduled only onto the appropriate nodes. You can learn more about the other properties in this resource.
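For the aws.amazon.com/neuron taint above, a Pod that should run on these nodes would carry a toleration like the following (a minimal sketch; the workload manifests used later in this lab define their own):

```yaml
# Matches the NodePool's taint, allowing the Pod onto Neuron nodes.
tolerations:
  - key: aws.amazon.com/neuron
    operator: Exists
    effect: NoSchedule
```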


A NodePool can define a limit on the amount of CPU, memory, and other resources managed by it. Once this limit is reached, Karpenter will not provision additional capacity associated with that particular NodePool, providing a cap on the total compute.
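To make the cap concrete: both inf2.xlarge and trn1.2xlarge expose a single Neuron device per node, so the limits in our NodePool effectively cap it at roughly two such nodes (assuming these instance specs; verify against current AWS documentation):

```yaml
# Karpenter stops provisioning once the NodePool's aggregate
# capacity reaches any one of these values:
limits:
  aws.amazon.com/neuron: 2 # ~2 nodes at 1 Neuron device each
  cpu: 16                  # e.g. two trn1.2xlarge (8 vCPU each)
  memory: 64Gi
```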

Let's create the NodePool:

~$cat ~/environment/eks-workshop/modules/aiml/chatbot/nodepool.yaml \
| envsubst | kubectl apply -f-
ec2nodeclass.karpenter.k8s.aws/neuron created
nodepool.karpenter.sh/neuron created

Once properly deployed, check for the NodePools:

~$kubectl get nodepool
NAME         NODECLASS    NODES   READY   AGE
neuron       neuron       0       True    31s

As seen in the output above, the NodePool has been provisioned and is ready, allowing Karpenter to create new nodes as needed. When we deploy our ML workload in the next step, Karpenter will automatically create the required Neuron instances based on the resource requests and limits we specify.
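What triggers that provisioning is a pending Pod that requests the aws.amazon.com/neuron resource and tolerates the NodePool's taint. A minimal sketch (the Pod name and image here are placeholders, not part of the workshop manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: neuron-smoke-test # hypothetical name
spec:
  nodeSelector:
    neuron.amazonaws.com/neuron-device: "true" # label set by our NodePool
  tolerations:
    - key: aws.amazon.com/neuron
      operator: Exists
      effect: NoSchedule
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:stable # placeholder image
      command: ["sleep", "infinity"]
      resources:
        requests:
          aws.amazon.com/neuron: 1 # unschedulable until Karpenter adds a Neuron node
        limits:
          aws.amazon.com/neuron: 1
```

Applying a Pod like this would leave it Pending until Karpenter launches an inf2.xlarge or trn1.2xlarge node to satisfy the Neuron device request.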