Provisioning compute
In this lab, we'll use Karpenter to provision AWS Neuron nodes specifically designed for accelerated machine learning inference. Inferentia and Trainium are AWS's purpose-built ML accelerators that provide high performance and cost-effectiveness for running inference workloads like our Mistral-7B model.
To learn more about Karpenter, check out the Karpenter module in this workshop.
Karpenter has already been installed in our EKS cluster and runs as a Deployment:
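We can confirm this with a command like the following (assuming Karpenter is installed in a `karpenter` namespace, which may differ in your cluster):

```shell
# List the Karpenter controller Deployment (namespace is an assumption)
kubectl get deployment -n karpenter
```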
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
karpenter   2/2     2            2           11m
Let's review the configuration for the Karpenter NodePool that we'll be using to provision Neuron instances:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: neuron
spec:
  template:
    metadata:
      labels:
        neuron.amazonaws.com/neuron-device: "true"
        vpc.amazonaws.com/has-trunk-attached: "true" # Required for Pod ENI
    spec:
      taints:
        - key: aws.amazon.com/neuron
          value: "true"
          effect: "NoSchedule"
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["trn1.2xlarge", "inf2.xlarge"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      expireAfter: 720h
      terminationGracePeriod: 24h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: neuron
  limits:
    aws.amazon.com/neuron: 2
    cpu: 16
    memory: 64Gi
  disruption:
    consolidateAfter: 300s
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: neuron
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 256Gi
        iops: 16000
        throughput: 1000
        volumeType: gp3
  role: ${KARPENTER_NODE_ROLE}
  userData: |
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      featureGates:
        FastImagePull: true
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    - tags:
        kubernetes.io/cluster/eks-workshop: owned
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
        kubernetes.io/role/internal-elb: "1"
  tags:
    app.kubernetes.io/created-by: eks-workshop
    karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    aws-neuron: "true"
We're configuring the NodePool to use either inf2.xlarge or trn1.2xlarge instance types, depending on what is available in the region we're running in.
The NodePool CRD supports defining node properties like instance type and zone. In this example, we're setting karpenter.sh/capacity-type to initially limit Karpenter to provisioning On-Demand instances, as well as node.kubernetes.io/instance-type to limit provisioning to a subset of specific instance types. You can learn which other properties are available in the Karpenter documentation.
A Taint defines a set of properties that allow a node to repel a set of Pods. It works together with its counterpart on the Pod side, a Toleration: a Pod can only be scheduled onto a tainted node if it tolerates that taint. Taints and tolerations work together to ensure that only the intended Pods are scheduled onto the appropriate nodes. You can learn more about the other properties in the Kubernetes documentation.
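To illustrate, a Pod that should run on these Neuron nodes would need a toleration matching the taint defined in the NodePool above. A minimal sketch of the relevant Pod-spec fragment:

```yaml
# Pod-spec fragment: toleration matching the NodePool's aws.amazon.com/neuron taint
tolerations:
  - key: aws.amazon.com/neuron
    operator: Equal
    value: "true"
    effect: NoSchedule
```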
A NodePool can define limits on the resources it manages, in this case CPU, memory, and Neuron devices. Once a limit is reached, Karpenter will not provision additional capacity associated with that particular NodePool, providing a cap on the total compute.
Let's create the EC2NodeClass and NodePool:
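Assuming the manifests above are saved to a file such as `neuron-nodepool.yaml` (the file name is illustrative), they can be applied with:

```shell
# Apply the EC2NodeClass and NodePool manifests (file name is an assumption)
kubectl apply -f neuron-nodepool.yaml
```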
ec2nodeclass.karpenter.k8s.aws/neuron created
nodepool.karpenter.sh/neuron created
Once properly deployed, check for the NodePools:
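A command along these lines should show the NodePool we just created:

```shell
# Show the neuron NodePool and its readiness
kubectl get nodepool neuron
```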
NAME     NODECLASS   NODES   READY   AGE
neuron   neuron      0       True    31s
As seen in the output above, the NodePool has been provisioned successfully and is ready, allowing Karpenter to provision new nodes as needed. When we deploy our ML workload in the next step, Karpenter will automatically create the required Neuron instances based on the resource requests and limits we specify.
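As a sketch of what triggers provisioning: a Pod that requests the `aws.amazon.com/neuron` extended resource (and tolerates the taint) will cause Karpenter to launch a matching instance from this NodePool. The Pod name and image below are illustrative, not from this workshop:

```yaml
# Hypothetical Pod that would trigger Karpenter to provision a Neuron node
apiVersion: v1
kind: Pod
metadata:
  name: neuron-smoke-test # illustrative name
spec:
  tolerations:
    - key: aws.amazon.com/neuron
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest # placeholder image
      resources:
        limits:
          aws.amazon.com/neuron: 1 # one Neuron device; unschedulable until a node exists
```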