
Provisioning compute

In this lab, we'll use Karpenter to provision AWS Neuron nodes specifically designed for accelerated machine learning inference. Inferentia and Trainium are AWS's purpose-built ML accelerators that provide high performance and cost-effectiveness for running inference workloads like our Mistral-7B model.

tip

To learn more about Karpenter, check out the Karpenter module in this workshop.

Karpenter has already been installed in our EKS cluster and runs as a Deployment:

~$kubectl get deployment karpenter -n kube-system
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
karpenter   2/2     2            2           11m

Let's review the configuration for the Karpenter NodePool that we'll be using to provision Neuron instances:

~/environment/eks-workshop/modules/aiml/chatbot/nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: neuron
spec:
  template:
    metadata:
      labels:
        neuron.amazonaws.com/neuron-device: "true"
        vpc.amazonaws.com/has-trunk-attached: "true" # Required for Pod ENI
    spec:
      taints:
        - key: aws.amazon.com/neuron
          value: "true"
          effect: "NoSchedule"
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["trn1.2xlarge", "inf2.xlarge"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      expireAfter: 720h
      terminationGracePeriod: 24h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: neuron
  limits:
    aws.amazon.com/neuron: 2
    cpu: 16
    memory: 64Gi
  disruption:
    consolidateAfter: 300s
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: neuron
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 256Gi
        iops: 16000
        throughput: 1000
        volumeType: gp3
  role: ${KARPENTER_NODE_ROLE}
  userData: |
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      featureGates:
        FastImagePull: true
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    - tags:
        kubernetes.io/cluster/eks-workshop: owned
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
        kubernetes.io/role/internal-elb: "1"
  tags:
    app.kubernetes.io/created-by: eks-workshop
    karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
    aws-neuron: "true"

We're configuring the NodePool to use either inf2.xlarge or trn1.2xlarge instance types, depending on what is available in the region we're running in.


The NodePool CRD supports defining node properties like instance type and zone. In this example, we set karpenter.sh/capacity-type to initially limit Karpenter to provisioning On-Demand instances, as well as node.kubernetes.io/instance-type to limit provisioning to a subset of specific instance types. You can learn which other properties are available here.
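Requirements can be broadened later without recreating the NodePool. As a hypothetical example (not part of this lab), allowing Spot capacity alongside On-Demand would only require widening the capacity-type requirement:

```yaml
# Sketch only: widening the capacity-type requirement to include Spot.
# Karpenter would then prefer Spot when it satisfies the pending pods.
- key: "karpenter.sh/capacity-type"
  operator: In
  values: ["on-demand", "spot"]
```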


A Taint defines a set of properties that allow a node to repel a set of Pods. Taints work together with their counterpart, Tolerations: a Pod carrying a toleration that matches a node's taint is allowed to schedule there, while all other Pods are repelled. Together they ensure that Pods are scheduled only onto the appropriate nodes. You can learn more about the other properties in this resource.
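For the aws.amazon.com/neuron taint above, a Pod that should run on these nodes would carry a toleration like the following (a minimal sketch; the workload manifests used later in this lab define their own):

```yaml
# Matches the NodePool's taint, allowing the Pod onto Neuron nodes.
tolerations:
  - key: aws.amazon.com/neuron
    operator: Exists
    effect: NoSchedule
```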


A NodePool can define a limit on the amount of CPU, memory, and other resources managed by it. Once this limit is reached, Karpenter will not provision additional capacity associated with that particular NodePool, providing a cap on the total compute.
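To make the cap concrete: both inf2.xlarge and trn1.2xlarge expose a single Neuron device per node, so the limits in our NodePool effectively cap it at roughly two such nodes (assuming these instance specs; verify against current AWS documentation):

```yaml
# Karpenter stops provisioning once the NodePool's aggregate
# capacity reaches any one of these values:
limits:
  aws.amazon.com/neuron: 2 # ~2 nodes at 1 Neuron device each
  cpu: 16                  # e.g. two trn1.2xlarge (8 vCPU each)
  memory: 64Gi
```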

Let's create the NodePool:

~$cat ~/environment/eks-workshop/modules/aiml/chatbot/nodepool.yaml \
| envsubst | kubectl apply -f-
ec2nodeclass.karpenter.k8s.aws/neuron created
nodepool.karpenter.sh/neuron created

Once properly deployed, check for the NodePools:

~$kubectl get nodepool
NAME         NODECLASS    NODES   READY   AGE
neuron       neuron       0       True    31s

As seen in the output above, the NodePool has been provisioned and is ready, allowing Karpenter to create new nodes as needed. When we deploy our ML workload in the next step, Karpenter will automatically create the required Neuron instances based on the resource requests and limits we specify.
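What triggers that provisioning is a pending Pod that requests the aws.amazon.com/neuron resource and tolerates the NodePool's taint. A minimal sketch (the Pod name and image here are placeholders, not part of the workshop manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: neuron-smoke-test # hypothetical name
spec:
  nodeSelector:
    neuron.amazonaws.com/neuron-device: "true" # label set by our NodePool
  tolerations:
    - key: aws.amazon.com/neuron
      operator: Exists
      effect: NoSchedule
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:stable # placeholder image
      command: ["sleep", "infinity"]
      resources:
        requests:
          aws.amazon.com/neuron: 1 # unschedulable until Karpenter adds a Neuron node
        limits:
          aws.amazon.com/neuron: 1
```

Applying a Pod like this would leave it Pending until Karpenter launches an inf2.xlarge or trn1.2xlarge node to satisfy the Neuron device request.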