
K8s Cluster Auto-scalers: Autoscaler vs Karpenter

Updated: Jul 11


""

Autoscaling in a nutshell

When we work with workloads that dynamically demand more or fewer resources in terms of CPU or memory, we need solutions that allow us to deploy and fit those workloads in production. In this post we will talk about a few concepts, starting with autoscaling:


"Autoscaling is a method used in cloud computing that dynamically adjusts the number of computational resources in a server farm - typically measured by the number of active servers - automatically based on the load on the farm"

Good, now we know what autoscaling is and when it is used. If you have an e-commerce app, you will probably need autoscaling several times a year; an example is Amazon Prime Day, when the traffic going into the servers may grow sharply for a few hours.



K8s Autoscaling

Kubernetes is one of the container orchestration platforms with major automation capabilities. Kubernetes autoscaling helps optimize resource usage and costs by automatically scaling a cluster up and down in line with demand. Kubernetes enables autoscaling at the cluster/node level as well as at the pod level, two different but fundamentally connected layers of Kubernetes architecture.
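At the pod level, the most common mechanism is the Horizontal Pod Autoscaler, which grows or shrinks the number of replicas of a workload; the cluster-level tools discussed below then add or remove the nodes needed to host those replicas. As a minimal sketch (the "web" Deployment name and the 70% CPU target are placeholders):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU usage exceeds 70%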


K8s Autoscaler (Native)


The Cluster Autoscaler is a Kubernetes-native tool that increases or decreases the size of a Kubernetes cluster (by adding or removing nodes) based on the presence of pending pods and node utilization metrics. Its functions are:

  • Adds nodes to a cluster whenever it detects pending pods that could not be scheduled due to resource shortages.

  • Removes nodes from a cluster, whenever the utilization of a node falls below a certain threshold defined by the cluster administrator.
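On AWS, the Cluster Autoscaler typically runs as a Deployment in kube-system and is pointed at the Auto Scaling groups it is allowed to resize, either listed explicitly or discovered through tags. The following container arguments are a hedged sketch of a common configuration; the cluster name in the auto-discovery tag is a placeholder:

# Illustrative cluster-autoscaler container args (adapt the ASG tags and expander to your setup)
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --balance-similar-node-groups
  - --skip-nodes-with-system-pods=false
  - --expander=least-waste
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster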

K8s Cluster Autoscaler Issues

  • The cluster Autoscaler only functions correctly with Kubernetes node groups/instance groups that have nodes with the same capacity. For public cloud providers like AWS, this might not be optimal, since diversification and availability considerations dictate the use of multiple instance types.

  • When a pod is scheduled whose needs don't match the node groups already configured, it is necessary to create a new node group, tell the Autoscaler about it, define how to scale it, and set weights on it.

  • We have no control over the zone in which a node will be created.


Karpenter

Karpenter is an open-source node provisioning project built for Kubernetes. The project was started by AWS and is currently supported only on AWS, although Karpenter is designed to be extensible to other cloud providers.

Unlike the Autoscaler, Karpenter doesn't use node groups: it talks directly to EC2 and places instances in exactly the zone we want. We can simply say "hey EC2, give me that instance in that zone" and that's all.


Advantages

  • VMs based on workloads: Karpenter can pick the right instance type automatically.

  • Flexibility: It can work with different instance types, with different zones, and with quite a few other different parameters.

  • Group-less node provisioning: Karpenter works directly with VMs, which speeds things up drastically.

  • Pods are bound to nodes before those nodes are even created, so everything happens faster.

  • Multiple provisioners: We can have any number of provisioners (with the Autoscaler there is only one configuration), and Karpenter will pick the provisioner that matches the pod's requirements (a sketch follows this list).
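To make the flexibility and multiple-provisioner points concrete, here is a hedged sketch of two Provisioners written against the karpenter.sh/v1alpha5 API that was current when this post was written (newer releases replace it with NodePool); the names, zones, taints, and limits are illustrative only:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default                              # illustrative general-purpose provisioner
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-east-1a", "us-east-1b"]   # pin the zones we want nodes in
  limits:
    resources:
      cpu: "100"                             # never provision more than 100 vCPUs in total
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-batch                           # illustrative second provisioner for interruption-tolerant workloads
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  taints:
    - key: workload-type
      value: batch
      effect: NoSchedule

Karpenter picks whichever provisioner satisfies the pending pod's requirements, so adding a new kind of workload usually means adding a provisioner rather than a whole new node group.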


Step by step

  • The first thing we need to do is to create the cluster itself.

  • Next, we have to create a VPC or use one that already exists, and then add extra tags to the subnets so that Karpenter knows which ones belong to our cluster (if you use Terraform you can add these tags in the VPC resource, as shown in the example below).

  • Next, we have to create IAM Roles for Karpenter Controllers.

  • Once IAM is created, we can deploy Karpenter through Helm.

  • Finally, we can deploy our provisioners and feel the Karpenter power.


You can choose your preferred way to deploy; in this case we will use Terraform, but you can also use CloudFormation.
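As a hedged sketch of the subnet-tagging step with Terraform, something like the following is typical; the karpenter.sh/discovery tag key is the one the Karpenter getting-started material uses for subnet discovery, while the VPC module arguments and CIDRs here are illustrative:

# vpc.tf (sketch): tag the private subnets so Karpenter's subnetSelector can find them
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name            = var.cluster_name
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
    "karpenter.sh/discovery"                    = var.cluster_name
  }
}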



PoC

Preparing environment

Karpenter is easy to deploy, but it is necessary to prepare the whole environment beforehand (IAM node role, IAM controller role, IAM spot role, etc.). This process can be a bit tedious due to AWS security requirements. We are going to deploy Karpenter in a test environment; first, we need to set some environment variables and deploy eks.tf and vpc.tf (we choose us-east-1 as the region):

export CLUSTER_NAME="${USER}-karpenter-demo"

export AWS_DEFAULT_REGION="us-east-1"


terraform init

terraform apply -var "cluster_name=${CLUSTER_NAME}"

This first apply will fail because we haven't set our kube-config file yet.

""

We can run the commands below to point at the new kubeconfig and redeploy the config maps correctly:

export KUBECONFIG="${PWD}/kubeconfig_${CLUSTER_NAME}"

export KUBE_CONFIG_PATH="${KUBECONFIG}"

terraform apply -var "cluster_name=${CLUSTER_NAME}"



Because we are going to use Spot instances, it is necessary to add an extra service-linked IAM role:

aws iam create-service-linked-role --aws-service-name spot.amazonaws.com



Next, we have to deploy the Karpenter IAM roles (kcontroller.tf and knode.tf) and deploy Karpenter as a Helm package through Terraform (karpenter.tf). It is necessary to run "terraform init" again because we are going to use a new module:


terraform init

terraform apply -var "cluster_name=${CLUSTER_NAME}"
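For reference, the karpenter.tf mentioned above usually boils down to a helm_release resource along the lines of the hedged sketch below; the chart repository and value names correspond to the early Karpenter charts and may differ in newer releases, and the role, profile, and module references are illustrative:

# karpenter.tf (sketch): install the Karpenter chart and wire it to the cluster
resource "helm_release" "karpenter" {
  name             = "karpenter"
  namespace        = "karpenter"
  create_namespace = true
  repository       = "https://charts.karpenter.sh"
  chart            = "karpenter"

  set {
    name  = "clusterName"
    value = var.cluster_name
  }

  set {
    name  = "clusterEndpoint"
    value = module.eks.cluster_endpoint                    # illustrative EKS module output
  }

  set {
    name  = "aws.defaultInstanceProfile"
    value = aws_iam_instance_profile.karpenter_node.name   # illustrative node instance profile (knode.tf)
  }

  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = aws_iam_role.karpenter_controller.arn          # illustrative controller role for IRSA (kcontroller.tf)
  }
}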



Feel the Karpenter power

In this case, we have an EKS cluster with one t3a.medium (2 vCPU and 4 GiB) as the default node and an application deployment with 5 replicas requesting 1 vCPU and 1 GiB each, so our node will not be able to schedule them; Karpenter will notice the pending pods and take care of fixing it.

First, let's look at the Karpenter namespace and its resources (one service and one deployment, whose replica set deploys one pod).

""

Then, we will deploy the application YAML file and see that our node can't run it.
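The application manifest itself is not reproduced here; given the 5 replicas of 1 vCPU / 1 GiB described above, a hedged sketch of it would look like this (the name and container image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate                    # placeholder name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2   # lightweight placeholder image
          resources:
            requests:
              cpu: "1"             # 5 x 1 vCPU cannot fit on a single t3a.medium
              memory: 1Gi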

""

Next, we will deploy the provisioner YAML file and see that Karpenter automatically scales up a new node (a c6i.2xlarge) that is able to run the above deployment.
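A hedged sketch of such a provisioner, again on the karpenter.sh/v1alpha5 API: the 30-second ttlSecondsAfterEmpty matches the scale-down behaviour shown below, and the selectors assume the subnets and security groups were tagged with karpenter.sh/discovery as in the earlier VPC example (the cluster name is a placeholder):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: "50"                                # illustrative safety cap
  provider:                                    # inline AWS config (newer versions use providerRef / NodePool)
    subnetSelector:
      karpenter.sh/discovery: my-karpenter-demo
    securityGroupSelector:
      karpenter.sh/discovery: my-karpenter-demo
  ttlSecondsAfterEmpty: 30                     # remove nodes that have been empty for 30 seconds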

""

""

Finally, we will destroy the application deployment and see how Karpenter automatically removes the node after it has been empty for 30 seconds.

""


Final thoughts


Karpenter is an excellent replacement for the native Autoscaler. The Autoscaler isn't a final solution so much as a blueprint, and now we have a really good one; to be more precise, some of us do and others don't, because Karpenter is only supported on AWS. The closest thing to Karpenter elsewhere would be GKE Autopilot on GCP, but with some differences. Back to Karpenter: it is more intelligent and gives us more flexibility, and it should be the next tool you adopt if you currently work with autoscaling on EKS.







Nicolas Balmaceda

DevOps Engineer






If you are interested in learning more about our #TeraTips or our blog's content, we invite you to see all the content entries that we have created for you and your needs.




