Scaling Kubernetes Deployments and avoiding pitfalls

Most people use Kubernetes because they want to scale applications. With a Horizontal Pod Autoscaler, the cluster can run additional pods to process more traffic.

In theory this is simple, but it can be difficult to implement in a secure and stable way.
I’d like to describe three common autoscaling pitfalls.

Basic autoscaler example

First, let’s consider a simple Deployment with one replica and a CPU-based HorizontalPodAutoscaler.

The Deployment has one replica configured, along with CPU requests and limits. I configured the HorizontalPodAutoscaler to scale from 1 to 10 replicas.
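A minimal sketch of such a pair of manifests could look like the following. The image (the Kubernetes e2e resource-consumer, which matches the curl parameters used below), the CPU limit, and the 50% target are my assumptions; the 100m request matches the numbers discussed later:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resource-consumer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: resource-consumer
  template:
    metadata:
      labels:
        app: resource-consumer
    spec:
      containers:
        - name: resource-consumer
          image: gcr.io/kubernetes-e2e-test-images/resource-consumer:1.5
          resources:
            requests:
              cpu: 100m   # utilisation percentage is computed against this value
            limits:
              cpu: 500m
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50   # percentage of *requests*, not limits
```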

To test the HPA, I generate some load with curl:
curl --data "millicores=300&durationSec=600"

After a while I can see that my HPA is working.

The autoscaler starts three additional pods. The HPA watches the average CPU consumption across all pods and tries to keep it at the target average CPU utilisation. Kubernetes calculates a pod’s utilisation as the average over the containers running in it.
This is the first pitfall: utilisation is measured against requests, not limits. It is a percentage of the requested resources; in my case, a usage of 200m means 200% because the request is set to 100m.

Deployments with Horizontal Pod Autoscaler

The autoscaler seems to work as expected, but what about deployments?
During a deployment, Kubernetes destroys all replicas created by the HorizontalPodAutoscaler and falls back to the replica count from the Deployment definition. During heavy load this is clearly not the behaviour you want. I defined the number of replicas in both the Deployment and the HorizontalPodAutoscaler, but replicas can’t be controlled by two objects. To avoid this, I need to remove the replicas field from the Deployment definition and apply the YAML file again.
This is the second pitfall. You also need to know that in older Kubernetes versions (e.g. 1.15), removing the replicas field doesn’t take effect without deleting and recreating the Deployment.
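The fix is simply to omit the replicas field and let the HPA own the replica count. A sketch, using the same assumed names as in the earlier example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resource-consumer
spec:
  # no "replicas" field here: the HorizontalPodAutoscaler
  # now fully owns the replica count
  selector:
    matchLabels:
      app: resource-consumer
  template:
    # pod template unchanged
```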

More containers in pod

Finally, let’s look at the third pitfall.
I add another container to my pod.
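A sketch of the pod template with the extra container. The nginx sidecar is the container mentioned in the troubleshooting below; its image tag is an assumption, and note that it deliberately has no resources section:

```yaml
spec:
  containers:
    - name: resource-consumer
      image: gcr.io/kubernetes-e2e-test-images/resource-consumer:1.5
      resources:
        requests:
          cpu: 100m
    - name: nginx
      image: nginx:1.19
      # no resources.requests here -- this is what will break the HPA
```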

Now, I run the workload with the same curl command as before. As a result, I can observe load on the pod.

But something weird is going on with my autoscaler.

My Horizontal Pod Autoscaler doesn’t scale my deployment! Why?
The explanation is simple. As I mentioned before, pod CPU utilisation is the average CPU across all containers in the pod. Kubernetes checks usage against requests, but my nginx container doesn’t have any requests. The error becomes visible when I describe my HPA.

Adding requests to all containers fixes this problem. Of course, this applies to initContainers as well.
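The repaired sidecar could look like this; the 50m value is an assumption, since what matters to the HPA is only that a CPU request is set on every container:

```yaml
spec:
  containers:
    - name: resource-consumer
      image: gcr.io/kubernetes-e2e-test-images/resource-consumer:1.5
      resources:
        requests:
          cpu: 100m
    - name: nginx
      image: nginx:1.19
      resources:
        requests:
          cpu: 50m   # assumed value; the HPA just needs a request on every container
```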

Three common pitfalls

To summarise, there are three common pitfalls when you use a HorizontalPodAutoscaler:

  • The utilisation percentage is calculated against requests, not limits.
  • You can’t define the number of replicas in both the Deployment and the HorizontalPodAutoscaler. Define only minReplicas and maxReplicas.
  • It is important to define requests for all containers in the pod, including initContainers. Limits are not necessary.

Code repository with final, working example yaml:
