Most people use Kubernetes because they want to scale applications. With the Horizontal Pod Autoscaler, a cluster can run additional pods to handle more traffic.
In theory it is simple, but it can be difficult to implement in a secure and stable way.
I’d like to describe three common autoscaling pitfalls.
Basic autoscaler example
First, let’s consider a simple Deployment with one replica and a CPU-based HorizontalPodAutoscaler.
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo
  labels:
    app: hpa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hpa
  template:
    metadata:
      labels:
        app: hpa
    spec:
      containers:
        - name: resource-consumer
          image: gcr.io/kubernetes-e2e-test-images/resource-consumer:1.5
          resources:
            requests:
              cpu: "100m"
            limits:
              cpu: "200m"
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo
  labels:
    run: hpa
spec:
  type: LoadBalancer
  selector:
    app: hpa
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```
The Deployment has one replica configured, plus CPU requests and limits. I configured the HorizontalPodAutoscaler to scale between 1 and 10 replicas.
I generate some load to test the HPA with curl: curl --data "millicores=300&durationSec=600" 10.108.35.34:8080/ConsumeCPU
After a while I can see that my HPA is working.
```
> kubectl get hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo   Deployment/hpa-demo   49%/50%   1         10        4          14m
```
My autoscaler started 3 additional pods. The HPA watches the average CPU utilisation across all pods and adjusts the replica count to bring that average to the target. Kubernetes computes a pod’s utilisation from the combined usage of all its running containers.
This is the first pitfall: utilisation is measured against requests, not limits. It is a percentage of the requested resources; in my case 200m means 200%, because the request is set to 100m.
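As a rough worked example, the numbers above fit the replica formula given in the Kubernetes documentation. This is only a sketch: it assumes the single pod runs at its 200m limit during the load test, and it ignores the tolerance and stabilisation behaviour of the real controller.

```latex
\begin{align*}
\text{utilisation} &= \frac{\text{CPU usage}}{\text{CPU request}} \times 100\%
  = \frac{200\text{m}}{100\text{m}} \times 100\% = 200\% \\
\text{desiredReplicas} &= \left\lceil \text{currentReplicas} \times
  \frac{\text{currentUtilisation}}{\text{targetUtilisation}} \right\rceil
  = \left\lceil 1 \times \frac{200}{50} \right\rceil = 4
\end{align*}
```

This agrees with the 4 replicas reported by kubectl get hpa: about 200m of load spread over four pods that each request 100m averages out to roughly 50%.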
Deployments with Horizontal Pod Autoscaler
The autoscaler seems to work as expected, but what about deployments?
When I roll out a new version, Kubernetes discards the replicas created by the HorizontalPodAutoscaler and goes back to the replica count from the Deployment definition. Under heavy load this is certainly not what I want. I defined the number of replicas in both the Deployment and the HorizontalPodAutoscaler, but replicas can’t be controlled by two objects. To avoid this, I need to remove replicas from the Deployment definition and apply the YAML file again.
This is the second pitfall. You should also know that in older Kubernetes versions (e.g. 1.15), removing replicas doesn’t take effect without removing the Deployment.
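A minimal sketch of the Deployment from the basic example with the replicas field removed, so that only minReplicas and maxReplicas in the HorizontalPodAutoscaler govern the pod count. Everything else is copied from the earlier manifest; I am assuming nothing else has to change.

```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo
  labels:
    app: hpa
spec:
  # No "replicas" field here: the HorizontalPodAutoscaler owns the replica count.
  selector:
    matchLabels:
      app: hpa
  template:
    metadata:
      labels:
        app: hpa
    spec:
      containers:
        - name: resource-consumer
          image: gcr.io/kubernetes-e2e-test-images/resource-consumer:1.5
          resources:
            requests:
              cpu: "100m"
            limits:
              cpu: "200m"
          ports:
            - containerPort: 8080
```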
More containers in pod
Finally, we come to the third pitfall.
I add another container to my pod.
```yaml
- name: nginx
  image: nginx
```
Now I run the workload with the same curl command as before. As a result, I can see load on the pod.
```
> kubectl top pods
NAME                        CPU(cores)   MEMORY(bytes)
hpa-demo-57c58d5fb7-9kssn   198m         9Mi
```
But something weird is going on with my autoscaler.
```
> kubectl get hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo   Deployment/hpa-demo   0%/50%    1         10        1          3h54m
```
My Horizontal Pod Autoscaler doesn’t scale my deployment! Why?
The explanation is simple. As I mentioned before, a pod’s CPU utilisation is computed across all containers in the pod. Kubernetes checks usage against requests, but my nginx container doesn’t have any. The error is visible when I describe my HPA.
```
> kubectl describe hpa hpa-demo
...
ScalingActive  False  FailedGetResourceMetric  the HPA was unable to compute the replica count: missing request for cpu
...
```
Adding requests to all containers fixes this problem. Of course, this also applies to initContainers.
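A sketch of what the containers section of the pod template could look like once every container declares a CPU request. The 50m value for nginx is only an assumption for illustration; it should be sized to the real workload.

```yaml
containers:
  - name: resource-consumer
    image: gcr.io/kubernetes-e2e-test-images/resource-consumer:1.5
    resources:
      requests:
        cpu: "100m"
      limits:
        cpu: "200m"
    ports:
      - containerPort: 8080
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "50m"  # assumed value; any CPU request lets the HPA compute utilisation
```

Keep in mind that the utilisation percentage is then measured against the requests of both containers together, so the same load yields a lower percentage than before.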
Three common pitfalls
Summarizing, there are three common pitfalls when you use HorizontalPodAutoscaler:
- Utilisation percentage is based on requests.
- You can’t define the number of replicas both in the Deployment and in the HorizontalPodAutoscaler. You need to define minReplicas and maxReplicas only.
- It is important to define requests for all containers in the pod. Limits are not necessary.
Code repository with final, working example yaml: https://gitlab.com/es1o/blog.eliszewski.pl-code/-/tree/master/scaling-demo