Most people use Kubernetes because they want to scale applications. With the Horizontal Pod Autoscaler, the cluster can run additional pods to handle more traffic.
In theory this is simple, but it can be difficult to implement in a secure and stable way.
I’d like to describe three common autoscaling pitfalls.
Basic autoscaler example
First, let’s consider a simple Deployment with one replica and a CPU-based HorizontalPodAutoscaler.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo
  labels:
    app: hpa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hpa
  template:
    metadata:
      labels:
        app: hpa
    spec:
      containers:
      - name: resource-consumer
        image: gcr.io/kubernetes-e2e-test-images/resource-consumer:1.5
        resources:
          requests:
            cpu: "100m"
          limits:
            cpu: "200m"
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo
  labels:
    run: hpa
spec:
  type: LoadBalancer
  selector:
    app: hpa
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
The Deployment has one replica configured, along with CPU requests and limits. I configured the HorizontalPodAutoscaler to scale from 1 to 10 replicas.
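Assuming everything above is saved in a single file, for example hpa-demo.yaml (the filename is my own choice), it can be applied in one go:
> kubectl apply -f hpa-demo.yaml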
I run some workload to test the HPA with curl:
curl --data "millicores=300&durationSec=600" 10.108.35.34:8080/ConsumeCPU
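The IP address in the command is the address of the hpa-demo Service in my cluster; yours will differ. It can be looked up with:
> kubectl get service hpa-demo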
After a while I can see that my HPA is working.
> kubectl get hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo   Deployment/hpa-demo   49%/50%   1         10        4          14m
My autoscaler has started 3 additional pods. The HPA watches the average CPU consumption across all pods and tries to reach the target average CPU utilisation. Kubernetes calculates a pod’s utilisation as the average across the containers running in it.
This is the first pitfall: utilisation is measured against requests, not limits. It is a percentage of the requested resources; in my case 200m of usage means 200%, because the request is set to 100m.
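This also explains why I ended up with 4 replicas. The HPA uses the scaling formula from the Kubernetes documentation:
desiredReplicas = ceil(currentReplicas * currentUtilisation / targetUtilisation)
With one pod running at roughly 200% of its request and a target of 50%, that gives ceil(1 * 200 / 50) = 4 pods.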
Deployments with Horizontal Pod Autoscaler
It looks like the autoscaler works as expected, but what about deployments?
When I deploy again, Kubernetes destroys all replicas created by the HorizontalPodAutoscaler and goes back to the replica count from the Deployment definition. Of course, under heavy load this behaviour is not what you want. I defined the number of replicas in both the Deployment and the HorizontalPodAutoscaler, but replicas can’t be controlled by two objects. To avoid this, I need to remove the replicas field from the Deployment definition and apply the yaml file again.
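The fix is small; below is a sketch of the relevant part of the Deployment after the change (the rest stays as above):
spec:
  # replicas: 1  <- removed, the HorizontalPodAutoscaler now owns the replica count
  selector:
    matchLabels:
      app: hpa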
This is the second pitfall. You also need to know that in older Kubernetes versions (e.g. 1.15) removing the replicas field doesn’t take effect without removing the Deployment first.
More containers in a pod
Finally, let’s look at the third pitfall.
I add another container to my pod.
- name: nginx
  image: nginx
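For context, the containers section of the Deployment now looks like this. Note that the new container has no resource requests:
      containers:
      - name: resource-consumer
        image: gcr.io/kubernetes-e2e-test-images/resource-consumer:1.5
        resources:
          requests:
            cpu: "100m"
          limits:
            cpu: "200m"
        ports:
        - containerPort: 8080
      - name: nginx
        image: nginx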
Now I’m going to run the workload with the same curl command as before. As a result, I can observe load on the pod.
> kubectl top pods
NAME                        CPU(cores)   MEMORY(bytes)
hpa-demo-57c58d5fb7-9kssn   198m         9Mi
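By the way, kubectl can also break the usage down per container, which is handy when a pod runs more than one of them:
> kubectl top pods --containers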
But something weird is going on with my autoscaler.
> kubectl get hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo   Deployment/hpa-demo   0%/50%    1         10        1          3h54m
My Horizontal Pod Autoscaler doesn’t scale my deployment! Why?
The solution is simple. As I mentioned before, pod CPU utilisation is the average across all containers in the pod. Kubernetes checks usage against requests, but my nginx container doesn’t have any requests. The error is visible when I describe my HPA.
> kubectl describe hpa hpa-demo
...
ScalingActive   False   FailedGetResourceMetric   the HPA was unable to compute the replica count: missing request for cpu
...
Adding requests for all containers fixes the problem. Of course, this applies to initContainers as well.
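In my case that means giving the nginx container a CPU request too; the 50m below is just an arbitrary value I picked for the sketch:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "50m"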
Three common pitfalls
Summarizing, there are three common pitfalls when you use the HorizontalPodAutoscaler:
- The utilisation percentage is based on requests.
- You can’t define the number of replicas in both the Deployment and the HorizontalPodAutoscaler. Define only minReplicas and maxReplicas.
- It is important to define requests for all containers in the pod. Limits are not necessary.
Code repository with the final, working example yaml: https://gitlab.com/es1o/blog.eliszewski.pl-code/-/tree/master/scaling-demo