Cerebral vs Kubernetes Cluster Autoscaler: A Demo of Preemptive Autoscaling

Cerebral is an open source, provider-agnostic tool for increasing or decreasing the size of pools of nodes in your Kubernetes cluster in response to alerts generated by user-defined policies. These policies reference pluggable, configurable metrics backends that supply the metrics used to make autoscaling decisions.

This article compares and contrasts Cerebral with the Kubernetes Cluster Autoscaler in a fair, easily reproducible kops environment.

Kubernetes Autoscaling Basics

To understand autoscaling in production, we first have to take a step back and understand Kubernetes autoscaling in a general sense. There are several classes of autoscalers relevant to Kubernetes:

  • Horizontal Pod Autoscalers (HPAs), which adjust the number of replicas of a workload
  • Vertical Pod Autoscalers (VPAs), which adjust the resource requests and limits of pods
  • Cluster autoscalers, which adjust the number of nodes in the cluster

In this article, we'll focus on horizontal pod autoscaling and cluster autoscaling. They are commonly used in tandem with each other, so we'll take a look at the interaction between an HPA and cluster autoscaler and the trade-offs between different cluster autoscaler implementations. In particular, we'll walk through the differences between Cerebral and the Kubernetes Cluster Autoscaler (CA) project.

Kubernetes Cluster Autoscaler Overview

The Kubernetes Cluster Autoscaler takes a simple approach, at least at a high level, to deciding when to add nodes to a cluster: it scales up when it sees that a pod can't be scheduled and determines that adding a new node would allow the pod to be scheduled. This approach has many benefits, such as the fact that it relies only on core Kubernetes concepts. It also guarantees that a node will only be added if it is absolutely required, since it knows that a pod is stuck in the Pending state due to insufficient resources.

The primary downside of this approach is that it's often too slow to meet real business requirements. Waiting until a pod can't be scheduled is too late in many cases†.

Scaling down using the CA is also straightforward at a high level: it removes a node if it has been underutilized for a set amount of time and if the pods currently running on that node can be rescheduled onto a different node.

For the nitty-gritty details of how the CA works, the FAQ is an excellent resource.

Cerebral Overview

Cerebral was built with flexibility in mind. It gives operators the power to easily configure autoscaling on metrics collected from a variety of sources, and to do so in a way that's easily portable between cloud environments through the use of Custom Resource Definitions (CRDs).

The preemptive autoscaling approach taken by Cerebral is fundamentally different from that of the standard Kubernetes Cluster Autoscaler. Rather than waiting until a pod can't be scheduled, Cerebral decides to scale based on user-specified metrics from a configurable metrics backend. At the time of writing, the following backends are supported in the main Cerebral repository: Prometheus, InfluxDB, and Kubernetes itself.

It's also easy to build a custom backend by implementing a simple interface and defining a configuration for the backend in a new MetricsBackend CRD. The possibilities for custom backends are limitless. For example, Cerebral could be used to autoscale a cluster based on the depth of an application-specific queue.
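As a purely hypothetical sketch (the spec fields below are illustrative and not taken from an existing Cerebral backend), registering a queue-depth backend might look something like this:

apiVersion: cerebral.containership.io/v1alpha1
kind: MetricsBackend
metadata:
  name: work-queue
spec:
  type: work-queue
  configuration:
    address: http://queue-service.default.svc.cluster.local:8080/stats

The Go implementation behind it would simply satisfy the backend interface in the Cerebral repository and return the current queue depth when polled.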

In this article, we'll stick to using the Kubernetes backend for a fair comparison with the Kubernetes CA.

For more details on how Cerebral works, as well as information for getting started with other metrics backends and cloud providers, please refer to the project on GitHub.

Autoscaling Experiments

Let's work through a simple example of using an HPA in conjunction with both the Kubernetes Cluster Autoscaler and Cerebral. This will allow us to contrast the Kubernetes CA and Cerebral in a simulated real-world scenario. (If you have trouble following along for any reason, please feel free to reach out).

If you're not interested in the experiments themselves, please feel free to skip to the Key Takeaways section.

Provision a Test Cluster on AWS

While Cerebral is fully integrated into Containership Kubernetes Engine (CKE), which supports AWS, we'll use kops here so that we can build everything up from the basics.

We've created a public gist with a very basic kops cluster config that anyone can copy in order to follow along. The cluster has a single master and a node pool (equating to an AWS Auto Scaling Group) with a minimum size of 1 node and a maximum size of 5 nodes. First, create a new S3 bucket to use as the cluster state store.
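For example, using the AWS CLI (the bucket name below is just a placeholder, since S3 bucket names are globally unique; enabling versioning on the state store is recommended by the kops docs):

aws s3 mb s3://bucket-name --region us-east-1
aws s3api put-bucket-versioning --bucket bucket-name --versioning-configuration Status=Enabled

With the state store in place, create the cluster configuration in kops: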

kops --state=s3://bucket-name create -f https://gist.githubusercontent.com/mattkelly/692af294868ff50a6c5664ea63b7e9c4/raw/48b00e3dc3d7de18c040522a6392a92beb9dc7e1/kops-autoscaling-demo-cluster-config.yaml

You should see some output like this if things went well:

Now apply the changes to actually create the cluster:

kops --state=s3://bucket-name update cluster autoscaling-experiment.k8s.local --yes

It may prompt you to specify a public SSH key first. If so, simply follow the instructions to create the key and then re-run the update command.
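For example, assuming you already have a key pair at ~/.ssh/id_rsa.pub (generate one with ssh-keygen if you don't), you can register the public key with kops like this:

kops --state=s3://bucket-name create secret --name autoscaling-experiment.k8s.local sshpublickey admin -i ~/.ssh/id_rsa.pub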

You should see a bunch of output, and then eventually something like:

After a few minutes, you should be able to interact with the cluster. Try kubectl get nodes until it works (note that if all went well, kops has set your kubectl context appropriately).
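You can also ask kops itself to confirm that the cluster is fully up, for example:

kops --state=s3://bucket-name validate cluster autoscaling-experiment.k8s.local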

For more information on kops, please refer to the repository.

Deploy the Kubernetes Metrics Server

Horizontal Pod Autoscalers need to know how much CPU and memory pods are actually consuming.
For that, we need to deploy the metrics-server††. Note that this is only required for the HPA and not for Cerebral nor the Kubernetes CA.

The easiest way to get the metrics-server up and running is to simply clone the repository and deploy all of the Kubernetes v1.8+ manifests. Unfortunately, we found that a workaround was needed to make the manifests work properly on this kops cluster. We will update this article once the workaround is no longer required. (Note that the kops config we're using also includes part of this workaround - see this comment.)

In the meantime, you can use a branch on my fork instead:

git clone git@github.com:mattkelly/metrics-server.git
cd metrics-server
git checkout bugfix/deploy-1.8+

Apply the relevant manifests as follows:

kubectl apply -f deploy/1.8+/

Check the metrics-server pod logs to make sure things are healthy:

kubectl -n kube-system logs -lk8s-app=metrics-server
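Once the metrics-server has had a minute or so to collect data, you can also verify that the Metrics API works end to end, for example:

kubectl top nodes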

Create a Resource Consumer Deployment for Autoscaling

Now that we have a cluster up and running with a healthy metrics server, let's go ahead and create a deployment that we can wrap in an HPA for running experiments. It would be great if we could get it to scale on-demand by forcing it to consume a specified amount of CPU or memory as the result of some request. Luckily, the resource-consumer living in the depths of the Kubernetes repository does exactly that.

Let's autoscale a resource-consumer deployment based on CPU utilization. Our worker node pool is utilizing t2.small instances, which have 1 CPU (1000 millicores) and 2 GB of memory. Since the minimum node count of the worker pool is 1 and it has not scaled yet, there is only a single node in the pool. Using kubectl describe node, we see that the node is already using 640 millicores of CPU:
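If you'd like to check this on your own cluster, one way to describe the single worker node (selecting it by the node-pool=workers label from the kops config) is:

kubectl describe node $(kubectl get nodes -l node-pool=workers -o jsonpath='{.items[0].metadata.name}')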

Let's run the resource-consumer in the default namespace with a request of 100m (100 millicores) and expose it using a load balancer so we can send requests to it:

kubectl run resource-consumer --image=gcr.io/kubernetes-e2e-test-images/resource-consumer:1.4 --expose --service-overrides='{ "spec": { "type": "LoadBalancer" } }' --port 8080 --requests='cpu=100m'

Immediately, we see that the initial single replica is requesting another 100m from the node, as expected:

Since the resource-consumer deployment is exposed using a load balancer, we should also be able to use kubectl get service to see that it has an external address properly assigned to it:

Let's export the full address for making requests to later:

export RESOURCE_CONSUMER_ADDRESS="http://$(kubectl get service resource-consumer -ojsonpath='{.status.loadBalancer.ingress[0].hostname}'):8080"

Create a Horizontal Pod Autoscaler (HPA)

Now let's add an HPA to the resource-consumer deployment. This HPA will add a new pod (maxing out at 10 pods) if the observed pod CPU utilization goes above 50 percent of the pod CPU request. It will also scale back down to a minimum of 1 pod if the pods in the deployment do not exhibit high CPU utilization.

kubectl autoscale deployment resource-consumer --cpu-percent=50 --min=1 --max=10
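For reference, this imperative command is equivalent to creating a HorizontalPodAutoscaler object along these lines (autoscaling/v1 API shown):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50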

You should be able to see the newly created HPA using kubectl get hpa, since it's just another Kubernetes resource:

Note: HPAs are just reconciliation loops like any other Kubernetes controller. By default, they gather metrics every 15 seconds. As such, the percent utilization may briefly show as <unknown> until the metrics are gathered.

As expected, the resource-consumer isn't really consuming any CPU because we haven't requested it to do so yet.

Now that we have an HPA set up, let's force it to scale up by consuming CPU and see how both the Kubernetes Cluster Autoscaler and Cerebral react!

Set Up the Kubernetes Cluster Autoscaler

We've created a public gist that makes it easy to get the Kubernetes Cluster Autoscaler (CA) up and running on this kops cluster. It tweaks the CA example AWS manifests as follows:

  • Add a secret that the main deployment has been updated to reference
  • Set the deployment command to properly reference our kops worker node pool with the expected min/max bounds
  • Add a toleration and node selector so that it runs only on masters

Simply download the file, fill in valid values for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, and kubectl apply it.
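Note that if the secret in that manifest stores the credentials under the data field (rather than stringData), the values must be base64-encoded first, for example:

echo -n 'YOUR_AWS_ACCESS_KEY_ID' | base64
echo -n 'YOUR_AWS_SECRET_ACCESS_KEY' | base64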

You can check that the cluster-autoscaler deployment is healthy by checking the logs:

kubectl -n kube-system logs -lapp=cluster-autoscaler

(The cerebral-aws-engine secret applied here could be used with the Cerebral autoscaler as well.)

Note: This is just one simple way to run the CA for demonstration purposes. If you're interested in running the CA in production, please read the documentation to determine what is best for your environment (including authentication method, etc).

Cluster Autoscaling Demo Using the Kubernetes Cluster Autoscaler

The Kubernetes Cluster Autoscaler scales only when a pod cannot be scheduled. This scaling methodology is one of the primary differences between the CA and Cerebral. Let's see exactly what this means in practice.

To simulate the beginning of a spike in demand, let's tell the resource-consumer to consume CPU by POSTing a request to its ConsumeCPU endpoint:

curl --data "millicores=60&durationSec=600" $RESOURCE_CONSUMER_ADDRESS/ConsumeCPU

If you'd like, you can watch the action in real time by running a watch in a second terminal:

watch --differences kubectl get deployments,replicasets,pods,hpa

After a few moments, you should see that this spike caused the HPA to add another pod to the deployment in order to meet the demand and keep the CPU utilization of pods in the deployment below the target of 50%:

After a few more moments, you should see that things settle down and the HPA reports the new CPU utilization:

And kubectl describe node shows that the new replica added another 100m of CPU request to the node:

Great! We have a simple, repeatable environment and the Horizontal Pod Autoscaler works as expected.

Now let's suppose that over a brief period of time, the demand increases even more:

curl --data "millicores=120&durationSec=600" $RESOURCE_CONSUMER_ADDRESS/ConsumeCPU

After a few moments, we see that the HPA scales up to attempt to meet the new demand:

It looks like we have a pod in the Pending state, however, so we have not successfully scaled all the way to meet the new demand. What's going on here? Using kubectl describe on that pod, we can see that it could not be scheduled because the worker node doesn't have enough CPU left to satisfy all of the CPU requests. Since we're now at 4 replicas, which adds another 400m of CPU requests on top of the base state of 640m/1000m, only 3 of the 4 replicas fit on the node.

kubectl describe pod/resource-consumer-55c57bc84c-986lk

We can also see that a scale up was triggered due to the pending pod. After waiting for the new node to come up, another kubectl describe shows that the pod was eventually scheduled on the new node.

A better way to measure the timing of these events is to use the custom columns feature of kubectl to look at the first occurrence of each event:

kubectl get events --field-selector=involvedObject.name=resource-consumer-55c57bc84c-986lk -o=custom-columns=FIRST_TIMESTAMP:.firstTimestamp,TYPE:.type,REASON:.reason,MESSAGE:.message

From the above, we can see that the CA was quick to react and ask AWS to scale up the instance group - but only after the pod failed to be scheduled in the first place.

Cooling Down

If we wait for the duration specified by durationSec in the request to the resource-consumer, we'll see the resource-consumer fall back down to its original resource usage. The CA will take note of this and request a scale down after the default cool down period of 10 minutes. This is critical for cost savings. The cool down period can be tweaked via the CA configuration.
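For example, the following cluster-autoscaler flags control scale down behavior (shown with their defaults at the time of writing):

--scale-down-unneeded-time=10m
--scale-down-delay-after-add=10m
--scale-down-utilization-threshold=0.5

See the CA FAQ for the full list of options.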

Preemptive Autoscaling Using Cerebral

Clean Up and Start Fresh

The easiest way to ensure we're starting with a clean slate is to simply delete the kops cluster and recreate it. To delete it, run the following command:

kops --state=s3://bucket-name delete cluster autoscaling-experiment.k8s.local --yes

Now just recreate the test setup, starting from the kops cluster creation and ending after the HPA is configured.

Deploy Cerebral

There are numerous example manifests for getting started in the Cerebral repository. First, we need to apply some prerequisite manifests that define some Custom Resource Definitions, a Service Account for Cerebral, and some RBAC rules:

kubectl apply -f https://raw.githubusercontent.com/containership/cerebral/v0.4.0-alpha/examples/common/00-prereqs.yaml

We also have to apply the same secret with AWS credentials as before. We can download the example secret manifest and edit it appropriately:

curl -O https://raw.githubusercontent.com/containership/cerebral/master/examples/engines/aws/00-secret-cerebral-aws.yaml

# Edit 00-secret-cerebral-aws.yaml to fill in valid values

kubectl apply -f 00-secret-cerebral-aws.yaml

Next, let's deploy the Cerebral Operator itself, which runs as a Kubernetes Deployment:

kubectl apply -f https://raw.githubusercontent.com/containership/cerebral/v0.4.0-alpha/examples/engines/aws/10-deployment-cerebral-aws.yaml

An AutoscalingEngine resource is required to register the AWS autoscaling engine with Cerebral:

kubectl apply -f https://raw.githubusercontent.com/containership/cerebral/master/examples/engines/aws/20-autoscaling-engine-aws.yaml

Finally, let's register the Kubernetes MetricsBackend with Cerebral:

kubectl apply -f https://raw.githubusercontent.com/containership/cerebral/v0.4.0-alpha/examples/metrics_backends/kubernetes/00-metrics-backend-kubernetes.yaml

If all goes well, the logs should look healthy and we should see messages about the engine and backend being registered successfully:

kubectl -n kube-system logs -lapp.kubernetes.io/name=cerebral

Configure Autoscaling Groups and Policies

Now that Cerebral is up and running with the Kubernetes metrics backend and AWS engine registered, we can shift our focus to defining an autoscaling group and policies by which to scale that group.

There are example AutoscalingGroups as well as example AutoscalingPolicies available in the Cerebral repository. However, let's create a group and policy more specific to our test setup here.

First, let's define an AutoscalingPolicy that uses the kubernetes metrics backend we registered in a previous step to gather CPU allocation metrics for autoscaling decisions. We'll poll the metrics backend every 15 seconds and use a sample period of 30 seconds. If the CPU allocation is greater than or equal to 80% for more than that sample period, Cerebral will scale up by one node:

cat > cpu-example-policy.yaml <<EOF
apiVersion: cerebral.containership.io/v1alpha1
kind: AutoscalingPolicy
metadata:
  name: cpu-example-policy
spec:
  metric: cpu_percent_allocation
  metricConfiguration: {}
  metricsBackend: kubernetes
  pollInterval: 15
  samplePeriod: 30
  scalingPolicy:
    scaleUp:
      adjustmentType: absolute
      adjustmentValue: 1
      comparisonOperator: '>='
      threshold: 80
EOF

Now we apply it:

kubectl apply -f cpu-example-policy.yaml

Next, let's define our AutoscalingGroup such that minNodes and maxNodes match the AWS Auto Scaling Group defined by kops. This is important in order to avoid unexpected behavior. The label selector, which determines the nodes that belong to the group, is chosen to match the label we assigned to our worker group in the kops config, and the policy we just defined is attached to this group:

cat > kops-group.yaml <<EOF
apiVersion: cerebral.containership.io/v1alpha1
kind: AutoscalingGroup
metadata:
  name: kops-workers-asg
spec:
  nodeSelector:
    "node-pool": "workers"
  policies:
  - cpu-example-policy
  engine: aws
  cooldownPeriod: 60
  minNodes: 1
  maxNodes: 5
  scalingStrategy:
    scaleUp: random
    scaleDown: random
EOF

Now apply it:

kubectl apply -f kops-group.yaml

With these simple manifests applied, Cerebral is now ready to autoscale the Kubernetes cluster once the CPU allocation breaches the threshold.

Cluster Autoscaling Demo Using Cerebral

Following exactly what we did before in the Kubernetes Cluster Autoscaler demo, let's make a request to resource-consumer to cause it to consume more CPU to simulate the beginning of a spike in demand:

curl --data "millicores=60&durationSec=600" $RESOURCE_CONSUMER_ADDRESS/ConsumeCPU

A second replica will appear and start running after a few moments, as expected:

As before, this increases the total CPU requests on the worker node pool to above 80%. Cerebral notices this and, after the short sampling period we configured in the AutoscalingPolicy, triggers the scale up event. We can see this in the Cerebral logs:

You'll also notice in the logs that any subsequent scale requests are ignored by the engine because the autoscaling group has entered the cool down period that we defined. This gives time for the new node to come up and helps avoid thrashing scale up and down requests against the cloud provider.

In a short time, the new node is spun up by AWS and becomes available for scheduling:

kubectl get events --field-selector=involvedObject.name=ip-172-20-37-156.ec2.internal -o=custom-columns=FIRST_TIMESTAMP:.firstTimestamp,REASON:.reason,MESSAGE:.message

As you can see, we did not have to wait for any pods to get stuck in the Pending state before the scale up was triggered. As soon as we breached the defined CPU resource request threshold, a scale up event was triggered. In other words, Cerebral preemptively spun up another node in order to meet the incoming demand.

Cooling Down

Scaling down to save costs after the increased demand diminishes is just as easy.
Simply edit the AutoscalingPolicy to add a scale down policy that triggers once the resource requests go below a defined threshold.
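As a sketch, mirroring the scaleUp block we defined earlier (the thresholds here are just illustrative), the scalingPolicy section of the AutoscalingPolicy might end up looking like this:

scalingPolicy:
  scaleUp:
    adjustmentType: absolute
    adjustmentValue: 1
    comparisonOperator: '>='
    threshold: 80
  scaleDown:
    adjustmentType: absolute
    adjustmentValue: 1
    comparisonOperator: '<='
    threshold: 30

Re-apply the edited policy with kubectl apply, and Cerebral will remove a node once CPU allocation stays at or below the scale down threshold for the sample period.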

Key Takeaways

Both the Kubernetes Cluster Autoscaler (CA) and Cerebral were able to successfully scale up (add a node to) the cluster when load was increased in our test scenario. Cerebral was able to use a user-defined CPU threshold to preemptively scale up when it saw that the load was increasing. The Kubernetes CA, however, had to wait until a pod could not be scheduled.

This is only one very specific example of an advantage of using Cerebral over the Kubernetes CA. The real power of Cerebral lies in its flexibility and extensibility. For example, in the previous experiment, if the load had increased to its highest point instantaneously, Cerebral would not have been able to preemptively scale and the results would have been very similar between the two autoscalers. However, unlike the Kubernetes CA, Cerebral could have been configured to use a custom metric/event backend to trigger scaling on a different indicator (e.g. a marketing announcement being published, an application queue depth hitting a certain threshold, etc).

With Cerebral, operators have the power to configure autoscaling in a way that makes sense for their system(s). They also have the ability to trivially move this functionality to a different cloud provider because of the CRD-based approach.

Closing Remarks

We hope you found this introduction to autoscaling on Kubernetes helpful.
If you have any questions or comments, please don't hesitate to reach out using the contact info below. We're also always excited to hear feedback from our users, and GitHub issues and PRs are more than welcome!

Autoscaling on Containership

Psst...just want to get up and running with preemptive autoscaling on AWS, Azure, DigitalOcean, Google, or Packet? Don't want to fuss with any YAML? There's a great UI for that - with beautiful graphs, too! 🚀

Contact

Footnotes

†: There is a workaround for this that involves filling nodes with pause pods, but it's quite complex.

††: There are other ways to provide metrics to the Metrics API for use with HPAs, such as the Prometheus Adapter.
