Dynamic Kubernetes Scaling Across Environments Without Worker Node or Multi-cluster Overhead
Extend Horizontal and Vertical Pod Autoscaler beyond the cluster
Let’s assume you have a heavily used Kubernetes workload deployed in your datacenter, with autoscaling configured for it based on a few key metrics. This ensures the workload can scale out and scale in as those metrics change.
Kubernetes provides a robust mechanism for scaling workloads.
Let’s take a look at the different approaches for scaling Kubernetes workloads:
- Horizontal Pod Autoscaling (HPA): Automatically adds or removes pod replicas (a minimal HPA manifest is sketched after this list).
- Vertical Pod Autoscaling (VPA): Automatically adjusts the CPU and memory reservations of your pods.
- Cluster Autoscaling: Automatically adds or removes nodes in a cluster based on all pods’ requested resources.
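For reference, here is a minimal HPA manifest. The target deployment name (my-web-app) and the 80% CPU utilisation threshold are illustrative values, not part of the setup described later:

# Minimal HPA sketch: scales the (illustrative) my-web-app deployment
# between 1 and 5 replicas based on average CPU utilisation.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-web-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # scale out when average CPU utilisation exceeds 80%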
Now, what if the environment hosting your cluster (the primary environment) no longer has the capacity to handle the increased load on your workload?
You would have to provision a worker node in a different (target) environment that has spare capacity. However, provisioning a worker node in a different environment may not always be technically possible due to the way cluster autoscaling works.
For example, if you are running an on-prem bare-metal cluster, you can’t use cluster autoscaling to add a worker node in AWS and deploy the pod there. This is by design. And even if you got it to work by some means, having a worker node in an environment different from the one hosting the Kubernetes control plane introduces operational complexity: the network latency between the environments must stay within acceptable limits for the control plane to manage the remote worker node(s).
Alternatively, you could deploy a copy of the app in two different clusters hosted in different environments.
A disadvantage of this approach is the need for a complete cluster in the target environment. Unless you have a business need for multiple clusters, running a cluster just to handle scaling of a specific workload may not be cost effective from an operations standpoint.
Wouldn’t it be beneficial if horizontal pod autoscaling (HPA) or vertical pod autoscaling (VPA) could target environments other than the one hosting the Kubernetes cluster, without you having to manage worker nodes or separate clusters in those target environments?
Imagine HPA or VPA creating pods that run in a different environment, while the control plane and the rest of the cluster stay where they are.
Let’s assume this is made possible. But are there any benefits to it?
Indeed there are some benefits :-). Here are some of them:
- As mentioned previously, standard Kubernetes autoscaling (HPA/VPA) operates within a single cluster and relies on cluster autoscaling to add worker nodes. The approach described here removes the need for cluster-bound scaling, allowing workloads to scale dynamically across environments (e.g., from on-prem to cloud).
- Instead of provisioning fixed capacity in advance, this approach enables on-demand scalability in the most cost-effective environment. Example: On-prem resources exhausted → New pods autoscaled to spot instances in AWS/GCP, reducing compute costs.
- Minimises operational overhead relative to multi-cluster deployments.
Enough of the theory. Let’s see how you can implement it with a simple example.
Implementation
Extending HPA/VPA to an environment other than the primary cluster environment is possible by combining the following technologies in Kubernetes.
- KEDA (Kubernetes Event-Driven Autoscaler) offers a flexible way to scale Kubernetes applications dynamically based on various metrics.
- Kata containers runtime offers pod sandboxing technology and supports creating sandboxed pods across different environments. It can create sandboxed pods using a local hypervisor like KVM/Qemu. It can also create sandboxed pods using a remote hypervisor (also referred to as peer-pods). Kata uses remote APIs such as AWS EC2 and Azure VM APIs to manage the remote hypervisor. You can use the Kata remote hypervisor feature to create pods in different environments without needing a worker node or a cluster in those environments. The environment hosting the cluster still manages the pod.
- Kubernetes RuntimeClass provides a way to specify a different container runtime or configuration for a pod. Typically, you expose a non-default container runtime like Kata containers as a RuntimeClass, so an app can choose between the Kata containers runtime and the default configured runtime (typically runc) for its pods. A sketch of such a RuntimeClass follows this list.
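For illustration, a RuntimeClass exposing the Kata remote hypervisor might look like the sketch below. The handler name depends on how your Kata containers installation configured the container runtime, so treat kata-remote here as an assumption:

# Illustrative RuntimeClass for the Kata remote hypervisor (peer-pods).
# The handler value must match the runtime handler registered with your
# container runtime (e.g., containerd) by the Kata installation.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-remote
handler: kata-remote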
I’ll use the following basic workload to demonstrate the implementation.
Also, let’s assume you have already set up KEDA and Kata with the remote hypervisor (peer-pods) for a target environment (e.g., AWS). There is plenty of material available to help you with the KEDA and Kata remote hypervisor setup. If you face any issues, please get in touch with me or comment on this blog with your issue.
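Before deploying the workload below, a quick sanity check helps confirm both pieces are in place. The RuntimeClass name and the keda namespace are assumptions based on a typical installation:

# Confirm the RuntimeClass for the remote hypervisor exists
kubectl get runtimeclass kata-remote

# Confirm the KEDA operator pods are running (KEDA is commonly installed in the "keda" namespace)
kubectl get pods -n keda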
apiVersion: apps/v1
kind: Deployment
metadata:
  name: on-prem
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      containers:
      - name: my-web-app
        image: ghcr.io/mendhak/http-https-echo:34
        ports:
        - containerPort: 8080
The deployment above (on-prem) runs on-premises using the default container runtime. It uses the selector “app: my-web-app”. Also, note that the spec doesn’t contain a runtimeClassName attribute.
Step-1
Create a Kubernetes service with the label selector used for the workload (“app: my-web-app”). We’ll use this label selector to select different backends across different runtimeClasses for the same service:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP
  selector:
    app: my-web-app
  ports:
  - port: 80
    targetPort: 8080
Step-2
Create a deployment targeting the remote environment and set the number of replicas to zero.
Shown below is the example deployment (remote):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: remote
spec:
  replicas: 0
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
        environment: cloud
    spec:
      runtimeClassName: kata-remote
      containers:
      - name: my-web-app
        image: ghcr.io/mendhak/http-https-echo:34
        ports:
        - containerPort: 8080
Step-3
Configure KEDA such that it monitors the on-prem deployment and triggers scaling in the remote deployment when additional capacity is required:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-web-app-scaledobject
spec:
  scaleTargetRef:
    name: remote
  minReplicaCount: 0
  maxReplicaCount: 4
  pollingInterval: 5
  cooldownPeriod: 5
  advanced:
    restoreToOriginalReplicaCount: true
  triggers:
  - type: kubernetes-workload
    metadata:
      podSelector: 'app=my-web-app, environment notin (cloud)'
      value: '3'
      activationValue: '2'
Note the scaleTargetRef attribute: it’s set to the remote deployment. Also note the trigger. KEDA monitors only the pods that belong to the on-prem deployment (podSelector: 'app=my-web-app, environment notin (cloud)'), since the remote pods carry the environment: cloud label. With value: '3' and activationValue: '2', scaling activates once more than two matching on-prem pods exist, and roughly one remote replica is created for every three matching on-prem pods.
This completes the setup.
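Optionally, before exercising the autoscaling, you can verify that the service responds. The pod name curl-test below is just a hypothetical throwaway pod:

# Send a request to the service from inside the cluster; the echo server
# responds with details of the request it received.
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -s http://my-service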
Testing
To observe horizontal pod autoscaling across the two environments (on-prem and remote), you can manually scale out the on-prem deployment. In practice, you would use resource usage metrics (e.g., CPU or memory usage) to drive the scaling automatically.
kubectl scale --replicas=3 deployment/on-prem
You’ll see that the on-prem deployment scales from 1 to 3 replicas, and KEDA creates a pod for the remote deployment, whose replica count increases from 0 to 1.
Conversely, when the number of on-prem replicas drops back to the activation threshold (2) or below, KEDA automatically scales the remote deployment back down to zero. You can observe this by manually scaling down the on-prem deployment.
kubectl scale --replicas=2 deployment/on-prem
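While scaling up and down, you can watch the deployments and the pods backing the service with standard kubectl commands, for example:

# Watch the replica counts of the deployments in the current namespace
kubectl get deployments -w

# List all pods backing the service, including the remote (kata-remote) ones
kubectl get pods -l app=my-web-app -o wide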
And that’s about it.
We have used this approach for cloud bursting from on-prem, which you can read here — https://www.redhat.com/en/blog/secure-cloud-bursting-leveraging-confidential-computing-peace-mind
I hope this is useful. If you have a unique combination of Kubernetes features that solves your problem and you can share it, I would love to hear about it.