AKS evicted pods and restarted a node even though the node appears to have sufficient CPU and memory resources

Ming Yu 70 Reputation points
2025-12-17T17:04:43.66+00:00

We have an AKS cluster running Kubernetes 1.32.9 with two nodes.

Each node is configured with a maximum of 70 pods.

We noticed that, from time to time, some of our deployments' pods were evicted and one of the nodes appeared to restart when the number of pods on that node reached the maximum of 70.

All of the affected deployments use a horizontal pod autoscaler and can scale out additional pods depending on the load.

My understanding is that, when a node reaches its pod limit, AKS should scale out an additional node instead of evicting pods and restarting the node.

Since we did not enable a pod disruption budget on those deployments, the node restart can create a small time window during which a deployment's replica set has 0 pods, which results in a brief outage.

As an experiment, we increased the minimum number of nodes from 2 to 3, which appears to have resolved the issue.

However, having 3 nodes in the cluster seems wasteful, since each node's CPU and memory are now significantly underutilized (see screenshot):

[screenshot: node CPU and memory utilization]

For optimal resource usage, we think both CPU and memory usage should be approximately 80% for all nodes.

Note: even when the cluster had only two nodes, both CPU and memory usage did not exceed 80%.

We would like help from experts in understanding why AKS did not scale out an additional node when the existing node's pod capacity was maxed out.

Thanks.

Azure Kubernetes Service

1 answer

  1. Jilakara Hemalatha 6,770 Reputation points Microsoft External Staff Moderator
    2025-12-17T18:16:43.1533333+00:00

    Hello Ming Yu

    Thank you for reaching out and for sharing the detailed scenario along with the results of your testing.

    The behavior you observed is expected in AKS and is not related to CPU or memory utilization, but rather to pod density limits on the node. In AKS, the maxPods value configured on a node pool is a hard scheduling limit. Once a node reaches this limit (70 pods in your case), Kubernetes cannot schedule additional pods on that node, even if sufficient CPU and memory are still available.
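
    As a quick check, you can compare a node's pod capacity with the number of pods currently scheduled on it. This is a minimal sketch; <node-name> is a placeholder for one of your node names.

        # Pod capacity (allocatable pods) reported by the node
        kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}{"\n"}'

        # The "Non-terminated Pods" line shows how many pods are currently scheduled on the node
        kubectl describe node <node-name> | grep "Non-terminated Pods"

    If the pod count is at or near the allocatable value, the node is pod-bound even though CPU and memory look healthy.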

    AKS uses Cluster Autoscaler to add new nodes, but it only scales out when it detects unschedulable pods in a Pending state. In this scenario, pods were evicted shortly after the pod limit was reached and did not remain Pending long enough for Cluster Autoscaler to trigger a scale-out. As a result, a new node was not added automatically when the pod limit was exhausted.
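
    To confirm whether this is what happened, you can look for pods stuck in Pending and, if the cluster autoscaler is enabled, inspect the status ConfigMap it publishes in kube-system. These commands are a sketch assuming the autoscaler is enabled on the node pool.

        # Any pods the scheduler could not place
        kubectl get pods --all-namespaces --field-selector status.phase=Pending

        # Cluster autoscaler status (only present when the autoscaler is enabled)
        kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml

    If no pods ever appear as Pending, the autoscaler has nothing to react to, which matches the behavior described above.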

    The brief service interruption occurred because Pod Disruption Budgets (PDBs) were not configured. Without a PDB, Kubernetes allows all replicas of a deployment to be evicted during a node restart, which can temporarily result in zero running pods.
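
    As an illustration, a minimal PodDisruptionBudget could look like the sketch below. The name and the app: my-app label are placeholders; adjust the selector to match your deployment's pod labels and choose minAvailable (or maxUnavailable) to suit your replica count. Save it as a file (for example pdb.yaml) and apply it with kubectl apply -f pdb.yaml.

        apiVersion: policy/v1
        kind: PodDisruptionBudget
        metadata:
          name: my-app-pdb
        spec:
          minAvailable: 1
          selector:
            matchLabels:
              app: my-app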

    When you increased the minimum node count from two to three, the issue was resolved because additional scheduling headroom was available. However, CPU and memory appear underutilized because pod count, not compute capacity, was the limiting factor.

    Recommended Actions

    Increase the maxPods value on the node pool (for example, to 110 or higher), provided your subnet has sufficient IP capacity, or use Azure CNI Overlay to remove the per-node IP constraint.
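
    A minimal sketch of this with the Azure CLI is shown below. maxPods is normally set when a node pool is created, so the usual approach is to add a new node pool with the higher value and move workloads onto it. The resource group, cluster, and pool names are placeholders.

        az aks nodepool add \
          --resource-group <resource-group> \
          --cluster-name <cluster-name> \
          --name nodepool2 \
          --node-count 2 \
          --max-pods 110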

    Ensure Cluster Autoscaler is enabled and properly configured.
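
    For example, the cluster autoscaler can be enabled on an existing node pool with the Azure CLI; the min/max counts below are illustrative values, not a recommendation.

        az aks nodepool update \
          --resource-group <resource-group> \
          --cluster-name <cluster-name> \
          --name <nodepool-name> \
          --enable-cluster-autoscaler \
          --min-count 2 \
          --max-count 5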

    Configure Pod Disruption Budgets for all critical workloads to prevent zero-replica outages (see the example manifest above).

    Monitor pod density per node in addition to CPU and memory usage.
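
    One simple way to watch pod density is to count pods per node across the cluster. The command below is a minimal sketch using kubectl.

        # Count non-terminated pods per node across all namespaces
        kubectl get pods --all-namespaces -o custom-columns=NODE:.spec.nodeName --no-headers | sort | uniq -c

    Comparing these counts against the node pool's maxPods value shows how close each node is to its pod limit.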

    These changes will help prevent pod evictions, avoid service interruptions, and allow better utilization of available resources.

    Please find the documentation below for reference:

    Configure Azure CNI networking in Azure Kubernetes Service (AKS)

    Use the cluster autoscaler in Azure Kubernetes Service (AKS)

    Manually scale the node count in an Azure Kubernetes Service (AKS) cluster

    Pod Disruption Budgets

    Below are threads discussing similar issues:

    https://github.com/Azure/AKS/issues/2665

    https://github.com/Azure/azure-cli/issues/13420

    Hope this helps! Please let me know if you have any queries.

