Hello Ming Yu,
Thank you for reaching out and for sharing the detailed scenario along with the results of your testing.
The behavior you observed is expected in AKS. It is caused not by CPU or memory utilization but by the pod density limit on each node. In AKS, the maxPods value configured on a node pool is a hard scheduling limit: once a node reaches it (70 pods in your case), Kubernetes cannot schedule additional pods on that node, even if plenty of CPU and memory is still available. This count includes system pods such as kube-system and DaemonSet pods, so fewer than 70 slots are left for your own workloads.
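To illustrate, here is a minimal Python sketch of the per-node fit check. It is a simplified model, not the actual kube-scheduler code: the pod-count test runs independently of the CPU and memory tests, which is why a full node is rejected even when it has spare resources. The reason strings mirror the ones the scheduler reports in pod events (for example, `0/2 nodes are available: 2 Too many pods.`); the `Node` and `PodRequest` types, the node name, and the sample numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    max_pods: int      # the node pool's maxPods setting
    pod_count: int     # pods already on the node, system pods included
    free_cpu_m: int    # CPU not yet requested by pods, in millicores
    free_mem_mi: int   # memory not yet requested by pods, in MiB


@dataclass
class PodRequest:
    cpu_m: int         # CPU request, in millicores
    mem_mi: int        # memory request, in MiB


def unschedulable_reasons(node: Node, pod: PodRequest) -> list[str]:
    """Simplified resource-fit filter: an empty list means the pod fits."""
    reasons = []
    # The pod-count limit is checked on its own; free CPU/memory cannot offset it.
    if node.pod_count + 1 > node.max_pods:
        reasons.append("Too many pods")
    if pod.cpu_m > node.free_cpu_m:
        reasons.append("Insufficient cpu")
    if pod.mem_mi > node.free_mem_mi:
        reasons.append("Insufficient memory")
    return reasons
```

For example, `unschedulable_reasons(Node("aks-nodepool1-0", max_pods=70, pod_count=70, free_cpu_m=3000, free_mem_mi=8192), PodRequest(cpu_m=100, mem_mi=128))` returns `['Too many pods']`, even though the node still has 3 CPU cores and 8 GiB of memory unrequested.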
AKS uses the cluster autoscaler to add nodes, but it only scales out when it detects unschedulable pods in the Pending state, and only if the node pool has not yet reached its configured maximum node count. In this scenario, pods were evicted shortly after the pod limit was reached and did not remain Pending long enough for the cluster autoscaler to trigger a scale-out. As a result, no new node was added when the pod limit was exhausted.
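If you want to check for this situation yourself, the Python sketch below works on the JSON output of `kubectl get pods -A -o json` saved to a file. It lists Pending pods whose `PodScheduled` condition is `False` with reason `Unschedulable` (the kind of pods the cluster autoscaler reacts to) and counts active pods per node so you can compare each node with its maxPods value. The file name and function names are illustrative; this is a diagnostic aid, not how the autoscaler itself is implemented.

```python
from __future__ import annotations

import json
from collections import Counter


def load_pods(path: str) -> list[dict]:
    """Load the "items" list from saved `kubectl get pods -A -o json` output."""
    with open(path) as f:
        return json.load(f)["items"]


def unschedulable_pods(pods: list[dict]) -> list[tuple[str, str, str]]:
    """Return (namespace, name, scheduler message) for Pending pods that could not be placed."""
    found = []
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        for cond in status.get("conditions", []):
            if (cond.get("type") == "PodScheduled"
                    and cond.get("status") == "False"
                    and cond.get("reason") == "Unschedulable"):
                meta = pod["metadata"]
                found.append((meta["namespace"], meta["name"], cond.get("message", "")))
    return found


def pods_per_node(pods: list[dict]) -> Counter:
    """Count pods bound to each node, skipping unscheduled pods and terminal phases."""
    return Counter(
        pod["spec"]["nodeName"]
        for pod in pods
        if pod.get("spec", {}).get("nodeName")
        and pod.get("status", {}).get("phase") not in ("Succeeded", "Failed")
    )
```

For example, run `kubectl get pods -A -o json > pods.json`, then pass `load_pods("pods.json")` to both functions. A node whose count is at or near 70 has no room left, regardless of its CPU and memory usage.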
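The brief service interruption occurred because no Pod Disruption Budgets (PDBs) were configured. Without a PDB, Kubernetes allows all replicas of a deployment to be evicted at the same time during voluntary disruptions, such as draining a node before it is restarted or upgraded, which can temporarily leave zero running pods. Note that a PDB only governs these voluntary evictions; it does not protect against an unexpected node failure.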
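As a starting point, the following Python sketch builds a `policy/v1` PodDisruptionBudget manifest and shows the arithmetic behind it: for an integer `minAvailable`, the number of voluntary evictions still allowed is the count of healthy pods minus `minAvailable`, floored at zero. The names `web-pdb`, `default`, and the `app: web` label are placeholders; replace them with your own namespace and pod labels. `minAvailable` can also be a percentage, which this sketch does not handle.

```python
import json


def pdb_manifest(name: str, namespace: str, app_label: str, min_available: int) -> dict:
    """Build a policy/v1 PodDisruptionBudget for pods labelled app=<app_label>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Keep at least this many matching pods running during voluntary evictions.
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """Voluntary evictions still permitted for an integer minAvailable (never negative)."""
    return max(0, current_healthy - min_available)


if __name__ == "__main__":
    # Placeholder names; kubectl apply -f accepts JSON as well as YAML.
    print(json.dumps(pdb_manifest("web-pdb", "default", "web", min_available=1), indent=2))
```

With two healthy replicas and `minAvailable: 1`, `disruptions_allowed(2, 1)` is 1, so a node drain can evict one replica but has to wait for its replacement to become ready before it can evict the second.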
Increasing the minimum node count from two to three resolved the issue because the extra node added up to 70 more pod slots of scheduling headroom. CPU and memory still appear underutilized because pod count, not compute capacity, was the limiting factor.
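The effect of the node count on headroom is simple arithmetic. The Python sketch below assumes, purely for illustration, 10 system pods per node (check your own nodes for the real number) and ignores CPU and memory entirely; the function names are hypothetical.

```python
import math


def workload_pod_slots(node_count: int, max_pods: int, system_pods_per_node: int) -> int:
    """Pod slots left for application pods across the node pool."""
    return node_count * (max_pods - system_pods_per_node)


def nodes_needed(workload_pods: int, max_pods: int, system_pods_per_node: int) -> int:
    """Smallest node count whose pod slots fit workload_pods (pod count only)."""
    per_node = max_pods - system_pods_per_node
    if per_node <= 0:
        raise ValueError("max_pods must exceed the number of system pods per node")
    return math.ceil(workload_pods / per_node)
```

With `max_pods=70` and 10 system pods per node, two nodes give `workload_pod_slots(2, 70, 10)` = 120 slots and three nodes give 180. While one of two nodes is being restarted, only `workload_pod_slots(1, 70, 10)` = 60 slots remain, so a third node also helps absorb pods during restarts.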
Recommended Actions
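Increase the maxPods value for your workloads (for example, to 110 or higher). Because maxPods is set when a node pool is created, this typically means adding a new node pool with the higher value and moving workloads to it. With traditional Azure CNI, make sure the subnet has enough free IP addresses for the higher value; alternatively, use Azure CNI Overlay, where pod IPs do not come from the subnet.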
Ensure the cluster autoscaler is enabled on the node pool, with minimum and maximum node counts that leave room for scale-out.
Configure Pod Disruption Budgets for all critical workloads to prevent zero-replica outages.
Monitor pod density per node in addition to CPU and memory usage.
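Before raising maxPods with traditional Azure CNI, you can estimate the subnet size with a Python sketch like the one below. It assumes the planning formula from the Azure CNI documentation, (nodes + 1) + (nodes + 1) * maxPods, where the extra node covers one upgrade surge node, and it subtracts the 5 addresses Azure reserves in every subnet. Use the autoscaler's maximum node count as `node_count`. The example values are hypothetical, and this does not apply to Azure CNI Overlay, where only nodes take subnet IPs.

```python
def required_subnet_ips(node_count: int, max_pods: int, surge_nodes: int = 1) -> int:
    """Subnet IPs for traditional Azure CNI: one per node plus max_pods per node."""
    nodes = node_count + surge_nodes
    return nodes + nodes * max_pods


def usable_ips(prefix_length: int) -> int:
    """Usable addresses in an Azure subnet (Azure reserves 5 per subnet)."""
    return 2 ** (32 - prefix_length) - 5


def smallest_subnet_prefix(node_count: int, max_pods: int, surge_nodes: int = 1) -> int:
    """Longest prefix length (smallest subnet) that still fits the required IPs."""
    need = required_subnet_ips(node_count, max_pods, surge_nodes)
    for prefix in range(29, 7, -1):  # /29 is the smallest subnet Azure allows
        if usable_ips(prefix) >= need:
            return prefix
    raise ValueError("no subnet between /29 and /8 is large enough")
```

For example, `required_subnet_ips(5, 110)` is 666 addresses, so `smallest_subnet_prefix(5, 110)` returns 22: a /23 leaves only 507 usable addresses.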
These changes will help prevent pod evictions, avoid service interruptions, and allow better utilization of available resources.
Please find the following documentation for reference:
Configure Azure CNI networking in Azure Kubernetes Service (AKS)
Use the cluster autoscaler in Azure Kubernetes Service (AKS)
Manually scale the node count in an Azure Kubernetes Service (AKS) cluster
The following GitHub threads discuss similar issues:
https://github.com/Azure/AKS/issues/2665
https://github.com/Azure/azure-cli/issues/13420
Hope this helps! Please let me know if you have any queries.