To address the issue with your Azure Kubernetes Service (AKS) cluster being in a degraded state and the node pools (`newpool1`, `newpool2`) showing `provisioningState: Failed`, you can follow these troubleshooting steps (illustrative command sketches for each step follow the list):
- Check Node Pool Status: Run `az aks nodepool show` to check the status of your node pools. Look for any specific error messages or codes that give more insight into the failure.
- VM Scale Set Status: Check the status of the VM scale set backing each node pool with `az vmss show`. Again, look for error messages or codes.
- Inspect Individual VMs: Use `az vmss list-instances` to check the status of the individual VMs in the node pools. If any VMs are in a `Failed` or `Unhealthy` state, investigate their error messages.
- Quota and Capacity Check: Verify the quota and capacity for your region and subscription using `az vm list-usage`. If you have reached a limit, consider requesting an increase or deleting unused resources.
- Policy and Role Assignments: Check the policy and role assignments affecting your node pools using `az policy` and `az role`. Ensure that no assignment prevents the creation or scaling of nodes.
- Resource Locks: Check for resource locks on your node pools using `az lock`. If locks prevent scaling or updates, you may need to adjust or remove them.
- Node Image Version: The error `InvalidGalleryImageRef` suggests an issue with the specified node image version. Ensure that the image version `AKSUbuntu-1804gen2containerd-202505.27.0` is valid and still available in your region; if it is deprecated or unavailable, upgrade to a newer image version.
- Reconciliation: If necessary, trigger a reconciliation of the managed cluster with `az resource update --ids <AKS cluster id>`. This can help align the actual state with the desired configuration.
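
For the node pool status check, a minimal sketch; the resource group and cluster names are placeholders for your environment:

```bash
# Placeholders: replace myResourceGroup / myAKSCluster with your values.
az aks nodepool show \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name newpool1 \
  --output json
# provisioningState and any embedded error details appear in the JSON output.
```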
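The scale sets backing AKS node pools live in the cluster's managed node resource group, so resolve that first; a sketch, with the same placeholder names as above:

```bash
# Resolve the managed node resource group (usually MC_<rg>_<cluster>_<region>).
NODE_RG=$(az aks show --resource-group myResourceGroup \
  --name myAKSCluster --query nodeResourceGroup -o tsv)

# List the scale sets, then inspect the one backing the failed pool.
az vmss list --resource-group "$NODE_RG" --output table
az vmss show --resource-group "$NODE_RG" --name <vmss-name> --output json
```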
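To inspect individual instances, a sketch that assumes the `$NODE_RG` variable from the previous snippet:

```bash
# Show each instance and its provisioning state.
az vmss list-instances --resource-group "$NODE_RG" \
  --name <vmss-name> --output table

# Drill into one instance's runtime and health status.
az vmss get-instance-view --resource-group "$NODE_RG" \
  --name <vmss-name> --instance-id <instance-id>
```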
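For the quota check, substitute the region your cluster runs in:

```bash
# Compare CurrentValue against Limit for the VM family your pools use,
# e.g. "Standard DSv3 Family vCPUs".
az vm list-usage --location <region> --output table
```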
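For policy and role assignments, a sketch using `az policy assignment list` and `az role assignment list`; the resource-group scope shown is an assumption, so widen it to the subscription if your assignments live there:

```bash
# Policies applied at the cluster's resource group scope.
az policy assignment list --resource-group myResourceGroup --output table

# Role assignments on the same scope; verify the cluster and kubelet
# identities still have the permissions they need.
az role assignment list --resource-group myResourceGroup --output table
```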
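For resource locks, check both the cluster's resource group and the managed node resource group:

```bash
# A ReadOnly or CanNotDelete lock on either group can block node operations.
az lock list --resource-group myResourceGroup --output table
az lock list --resource-group "$NODE_RG" --output table
```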
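For the node image step, a sketch that lists the versions in use and, if the pinned image is deprecated, moves a pool onto the latest available node image via `az aks nodepool upgrade`:

```bash
# Show the node image version each pool is running.
az aks nodepool list --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --query "[].{name:name, image:nodeImageVersion}" --output table

# Upgrade a failed pool to the latest node image for its Kubernetes version.
az aks nodepool upgrade --resource-group myResourceGroup \
  --cluster-name myAKSCluster --name newpool1 --node-image-only
```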
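For the reconciliation step, the `az resource update` call from the list, with the cluster ID resolved first:

```bash
# An update with no property changes nudges AKS to reconcile the cluster
# back toward its desired (goal) state.
CLUSTER_ID=$(az aks show --resource-group myResourceGroup \
  --name myAKSCluster --query id -o tsv)
az resource update --ids "$CLUSTER_ID"
```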
If these steps do not resolve the issue, consider reaching out to Azure support for further assistance, especially since this is a production outage.