Azure Kubernetes Service (AKS)

Tigran 0 Reputation points
2025-12-13T23:43:44.2566667+00:00

Service

Azure Kubernetes Service (AKS)

Severity

Sev A / Production outage

Region

West Europe

Problem description (paste exactly):

Our AKS cluster is in a degraded state. Both system node pools (newpool1, newpool2) are in provisioningState: Failed. All node pool operations (scale, add, reconcile) return InternalOperationError. Attempting to add a new user node pool fails with: InvalidGalleryImageRef – Gallery image reference invalid. The current nodeImageVersion is AKSUbuntu-1804gen2containerd-202505.27.0. Existing nodes are running, but Azure cannot create or reconcile any new nodes. This blocks scheduling of stateful workloads (PostgreSQL pods are Pending with a volume node affinity conflict).

Request: Backend repair / node image reconciliation without data loss. 


2 answers

  1. Q&A Assist
    2025-12-13T23:44:18.56+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    To troubleshoot the degraded AKS cluster and the node pools (newpool1, newpool2) showing provisioningState: Failed, work through these steps:

    1. Check Node Pool Status: Run az aks nodepool show for each pool and look for specific error messages or codes that give more insight into the failure.
    2. VM Scale Set Status: Check the VM scale set backing each node pool with az vmss show. The scale set lives in the cluster's node resource group, not in the cluster's own resource group. Again, look for error messages or codes.
    3. Inspect Individual VMs: Use az vmss list-instances to check the status of the individual VMs in the node pools. If any VMs are in a Failed or Unhealthy state, investigate their error messages.
    4. Quota and Capacity Check: Verify the quota for your region and subscription with az vm list-usage --location <region>. If you have reached a limit, request an increase or delete unused resources.
    5. Policy and Role Assignments: Review the policy and role assignments that apply to the node pools with az policy assignment list and az role assignment list. Make sure nothing restricts the creation or scaling of nodes.
    6. Resource Locks: Check for resource locks with az lock list. If a lock prevents scaling or updates, you may need to adjust or remove it.
    7. Node Image Version: The error InvalidGalleryImageRef suggests a problem with the node image reference. Check whether the image version AKSUbuntu-1804gen2containerd-202505.27.0 (an Ubuntu 18.04 image) is still valid and available in your region. If it is deprecated or unavailable, the node pools need to move to a newer image version.
    8. Reconciliation: If necessary, trigger a reconciliation of the managed cluster with az resource update --ids <AKS cluster id>. This can help align the actual state with the desired configuration.
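
The read-only checks in steps 1 to 6 can be collected into a small script. What follows is only a minimal sketch, assuming the Azure CLI is installed and logged in. The resource group, cluster, node resource group and scale set names are hypothetical placeholders, and the function names are made up for illustration. Only newpool1 and the West Europe region come from the question. The script prints the commands by default and runs nothing unless dry_run is set to False.

```python
import shlex
import subprocess


def diagnostic_commands(resource_group, cluster, nodepool,
                        node_resource_group, vmss_name, location):
    """Return the read-only az CLI calls for steps 1-6 as argument lists."""
    return [
        # Step 1: node pool provisioning state and error details
        ["az", "aks", "nodepool", "show", "--resource-group", resource_group,
         "--cluster-name", cluster, "--name", nodepool],
        # Step 2: scale set backing the pool (lives in the node resource group)
        ["az", "vmss", "show", "--resource-group", node_resource_group,
         "--name", vmss_name],
        # Step 3: per-instance state of the scale set
        ["az", "vmss", "list-instances", "--resource-group", node_resource_group,
         "--name", vmss_name],
        # Step 4: compute quota usage in the region
        ["az", "vm", "list-usage", "--location", location],
        # Step 5: policy and role assignments on the node resource group
        ["az", "policy", "assignment", "list",
         "--resource-group", node_resource_group],
        ["az", "role", "assignment", "list",
         "--resource-group", node_resource_group],
        # Step 6: resource locks on the node resource group
        ["az", "lock", "list", "--resource-group", node_resource_group],
    ]


def reconcile_command(cluster_id):
    """Step 8: this call modifies the cluster, so it is kept separate."""
    return ["az", "resource", "update", "--ids", cluster_id]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$ " + shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # All names except newpool1 and westeurope are placeholders.
    run_all(diagnostic_commands(
        resource_group="my-rg",
        cluster="my-aks",
        nodepool="newpool1",
        node_resource_group="MC_my-rg_my-aks_westeurope",
        vmss_name="aks-newpool1-00000000-vmss",
        location="westeurope",
    ))
```

Keeping the reconcile call out of the default list means the script can be run safely against a production cluster before deciding whether to change anything.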

    If these steps do not resolve the issue, consider reaching out to Azure support for further assistance, especially since this is a production outage.



  2. anrodrigues-MSFT 0 Reputation points Microsoft Employee
    2025-12-15T10:17:53.7566667+00:00

    Hi @Tigran

    Please note that the AKS Ubuntu 18.04 node images have been retired, so no new nodes can be created from them and no new image versions are available.

    More details here. https://github.com/Azure/AKS/issues/4873

    We recommend upgrading to Ubuntu 22.04.
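
To spot which node pools are still on the retired image, the nodeImageVersion string reported for each pool can be inspected. The following is a minimal sketch, assuming the version string follows the pattern seen in the question (AKSUbuntu-<release><suffix>-<build>). That pattern is inferred from a single example and is not a documented contract, and the function names are hypothetical.

```python
import re

# Pattern inferred from "AKSUbuntu-1804gen2containerd-202505.27.0".
# This is an assumption, not a documented naming scheme.
_IMAGE_RE = re.compile(
    r"^AKSUbuntu-(?P<release>\d{4})(?P<suffix>[A-Za-z0-9]*)-(?P<build>[\d.]+)$"
)


def ubuntu_release(node_image_version):
    """Return the Ubuntu release as 'YY.MM', or None if the string does not match."""
    match = _IMAGE_RE.match(node_image_version)
    if match is None:
        return None
    digits = match.group("release")
    return f"{digits[:2]}.{digits[2:]}"


def is_retired_1804(node_image_version):
    """True when the image string points at an Ubuntu 18.04 node image."""
    return ubuntu_release(node_image_version) == "18.04"


if __name__ == "__main__":
    version = "AKSUbuntu-1804gen2containerd-202505.27.0"  # from the question
    print(ubuntu_release(version))   # 18.04
    print(is_retired_1804(version))  # True
```

Strings that do not match the assumed pattern return None instead of raising, so other image families are skipped rather than misreported.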

    If you have any questions, please drop a comment by tagging my userid @anrodrigues-MSFT

    If this does answer your question, please accept it as the answer as a token of appreciation.

