Hello @lili zhang, the logs you shared from the command kubectl logs -p ** -c ** -n ** show that the Delivery service is failing because it cannot acquire an Azure AD token for Key Vault using the managed identity that was passed to Helm earlier. There are two likely reasons:
- The user-assigned managed identity does not have Key Vault permissions.
- Or the AzureIdentity / AzureIdentityBinding (AAD Pod Identity) or Workload Identity setup is incomplete.
Please follow the steps below to verify and fix the issue:
- Verify that the managed identity created for Delivery, which you passed to the Helm setup, exists.
Search for Managed Identities in the Azure Portal and confirm that the Delivery managed identity is actually there.
The clientId and resourceId of the Delivery MI should match what you passed to Helm (identity.clientid and identity.resourceid).
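If you prefer the CLI to the Portal, the same check can be sketched as below. This is only a rough sketch: the identity name and resource group are placeholders you need to fill in, the helper names show_identity_ids and ids_match are made up for illustration, and the Helm release name and namespace in the usage comments are assumed from the commands elsewhere in this answer. The comparison ignores case on the assumption that Azure resource IDs and GUIDs are not case-sensitive.

```shell
#!/usr/bin/env bash
# Sketch only: <identity-name> and <resource-group> are placeholders.

# Print the clientId and then the resourceId of a user-assigned managed identity.
# A non-zero exit status means the lookup failed (identity missing or az not logged in).
show_identity_ids() {
  local name="$1" group="$2"
  az identity show --name "$name" --resource-group "$group" --query clientId --output tsv &&
  az identity show --name "$name" --resource-group "$group" --query id --output tsv
}

# Return success when two IDs are equal, ignoring case.
ids_match() {
  local a b
  a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ "$a" = "$b" ]
}

# Usage (commented out so the functions can be sourced without side effects):
# show_identity_ids "<identity-name>" "<resource-group>"
# helm get values delivery-v0.1.0-dev -n backend-dev --all
# ids_match "<clientId from az>" "<identity.clientid from Helm>" && echo "clientId matches"
```

If show_identity_ids fails, the identity most likely does not exist in that resource group; if it succeeds but ids_match fails, the Helm values are the thing to correct.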
- If the Delivery managed identity (MI) exists, check that the AKS cluster's service principal has the Managed Identity Operator role on it.
If it does not, you can assign the role with the following Azure CLI command:
az role assignment create \
--role "Managed Identity Operator" \
--assignee <AKS-Cluster-Principal-ID> \
--scope <Delivery-Identity-Resource-ID>
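To find the value for <AKS-Cluster-Principal-ID> and to see which roles it already holds on the Delivery identity, something like the following may help. Treat it as a sketch under assumptions: the function names are hypothetical, the resource group and cluster name are placeholders, and it assumes that a cluster running with managed identity reports "msi" as its service principal client ID, in which case the kubelet identity is the one to use.

```shell
#!/usr/bin/env bash
# Sketch only: <resource-group> and <cluster-name> are placeholders.

# Print the client ID the cluster uses to act on Azure resources.
# Assumption: "msi" (or empty) means the cluster uses managed identity,
# so the kubelet identity's client ID is printed instead.
aks_principal_id() {
  local group="$1" cluster="$2" sp
  sp=$(az aks show --resource-group "$group" --name "$cluster" \
        --query "servicePrincipalProfile.clientId" --output tsv)
  if [ -z "$sp" ] || [ "$sp" = "msi" ]; then
    az aks show --resource-group "$group" --name "$cluster" \
      --query "identityProfile.kubeletidentity.clientId" --output tsv
  else
    printf '%s\n' "$sp"
  fi
}

# List the role names already assigned to that principal on the given scope.
list_roles_on_scope() {
  local principal="$1" scope="$2"
  az role assignment list --assignee "$principal" --scope "$scope" \
    --query "[].roleDefinitionName" --output tsv
}

# Usage (commented out so the functions can be sourced without side effects):
# id=$(aks_principal_id "<resource-group>" "<cluster-name>")
# list_roles_on_scope "$id" "<Delivery-Identity-Resource-ID>"
```

If "Managed Identity Operator" does not appear in the list, the role assignment command above is the next step.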
- If both checks pass, verify that the managed identity controller (MIC) and node managed identity (NMI) pods are running:
kubectl get pods -n kube-system | grep aad-pod-identity
Then check the CRDs:
kubectl get AzureIdentity,AzureIdentityBinding -n backend-dev
The output should show:
-> an AzureIdentity for the Delivery managed identity with the correct clientID and resourceID.
-> an AzureIdentityBinding with a selector matching the pod label aadpodidbinding=delivery-v0.1.0-dev.
If they do not match, fix the Helm values with the helm upgrade command. Sample command:
helm upgrade delivery-v0.1.0-dev delivery-v0.1.0.tgz \
--set identity.clientid=<clientId> \
--set identity.resourceid=<resourceId> \
...
- Check whether the Key Vault allows access for the Delivery managed identity.
If access has already been granted, check whether the Key Vault has any networking restrictions (that is, it is not publicly accessible). If it does, add the AKS subnet to the allowed network list.
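The Key Vault checks above could look roughly like this from the Azure CLI. This is a sketch, not a confirmed fix for your setup: the function names are hypothetical, the vault name, principal ID, and subnet ID are placeholders, and it assumes the vault uses access policies (not Azure RBAC) and that get and list on secrets are the permissions the Delivery service needs.

```shell
#!/usr/bin/env bash
# Sketch only: the vault name, principal ID, and subnet ID are placeholders.

# Print the vault firewall's default action ("Allow" or "Deny").
# Empty output is assumed to mean no network ACLs are configured.
vault_default_action() {
  az keyvault show --name "$1" \
    --query "properties.networkAcls.defaultAction" --output tsv
}

# Grant read access to secrets through an access policy.
# Assumes an access-policy vault and that "get" and "list" are sufficient.
grant_secret_read() {
  local vault="$1" principal_id="$2"
  az keyvault set-policy --name "$vault" --object-id "$principal_id" \
    --secret-permissions get list
}

# Allow the AKS subnet through the vault firewall.
allow_subnet() {
  local vault="$1" subnet_id="$2"
  az keyvault network-rule add --name "$vault" --subnet "$subnet_id"
}

# Usage (commented out so the functions can be sourced without side effects):
# vault_default_action "<vault-name>"
# grant_secret_read "<vault-name>" "<delivery-identity-principal-id>"
# allow_subnet "<vault-name>" "<aks-subnet-resource-id>"
```

If vault_default_action prints Deny, the subnet rule is probably what is missing. Note that adding a subnet rule generally also requires the Microsoft.KeyVault service endpoint to be enabled on that subnet.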
- After fixing and validating the steps above (Key Vault and identity), restart the pod. Deleting the pod triggers the restart:
kubectl delete pod -n backend-dev -l app.kubernetes.io/instance=delivery-v0.1.0-dev
As soon as the restart triggers, review the logs:
kubectl logs -f <pod-name> -n backend-dev
You should see the secrets load successfully and the application listening on 8080.
- Finally, you can rerun your initial command, perhaps with a 120s or 150s timeout value, to confirm that everything is working:
kubectl wait --namespace backend-dev \
--for=condition=ready pod \
--selector=app.kubernetes.io/instance=delivery-v0.1.0-dev \
--timeout=120s
References: https://learn.microsoft.com/en-us/azure/aks/use-azure-ad-pod-identity#operation-mode-options