Intermittent Azure App Service (Linux) IDX20803 on Specific Instances

Nanthakumar S 0 Reputation points
2025-12-05T15:03:05.8333333+00:00

I’ve been troubleshooting an intermittent issue where some instances of an Azure App Service (Linux) fail to communicate with Azure AD B2C. The app throws this error on one or a few instances, while others work fine:

 

[{"ClassName":"System.InvalidOperationException","Message":"IDX20803: Unable to obtain configuration from: 'https://tenant.b2clogin.com/tenant.onmicrosoft.com/v2.0/.well-known/openid-configuration?p=B2C_1A_CLIENTCREDENTIALS'. Will retry at '11/17/2025 13:33:19 +00:00'. Exception: 'System.IO.IOException: IDX20804: Unable to retrieve document from: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'.\n ---> System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.\n ---> System.TimeoutException: A task was canceled.\n ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.\n   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)\n   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)\n   at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)\n   --- End of inner exception stack trace ---\n   --- End of inner exception stack trace ---\n   at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)\n   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption 
completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)\n   at Microsoft.IdentityModel.Protocols.HttpDocumentRetriever.SendAsyncAndRetryOnNetworkError(HttpClient httpClient, Uri uri)\n   at Microsoft.IdentityModel.Protocols.HttpDocumentRetriever.GetDocumentAsync(String address, CancellationToken cancel)\n   --- End of inner exception stack trace ---\n   at Microsoft.IdentityModel.Protocols.HttpDocumentRetriever.GetDocumentAsync(String address, CancellationToken cancel)\n   at Microsoft.IdentityModel.Protocols.OpenIdConnect.OpenIdConnectConfigurationRetriever.GetAsync(String address, IDocumentRetriever retriever, CancellationToken cancel)\n   at Microsoft.IdentityModel.Protocols.ConfigurationManager`1.GetConfigurationAsync(CancellationToken cancel)'.","Data":null,"InnerException":{"ClassName":"System.IO.IOException","Message":"IDX20804: Unable to retrieve document from: '[PII of type 'System.String' is hidden. 
For more details, see https://aka.ms/IdentityModel/PII.]'.","Data":null,"InnerException":{"ClassName":"System.Threading.Tasks.TaskCanceledException","Message":"The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.","Data":null,"InnerException":{"ClassName":"System.TimeoutException","Message":"A task was canceled.","Data":null,"InnerException":{"ClassName":"System.Threading.Tasks.TaskCanceledException","Message":"A task was canceled.","Data":null,"InnerException":null,"HelpURL":null,"StackTraceString":"   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)\n   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)\n   at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233029,"Source":"System.Private.CoreLib","WatsonBuckets":null},"HelpURL":null,"StackTraceString":null,"RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233083,"Source":null,"WatsonBuckets":null},"HelpURL":null,"StackTraceString":"   at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)\n   at 
System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)\n   at Microsoft.IdentityModel.Protocols.HttpDocumentRetriever.SendAsyncAndRetryOnNetworkError(HttpClient httpClient, Uri uri)\n   at Microsoft.IdentityModel.Protocols.HttpDocumentRetriever.GetDocumentAsync(String address, CancellationToken cancel)","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233029,"Source":"System.Net.Http","WatsonBuckets":null},"HelpURL":null,"StackTraceString":"   at Microsoft.IdentityModel.Protocols.HttpDocumentRetriever.GetDocumentAsync(String address, CancellationToken cancel)\n   at Microsoft.IdentityModel.Protocols.OpenIdConnect.OpenIdConnectConfigurationRetriever.GetAsync(String address, IDocumentRetriever retriever, CancellationToken cancel)\n   at Microsoft.IdentityModel.Protocols.ConfigurationManager`1.GetConfigurationAsync(CancellationToken cancel)","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146232800,"Source":"Microsoft.IdentityModel.Protocols","WatsonBuckets":null},"HelpURL":null,"StackTraceString":"   at Microsoft.IdentityModel.Protocols.ConfigurationManager`1.GetConfigurationAsync(CancellationToken cancel)\n   at Microsoft.IdentityModel.Protocols.ConfigurationManager`1.GetConfigurationAsync()","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233079,"Source":"Microsoft.IdentityModel.Protocols","WatsonBuckets":null}] 

  • It happens both with autoscaling and with a fixed instance count.
  • Only certain instances have the issue.
  • Restarting just the failing instance, or restarting the whole App Service, immediately fixes the problem.
  • vNet Route All is enabled (true), and Application routing > Outbound internet traffic is On.
  • There is NO NAT gateway.
Azure App Service
Azure App Service is a service used to create and deploy scalable, mission-critical web apps.

3 answers

  1. Q&A Assist
    2025-12-05T15:03:23.99+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    The intermittent issue you are experiencing with specific instances of your Azure App Service (Linux) failing to communicate with Azure AD B2C, resulting in the IDX20803 error, could be related to transient faults or network configuration issues. Here are some steps you can take to troubleshoot and potentially resolve the issue:

    1. Transient Fault Handling: Since the error resolves upon restarting the failing instance, it may indicate a transient fault. Ensure that your application is designed to handle such faults by implementing retry logic for requests that fail due to transient issues. This is a common practice in cloud environments where intermittent failures can occur.
    2. Instance Health: Monitor the health of the instances in your App Service. If certain instances are consistently failing while others are not, consider deploying multiple instances to improve resilience. Azure App Service can automatically replace unhealthy instances, which may help mitigate the issue.
    3. Network Configuration: Verify your network configuration settings. Since you mentioned that vNet routing is enabled and outbound internet traffic is set to "On," ensure that there are no firewall rules or network security groups that might be blocking traffic from specific instances.
    4. Scaling Considerations: If you are using autoscaling, ensure that the scaling rules are not causing instances to be overloaded or under-resourced, which could lead to connectivity issues. If using a fixed instance count, ensure that the selected tier and instance size meet your performance requirements.
    5. Deployment Slots: If possible, use deployment slots for zero-downtime deployments. This can help minimize the impact of any deployments or configuration changes that might affect instance availability.
    6. Logs and Monitoring: Enable detailed logging and monitoring on your Azure App Service to capture more information about the failures. This can provide insights into whether the issue is related to specific requests or external dependencies.

    By following these steps, you can better understand and potentially resolve the intermittent communication issues with Azure AD B2C on your Azure App Service instances.
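
    The retry logic mentioned in step 1 can be sketched roughly as follows. This is a language-agnostic illustration written in Python, not code from the application in the question (which is .NET). The function name `retry_with_backoff` and all of its parameters are hypothetical, and the choice of which exceptions count as transient is an assumption.

```python
import random
import time


def retry_with_backoff(operation, attempts=4, base_delay=0.5, max_delay=8.0,
                       retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are used up.

    Between tries, waits base_delay * 2**n seconds (capped at max_delay)
    plus up to 10% random jitter. Only exceptions listed in retry_on are
    retried; if every attempt fails, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the final failure
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter keeps many callers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

    A caller would wrap the outbound request, for example `retry_with_backoff(fetch_metadata)`, where `fetch_metadata` is a hypothetical function that downloads the OpenID configuration with a short per-attempt timeout. Retries of this kind only help when the fault clears by itself within the retry window.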



  2. RAMAMURTHY MAKARAPU 1,125 Reputation points Microsoft External Staff Moderator
    2025-12-06T01:57:14.3033333+00:00

    Hi @Nanthakumar S ,

    Thank you for submitting your question on Microsoft Q&A.

    Hey Nanthakumar, it sounds like you're experiencing an intermittent issue with your Azure App Service (Linux) communicating with Azure AD B2C, leading to timeouts and an exception error (IDX20803). Here are some steps and considerations that might help you resolve this issue:

    1. Check Instance Health: Since the issue affects only certain instances, consider checking the health of those specific instances. Using the Azure Portal, you can look at the logs for those instances to see if there are any errors or performance issues at the time the error occurs.
    2. Increase Timeout Settings: The error message indicates that the request was canceled after the configured HttpClient.Timeout of 100 seconds elapsed. If possible, consider increasing this timeout to allow more time for the OpenID configuration to be retrieved. Because the value is the HttpClient.Timeout property, it is normally set in application code on the client used for metadata retrieval.
    3. Enable Application Logging: You might want to enable application logging to get more detailed logs which could provide more context on why some requests are timing out. You can review these logs through the Azure Portal.
    4. Health Checks and Auto-Heal: Implementing Health Checks can help remove failing instances from the load balancer if they return failure responses. Also, consider setting up Auto-Heal if your application enters an unrecoverable state, as this can automatically restart the worker process for the failed instances.
    5. Diagnose Network Issues: Since issues might occur intermittently, it could be related to network connections or exhaustion of SNAT ports. Make sure to verify your outbound connections and refer to the guide on troubleshooting intermittent outbound connection errors.
    6. Review VNet Configuration: You mentioned that vNet Route All is set to true. Ensure that there are no misconfigurations in your vNet setup that might block access to Azure AD B2C from those specific instances.
    7. Testing in Development Environment: If applicable, recreate the issue in a test environment to better understand if certain configurations lead to these errors.
    8. Update Application Dependencies: Ensure that all libraries and dependencies used for Azure AD B2C integration in your app are up-to-date, as outdated libraries may have known issues that could contribute to this problem.
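
    To narrow down step 5, one option is to time each stage of an outbound connection separately on an affected instance and on a healthy one, so it becomes clearer whether DNS, the TCP connect, or the TLS handshake is where the request stalls. The sketch below is a language-agnostic illustration in Python. The helper names `endpoint_of`, `timed`, and `probe` are hypothetical, it assumes an HTTPS URL, and it assumes a Python interpreter is available where it runs (the .NET base images mentioned in this thread may not include one, in which case the same stages would need to be reproduced with whatever tools are present).

```python
import socket
import ssl
import time
from urllib.parse import urlsplit


def endpoint_of(url):
    """Return (host, port) for a URL, defaulting to 443 for https and 80 otherwise."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def timed(stage, func):
    """Run func() and return (report, value); report records success and duration."""
    start = time.monotonic()
    try:
        value, error = func(), None
    except OSError as exc:  # covers DNS errors, timeouts, resets and TLS failures
        value, error = None, repr(exc)
    report = {"stage": stage, "ok": error is None,
              "seconds": round(time.monotonic() - start, 3), "error": error}
    return report, value


def probe(url, timeout=10.0):
    """Check DNS, TCP connect and TLS handshake in order; stop at the first failure."""
    host, port = endpoint_of(url)
    reports = []

    report, _ = timed("dns", lambda: socket.getaddrinfo(host, port, type=socket.SOCK_STREAM))
    reports.append(report)
    if not report["ok"]:
        return reports

    report, sock = timed("tcp", lambda: socket.create_connection((host, port), timeout=timeout))
    reports.append(report)
    if not report["ok"]:
        return reports

    try:
        context = ssl.create_default_context()
        report, tls_sock = timed("tls", lambda: context.wrap_socket(sock, server_hostname=host))
        reports.append(report)
        if tls_sock is not None:
            sock = tls_sock
    finally:
        sock.close()
    return reports
```

    Calling `probe(...)` with the metadata URL from the error message returns one report per stage. Comparing the output from a failing instance with the output from a healthy one shows which stage differs.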

    Follow-Up Questions:

    1. Can you confirm if the issue is isolated to specific instances or if there are specific times when instances fail consistently?
    2. Have you checked the logs for those specific instances at the time of failure to see if there are any additional error messages?
    3. What version of the Azure App Service are you using, and are there any updates or recent changes made to your configuration or code before the issue started?
    4. Are there any custom configurations or settings in your Azure AD B2C that may impact the authentication process?
    5. Would you like guidance on how to implement the Health Check and Auto-Heal features?
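
    As a rough illustration of the Health Check idea in step 4: a health endpoint that only confirms the process is running will not notice an instance that has lost outbound connectivity. One possible approach, sketched here in Python as a language-agnostic example, is to have the health endpoint report the cached result of a dependency check. The class name `DependencyHealth`, the 30-second cache window, and the 200/503 mapping are all assumptions for illustration, and `check` stands for a hypothetical function that tries to reach the external dependency with a short timeout.

```python
import time


class DependencyHealth:
    """Caches the result of a dependency check so the health endpoint stays cheap."""

    def __init__(self, check, ttl_seconds=30.0, clock=time.monotonic):
        self._check = check        # callable returning True when the dependency is reachable
        self._ttl = ttl_seconds    # how long a result is reused before re-checking
        self._clock = clock        # injectable clock, which makes the class easy to test
        self._checked_at = None
        self._healthy = False

    def status_code(self):
        """Return 200 if the most recent check succeeded, otherwise 503."""
        now = self._clock()
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            try:
                self._healthy = bool(self._check())
            except Exception:
                # Any failure in the check itself counts as unhealthy.
                self._healthy = False
            self._checked_at = now
        return 200 if self._healthy else 503
```

    The web framework's health route would return `status_code()` as the HTTP status. The caching keeps frequent health pings from turning into frequent outbound calls.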

    I hope these suggestions help you get closer to solving the problem! If you need more details or further assistance, feel free to ask.

    References:

    https://learn.microsoft.com/en-us/azure/app-service/overview#troubleshooting

    https://learn.microsoft.com/en-us/troubleshoot/azure/app-service/troubleshoot-instance-related-issues-on-azure-app-service?wt.mc_id=knowledgesearch_inproduct_azure-cxp-community-insider


  3. Nanthakumar S 0 Reputation points
    2025-12-06T03:16:39.0066667+00:00

    Hi @RAMAMURTHY MAKARAPU ,

    Here are my responses to the follow-up questions.

    1. Can you confirm if the issue is isolated to specific instances or if there are specific times when instances fail consistently? Yes, the issue is isolated to a single instance, while the other instances of the App Service Linux container serve requests correctly. It doesn’t follow a time pattern, and it happens both with and without autoscaling. When a new instance comes online, occasionally that instance is unable to reach the Azure AD B2C metadata endpoint. During that period the health check is fine. Restarting ONLY the affected instance immediately resolves the issue, while other instances continue to function normally.
    2. Have you checked the logs for those specific instances at the time of failure to see if there are any additional error messages? Yes, the logs consistently show the same error for all requests that go to that instance: IDX20803, unable to retrieve the OpenID configuration. We do not see any additional internal exceptions beyond failed attempts to reach the B2C discovery endpoint. There are no application-level issues, and everything points to an intermittent loss of outbound connectivity on the specific faulty instance.
    3. What version of the Azure App Service are you using, and are there any updates or recent changes made to your configuration or code before the issue started? The issue occurs on an App Service (Linux) custom container. The image is mcr.microsoft.com/dotnet/aspnet:8.0, and some of the other App Services use the mcr.microsoft.com/dotnet/aspnet:8.0-alpine3.18-amd64 image. There were no code changes that correlate with the problem. Note: This issue is NOT specific to a particular App Service. It happens to a couple of services that run in App Service Linux containers.
    4. Are there any custom configurations or settings in your Azure AD B2C that may impact the authentication process? Nothing unusual. The Azure AD B2C policies and endpoints work correctly from other instances, from local development, and from test environments; only certain App Service instances fail to reach them.
    5. Would you like guidance on how to implement the Health Check and Auto-Heal features? Health check is already configured. Auto-Heal still needs to be configured, but it might not address the root cause of the problem.

    Appreciate your response.
