VMSS Scale-In and Application Gateway Connection Draining – How to Gracefully Handle Requests?

pipi 25 Reputation points
2025-11-11T04:06:10.0333333+00:00

We are currently using VMSS along with Application Gateway to handle user web requests. Our question is regarding scale-in behavior: even with connection draining enabled, it seems ineffective because the VMSS instance is removed immediately during scale-in, causing ongoing requests to be abruptly terminated. Connection draining does not have a chance to take effect.

What is the recommended way to configure auto scale-out/in for VMSS so that Application Gateway’s connection draining can be effectively utilized? Or is there a better approach to ensure graceful shutdown of instances during scale-in?

Azure Virtual Machine Scale Sets

1 answer

  1. Jilakara Hemalatha 5,970 Reputation points Microsoft External Staff Moderator
    2025-11-11T20:13:01.7366667+00:00

    Hi pipi,

    Thanks for reaching out to Microsoft Q&A. A Virtual Machine Scale Set (VMSS) does not wait for Application Gateway’s connection draining to complete during scale-in. Connection draining stops new requests from reaching the instance, but VMSS deletes the instance immediately, so requests that are still in flight are dropped.

    This happens because the two services act independently: Application Gateway marks the backend instance as draining and stops sending it new requests, while VMSS deletes the VM without waiting for existing sessions to complete.

    Below are the recommendations:

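    1. Enable VM Termination Notification:

    This delivers a Terminate event to the VM through the Azure Metadata Service Scheduled Events endpoint before the instance is deleted. Deletion is held back by a configurable delay (5 to 15 minutes), so the application can stop accepting new requests and complete in-flight transactions.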

    Reference: https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-terminate-notification
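
    As a rough illustration of how an instance might react to that notification, the sketch below polls the Scheduled Events endpoint, picks out Terminate events, runs a drain step, and then approves the event so the platform does not wait out the full delay. The `drain` hook and the `handle_once` helper are hypothetical names introduced here; how the application actually stops accepting requests is left to you.

```python
import json
import urllib.request

# Azure Instance Metadata Service endpoint for Scheduled Events.
# It is link-local and only reachable from inside the VM.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(timeout=5):
    """GET the current Scheduled Events document from the metadata service."""
    req = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def terminate_event_ids(document):
    """Return the EventId of every Terminate event listed in the document."""
    return [
        event["EventId"]
        for event in document.get("Events", [])
        if event.get("EventType") == "Terminate"
    ]


def build_approval(event_ids):
    """Build the POST body that approves the listed events."""
    return {"StartRequests": [{"EventId": event_id} for event_id in event_ids]}


def approve(event_ids, timeout=5):
    """POST the approval so deletion can proceed before the delay expires."""
    body = json.dumps(build_approval(event_ids)).encode("utf-8")
    req = urllib.request.Request(
        SCHEDULED_EVENTS_URL, data=body, headers={"Metadata": "true"}
    )
    with urllib.request.urlopen(req, timeout=timeout):
        pass


def handle_once(drain, fetch=fetch_scheduled_events, ack=approve):
    """Poll once; if a Terminate event is pending, drain and then approve it."""
    ids = terminate_event_ids(fetch())
    if ids:
        drain()  # hypothetical hook: stop new requests, wait for in-flight ones
        ack(ids)
    return ids
```

    In practice `handle_once` would be called in a loop every few seconds from a small agent or background thread on each instance. The `fetch` and `ack` parameters exist only so the logic can be exercised without the metadata service.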

    2. Use VMSS Instance Protection:

    Applying the "Protect from scale-in" setting to an instance prevents it from being deleted during scale-in. Protection is applied per instance, so remove it once the instance can be safely deleted.

    Reference: https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-instance-protection
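
    One way to use this is to hold protection only while an instance is doing work that must not be interrupted. The sketch below assumes a hypothetical `apply_policy` callable that sends the payload to Azure for the current instance (through the REST API, CLI, or an SDK); only the `protectionPolicy` property shape comes from the instance model.

```python
from contextlib import contextmanager


def protection_policy(protect_from_scale_in):
    """Instance-level property shape that toggles scale-in protection."""
    return {
        "properties": {
            "protectionPolicy": {"protectFromScaleIn": protect_from_scale_in}
        }
    }


@contextmanager
def scale_in_protected(apply_policy):
    """Hold scale-in protection for the duration of a block of critical work.

    apply_policy is a hypothetical callable that updates the current
    instance with the given payload.
    """
    apply_policy(protection_policy(True))
    try:
        yield
    finally:
        # Always release protection, even if the work raised, so the
        # instance does not stay pinned in the scale set.
        apply_policy(protection_policy(False))
```

    Usage would look like `with scale_in_protected(update_instance): process_long_request()`, where both names are placeholders for your own code.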

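    3. Verify Application Gateway Connection Draining Configuration:

    Confirm that connection draining is enabled in the backend settings and that the drain timeout (1 to 3600 seconds) is long enough for your longest requests. This stops new requests to a draining instance while allowing existing connections to finish.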

    Reference: https://learn.microsoft.com/en-us/azure/application-gateway/configuration-http-settings?tabs=backendhttpsettings
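
    As a small sanity check for this setting, the sketch below builds the `connectionDraining` fragment of a backend settings resource and compares the timeout against the longest request you expect. The helper names and the 45-second request figure are illustrative assumptions, not values from your environment.

```python
# Range accepted for the drain timeout, in seconds.
MIN_DRAIN_SEC, MAX_DRAIN_SEC = 1, 3600


def connection_draining(drain_timeout_sec):
    """Return the connectionDraining fragment with draining enabled."""
    if not MIN_DRAIN_SEC <= drain_timeout_sec <= MAX_DRAIN_SEC:
        raise ValueError(
            f"drain timeout must be between {MIN_DRAIN_SEC} and {MAX_DRAIN_SEC} seconds"
        )
    return {
        "connectionDraining": {
            "enabled": True,
            "drainTimeoutInSec": drain_timeout_sec,
        }
    }


def covers_longest_request(drain_timeout_sec, longest_request_sec):
    """True if a request of the given length can finish within the timeout."""
    return drain_timeout_sec >= longest_request_sec


if __name__ == "__main__":
    settings = connection_draining(60)
    print(settings, covers_longest_request(60, longest_request_sec=45))
```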

    Additional Best Practices:

    • Review the scale-in policy (Default, NewestVM, or OldestVM) and choose the one that matches workload behavior
    • Configure appropriate autoscale thresholds and cooldown periods to avoid aggressive scale-in
    • Monitor scale events using Azure Monitor to validate expected behavior
    • Consider leveraging Availability Zones to improve resiliency during scale operations

    Reference: https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-scale-in-policy
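
    To make the threshold, cooldown, and policy settings concrete, here is a hedged sketch that builds a scale-in rule in the shape used by Azure Monitor autoscale settings, plus the `scaleInPolicy` fragment of the scale set model. The 30% CPU threshold, the 10-minute window and cooldown, and the function names are illustrative assumptions to tune for your workload.

```python
SCALE_IN_POLICIES = ("Default", "NewestVM", "OldestVM")


def scale_in_rule(vmss_resource_id, cpu_below=30, window_min=10, cooldown_min=10):
    """Build an autoscale rule that removes one instance when CPU stays low."""
    return {
        "metricTrigger": {
            "metricName": "Percentage CPU",
            "metricResourceUri": vmss_resource_id,
            "timeGrain": "PT1M",
            "statistic": "Average",
            # Evaluate the average over a longer window to avoid reacting to dips.
            "timeWindow": f"PT{window_min}M",
            "timeAggregation": "Average",
            "operator": "LessThan",
            "threshold": cpu_below,
        },
        "scaleAction": {
            "direction": "Decrease",
            "type": "ChangeCount",
            "value": "1",
            # Wait this long after a scale action before scaling in again.
            "cooldown": f"PT{cooldown_min}M",
        },
    }


def scale_in_policy(rule="OldestVM"):
    """Build the scaleInPolicy fragment of the scale set model."""
    if rule not in SCALE_IN_POLICIES:
        raise ValueError(f"rule must be one of {SCALE_IN_POLICIES}")
    return {"scaleInPolicy": {"rules": [rule]}}
```

    Removing one instance at a time with a cooldown at least as long as your drain period is one conservative starting point; the right values depend on how long requests run and how quickly load changes.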

    Hope this helps! Please let me know if you have any queries.
