Model training on serverless compute

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

You don't need to create and manage compute to train your model in a scalable way. Instead, you can submit your job to a compute target type called serverless compute, the easiest way to run training jobs on Azure Machine Learning. Serverless compute is fully managed and on-demand: Azure Machine Learning creates, scales, and manages the compute for you. When you train models on serverless compute, you can focus on building machine learning models instead of learning about compute infrastructure and setting it up.

You can specify the resources the job needs. Azure Machine Learning manages the compute infrastructure and provides managed network isolation, reducing the burden on you.

Enterprises can also reduce costs by specifying optimal resources for each job. IT administrators can still apply control by specifying core quota at subscription and workspace levels and applying Azure policies.

You can use serverless compute to fine-tune models in the model catalog. You can use it to run all types of jobs through Azure Machine Learning studio, the Python SDK, and the Azure CLI. You can also use serverless compute to build environment images and for responsible AI dashboard scenarios. Serverless jobs consume the same quota as Azure Machine Learning compute. You can choose the standard (dedicated) tier or spot (low-priority) VMs. Managed identity and user identity are supported for serverless jobs. The billing model is the same as for Azure Machine Learning compute.

Advantages of serverless compute

  • Azure Machine Learning manages creating, setting up, scaling, deleting, and patching compute infrastructure to reduce management overhead.
  • You don't need to learn about compute, various compute types, or related properties.
  • You don't need to repeatedly create clusters with the same settings for each VM size you need, or replicate them for each workspace.
  • You can optimize costs by specifying the exact resources each job needs at runtime for instance type (VM size) and instance count. You can also monitor the utilization metrics of the job to optimize the resources a job needs.
  • Fewer steps are required to run a job.
  • To further simplify job submission, you can skip specifying resources altogether. Azure Machine Learning chooses a default instance count and an instance type based on factors like quota, cost, performance, and disk size.
  • In some scenarios, wait times before jobs start running are reduced.
  • User identity and workspace user-assigned managed identity are supported for job submission.
  • With managed network isolation, you can streamline and automate your network isolation configuration. Customer virtual networks are also supported.
  • Administrative control is available via quota and Azure policies.

How to use serverless compute

  • When you create your own compute cluster, you use its name in the command job, for example compute="cpu-cluster". With serverless compute, you can skip creating a compute cluster and omit the compute parameter; when compute isn't specified for a job, the job runs on serverless compute. Omit the compute name in your Azure CLI or Python SDK jobs to use serverless compute in the following job types, and optionally provide the resources the job needs for instance count and instance type (a sweep job sketch follows this list):

    • Command jobs, including interactive jobs and distributed training
    • AutoML jobs
    • Sweep jobs
    • Parallel jobs
  • For pipeline jobs via the Azure CLI, use default_compute: azureml:serverless for pipeline-level default compute. For pipeline jobs via the Python SDK, use default_compute="serverless". See Pipeline job for an example.

  • When you submit a training job in studio, select Serverless as the compute type.

  • When using Azure Machine Learning designer, select Serverless as the default compute.
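
Sweep jobs run on serverless compute in the same way. Here's a minimal sketch, assuming a train.py script in a ./src folder that accepts a --learning_rate argument and logs an accuracy metric:

from azure.ai.ml import command, MLClient
from azure.ai.ml.sweep import Uniform
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<Azure-subscription-ID>",
    resource_group_name="<Azure-resource-group>",
    workspace_name="<Azure-Machine-Learning-workspace>",
)
# A command job with a tunable input. No compute is specified, so the
# sweep trials run on serverless compute.
job = command(
    code="./src",  # hypothetical folder that contains train.py
    command="python train.py --learning_rate ${{inputs.learning_rate}}",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    inputs={"learning_rate": 0.01},
)
# Rebind the input to a search space, then create the sweep job.
job_for_sweep = job(learning_rate=Uniform(min_value=0.001, max_value=0.1))
sweep_job = job_for_sweep.sweep(
    sampling_algorithm="random",
    primary_metric="accuracy",
    goal="Maximize",
)
sweep_job.set_limits(max_total_trials=8, max_concurrent_trials=4)
# Submit the sweep job.
ml_client.create_or_update(sweep_job)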

Performance considerations

Serverless compute can increase the speed of your training in the following ways:

Avoid insufficient quota failures. When you create your own compute cluster, you're responsible for determining the VM size and node count. When your job runs, if you don't have sufficient quota for the cluster, the job fails. Serverless compute uses information about your quota to select an appropriate VM size by default.

Scale-down optimization. When a compute cluster is scaling down, a new job has to wait for the cluster to scale down and then scale up before the job can run. With serverless compute, you don't have to wait for scale down. Your job can start running on another cluster/node (assuming you have quota).

Cluster-busy optimization. When a job is running on a compute cluster and another job is submitted, your job is queued behind the currently running job. With serverless compute, your job can start running on another node/cluster (assuming you have quota).

Quota

When you submit a job, you still need sufficient Azure Machine Learning compute quota to proceed (both workspace-level and subscription-level quota). The default VM size for serverless jobs is selected based on this quota. If you specify your own VM size/family:

  • If you have some quota for your VM size/family but not sufficient quota for the number of instances, you see an error. The error recommends that you decrease the number of instances to a valid number based on your quota limit, request a quota increase for the VM family, or change the VM size.
  • If you don't have quota for your specified VM size, you see an error. The error recommends that you select a different VM size for which you do have quota or request quota for the VM family.
  • If you do have sufficient quota for a VM family to run the serverless job but other jobs are using the quota, you get a message stating that your job must wait in a queue until quota is available.

When you view your usage and quotas in the Azure portal, you see the name Serverless for all quota consumed by serverless jobs.

Identity support and credential passthrough

  • User credential passthrough: Serverless compute fully supports user credential passthrough. The token of the user who submits the job is used for storage access. These credentials come from Microsoft Entra ID.

    Serverless compute doesn't support system-assigned identity.

    from azure.ai.ml import command
    from azure.ai.ml import MLClient     # Handle to the workspace.
    from azure.identity import DefaultAzureCredential     # Authentication package.
    from azure.ai.ml.entities import ResourceConfiguration
    from azure.ai.ml.entities import UserIdentityConfiguration 
    
    credential = DefaultAzureCredential()
    # Get a handle to the workspace. You can find the info on the workspace tab on ml.azure.com.
    ml_client = MLClient(
        credential=credential,
        subscription_id="<Azure subscription ID>", 
        resource_group_name="<Azure resource group>",
        workspace_name="<Azure Machine Learning workspace>",
    )
    job = command(
        command="echo 'hello world'",
        environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
        identity=UserIdentityConfiguration(),
    )
    # Submit the command job.
    ml_client.create_or_update(job)
    
  • User-assigned managed identity: When you have a workspace configured with user-assigned managed identity, you can use that identity with the serverless job for storage access. For information about accessing secrets, see Use authentication credential secrets in Azure Machine Learning jobs.

  1. Verify your workspace identity configuration.

    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential
    
    subscription_id = "<your-subscription-id>"
    resource_group = "<your-resource-group>"
    workspace = "<your-workspace-name>"
    
    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id,
        resource_group,
        workspace
    )
    
    # Get workspace details.
    ws = ml_client.workspaces.get(name=workspace)
    print(ws)
    
    

    Look for the user-assigned identity in the output. If it's missing, create a new workspace with a user-assigned managed identity by following the instructions in Set up authentication between Azure Machine Learning and other services.

  2. Use your user-assigned managed identity in your job.

    from azure.ai.ml import command
    from azure.ai.ml import MLClient     # Handle to the workspace.
    from azure.identity import DefaultAzureCredential    # Authentication package.
    from azure.ai.ml.entities import ResourceConfiguration
    from azure.ai.ml.entities import ManagedIdentityConfiguration
    
    credential = DefaultAzureCredential()
    # Get a handle to the workspace. You can find the info on the workspace tab on ml.azure.com.
    ml_client = MLClient(
        credential=credential,
        subscription_id="<Azure-subscription-ID>", 
        resource_group_name="<Azure-resource-group>",
        workspace_name="<Azure-Machine-Learning-workspace>",
    )
    job = command(
        command="echo 'hello world'",
        environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
        identity=ManagedIdentityConfiguration(client_id="<workspace-UAMI-client-ID>"),
    )
    # Submit the command job.
    ml_client.create_or_update(job)
    

Configure properties for command jobs

If no compute target is specified for command, sweep, and AutoML jobs, the compute defaults to serverless compute. Here's an example:

from azure.ai.ml import command 
from azure.ai.ml import MLClient # Handle to the workspace.
from azure.identity import DefaultAzureCredential # Authentication package.

credential = DefaultAzureCredential()
# Get a handle to the workspace. You can find the info on the workspace tab on ml.azure.com.
ml_client = MLClient(
    credential=credential,
    subscription_id="<Azure-subscription-ID>", 
    resource_group_name="<Azure-resource-group>",
    workspace_name="<Azure-Machine-Learning-workspace>",
)
job = command(
    command="echo 'hello world'",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
)
# Submit the command job.
ml_client.create_or_update(job)

The compute defaults to serverless compute with:

  • A single node for this job type. The default number of nodes is based on the type of job; see the following sections for other job types.
  • A CPU virtual machine. The VM is determined based on quota, performance, cost, and disk size.
  • Dedicated virtual machines.
  • The workspace location (the compute is provisioned in the same region as the workspace).

You can override these defaults. If you want to specify the VM type or number of nodes for serverless compute, add resources to your job:

  • Use instance_type to choose a specific VM. Use this parameter if you want a specific CPU or GPU VM size.

  • Use instance_count to specify the number of nodes.

    from azure.ai.ml import command 
    from azure.ai.ml import MLClient # Handle to the workspace.
    from azure.identity import DefaultAzureCredential # Authentication package.
    from azure.ai.ml.entities import JobResourceConfiguration 
    
    credential = DefaultAzureCredential()
    # Get a handle to the workspace. You can find the info on the workspace tab on ml.azure.com.
    ml_client = MLClient(
        credential=credential,
        subscription_id="<Azure-subscription-ID>", 
        resource_group_name="<Azure-resource-group>",
        workspace_name="<Azure-Machine-Learning-workspace>",
    )
    job = command(
        command="echo 'hello world'",
        environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
        resources=JobResourceConfiguration(instance_type="Standard_NC24", instance_count=4),
    )
    # Submit the command job.
    ml_client.create_or_update(job)
    
  • To change the job tier, use queue_settings to choose between dedicated VMs (job_tier: Standard) and low priority VMs (job_tier: Spot).

    from azure.ai.ml import command
    from azure.ai.ml import MLClient    # Handle to the workspace.
    from azure.identity import DefaultAzureCredential    # Authentication package.
    credential = DefaultAzureCredential()
    # Get a handle to the workspace. You can find the info on the workspace tab on ml.azure.com.
    ml_client = MLClient(
        credential=credential,
        subscription_id="<Azure-subscription-ID>", 
        resource_group_name="<Azure-resource-group>",
        workspace_name="<Azure-Machine-Learning-workspace>",
    )
    job = command(
        command="echo 'hello world'",
        environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
        queue_settings={
          "job_tier": "Spot"  
        }
    )
    # Submit the command job.
    ml_client.create_or_update(job)
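
You can also run distributed training on serverless compute. Specify a distribution configuration on the command job and request multiple nodes through resources. Here's a minimal sketch, assuming a PyTorch script at ./src/train.py; the environment name is a placeholder for an environment of your own that includes PyTorch:

from azure.ai.ml import command, MLClient, PyTorchDistribution
from azure.ai.ml.entities import JobResourceConfiguration
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<Azure-subscription-ID>",
    resource_group_name="<Azure-resource-group>",
    workspace_name="<Azure-Machine-Learning-workspace>",
)
job = command(
    code="./src",  # hypothetical folder that contains train.py
    command="python train.py",
    environment="<environment-that-includes-PyTorch>",  # placeholder
    distribution=PyTorchDistribution(process_count_per_instance=1),
    # Two GPU nodes. No compute name is given, so the job runs on serverless compute.
    resources=JobResourceConfiguration(instance_type="Standard_NC24", instance_count=2),
)
# Submit the distributed command job.
ml_client.create_or_update(job)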
    

Example for all fields with command jobs

Here's an example that shows all fields specified, including the identity the job should use. You don't need to specify virtual network settings because workspace-level managed network isolation is automatically used.

from azure.ai.ml import command
from azure.ai.ml import MLClient      # Handle to the workspace.
from azure.identity import DefaultAzureCredential     # Authentication package.
from azure.ai.ml.entities import ResourceConfiguration
from azure.ai.ml.entities import UserIdentityConfiguration 

credential = DefaultAzureCredential()
# Get a handle to the workspace. You can find the info on the workspace tab on ml.azure.com.
ml_client = MLClient(
    credential=credential,
    subscription_id="<Azure-subscription-ID>", 
    resource_group_name="<Azure-resource-group>",
    workspace_name="<Azure-Machine-Learning-workspace>",
)
job = command(
    command="echo 'hello world'",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    identity=UserIdentityConfiguration(),
    queue_settings={
      "job_tier": "Standard"  
    }
)
job.resources = ResourceConfiguration(instance_type="Standard_E4s_v3", instance_count=1)
# Submit the command job.
ml_client.create_or_update(job)

Here are two more examples of using serverless compute for training:

AutoML job

You don't need to specify compute for AutoML jobs. You can optionally specify resources. If an instance count isn't specified, the default is based on the max_concurrent_trials and max_nodes parameters. If you submit an AutoML image classification or NLP task without specifying an instance type, a GPU VM size is automatically selected. You can submit AutoML jobs by using the CLI, the Python SDK, or studio.

If you want to specify the instance type or instance count, use the ResourceConfiguration class.

# Create the AutoML classification job with the related factory function.
from azure.ai.ml import automl
from azure.ai.ml.automl import ClassificationModels
from azure.ai.ml.entities import ResourceConfiguration

# exp_name, my_training_data_input, and max_trials are assumed to be defined
# earlier in your script.

classification_job = automl.classification(
    experiment_name=exp_name,
    training_data=my_training_data_input,
    target_column_name="y",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True,
    tags={"my_custom_tag": "My custom value"},
)

# Limits are all optional
classification_job.set_limits(
    timeout_minutes=600,
    trial_timeout_minutes=20,
    max_trials=max_trials,
    # max_concurrent_trials = 4,
    # max_cores_per_trial=-1,
    enable_early_termination=True,
)

# Training properties are optional
classification_job.set_training(
    blocked_training_algorithms=[ClassificationModels.LOGISTIC_REGRESSION],
    enable_onnx_compatible_models=True,
)

# Serverless compute resources used to run the job
classification_job.resources = ResourceConfiguration(instance_type="Standard_E4s_v3", instance_count=6)
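
The snippet configures the job but doesn't submit it. To submit, use the same MLClient handle as in the earlier examples; a minimal sketch:

# Submit the AutoML job and print the studio URL for monitoring.
returned_job = ml_client.jobs.create_or_update(classification_job)
print(returned_job.studio_url)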

Pipeline job

For a pipeline job, specify "serverless" as your default compute type to use serverless compute.

# Construct the pipeline.
from azure.ai.ml import Input
from azure.ai.ml.dsl import pipeline

# train_model, score_data, and eval_model are pipeline components loaded from
# YAML definitions, and parent_dir is the directory that contains the data.

@pipeline()
def pipeline_with_components_from_yaml(
    training_input,
    test_input,
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
):
    """E2E dummy train-score-eval pipeline with components defined via yaml."""
    # Call component obj as function: apply given inputs & parameters to create a node in pipeline
    train_with_sample_data = train_model(
        training_data=training_input,
        max_epochs=training_max_epochs,
        learning_rate=training_learning_rate,
        learning_rate_schedule=learning_rate_schedule,
    )

    score_with_sample_data = score_data(
        model_input=train_with_sample_data.outputs.model_output, test_data=test_input
    )
    score_with_sample_data.outputs.score_output.mode = "upload"

    eval_with_sample_data = eval_model(
        scoring_result=score_with_sample_data.outputs.score_output
    )

    # Return: pipeline outputs
    return {
        "trained_model": train_with_sample_data.outputs.model_output,
        "scored_data": score_with_sample_data.outputs.score_output,
        "evaluation_report": eval_with_sample_data.outputs.eval_output,
    }


pipeline_job = pipeline_with_components_from_yaml(
    training_input=Input(type="uri_folder", path=parent_dir + "/data/"),
    test_input=Input(type="uri_folder", path=parent_dir + "/data/"),
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
)

# set pipeline to use serverless compute
pipeline_job.settings.default_compute = "serverless"
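
To run the pipeline on serverless compute, submit it with the same MLClient handle as in the earlier examples. A minimal sketch, with a hypothetical experiment name:

# Submit the pipeline job. All steps use the serverless default compute.
returned_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="serverless-pipeline-example"
)
print(returned_job.studio_url)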

You can also set serverless compute as the default compute in Designer.

Configure serverless pipeline jobs with user-assigned managed identity

When you use serverless compute in pipeline jobs, we recommend that you set the identity on the individual steps that run on compute rather than at the root pipeline level. Identity is supported at both the root pipeline and step levels, and the step-level setting takes precedence if both are set. However, for pipelines that contain pipeline components, identity set at the root pipeline or pipeline component level doesn't work; you must set it on the individual steps that run. For simplicity, set identity at the individual step level.

# train_component is a component loaded earlier (for example, from a YAML
# definition). The pipeline decorator and Input are imported as in the
# previous example.
@pipeline()
def my_pipeline():
    train_job = train_component(
        training_data=Input(type="uri_folder", path="./data")
    )
    # Set managed identity for the job
    train_job.identity = {"type": "managed"}
    return {"train_output": train_job.outputs}

pipeline_job = my_pipeline()
# Configure the pipeline to use serverless compute.
pipeline_job.settings.default_compute = "serverless"

View more examples of training with serverless compute: