Fine-tune models using managed compute (preview)

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Learn how to fine-tune and deploy models using managed compute in Microsoft Foundry. Adjust training parameters (learning rate, batch size, epochs) to optimize performance.

Fine-tuning a pretrained model for a related task is more efficient than training a new model from scratch.

Use the fine-tune settings in the portal to configure data, compute, and hyperparameters. After training completes, you can evaluate and deploy the resulting model.

In this article, you learn how to:

  • Select a foundation model.
  • Configure compute and data splits.
  • Tune hyperparameters safely.
  • Submit and monitor a fine-tune job.
  • Evaluate and deploy the fine-tuned model.

Prerequisites

Note

This document refers to the Microsoft Foundry (classic) portal only.

You must use a hub-based project for this feature. A Foundry project isn't supported. See How do I know which type of project I have? and Create a hub-based project.

  • Azure role-based access control (Azure RBAC) is used to grant access to operations in the Foundry portal. To perform the steps in this article, your user account must be assigned the Owner or Contributor role for the Azure subscription. For more information on permissions, see Role-based access control in the Foundry portal.

Fine-tune a foundation model using managed compute

Tip

Because you can customize the left pane in the Microsoft Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.

  1. Sign in to Microsoft Foundry. Make sure the New Foundry toggle is off. These steps refer to Foundry (classic).

  2. If you're not already in your project, select it.

  3. Select Fine-tuning from the left pane.

    1. Select Fine-tune a model and add the model that you want to fine-tune. This article uses Phi-3-mini-4k-instruct for illustration.
    2. Select Next to see the available fine-tune options. Some foundation models support only the Managed compute option.
  4. Alternatively, you can select Model catalog from the left pane of your project and find the model card of the foundation model that you want to fine-tune.

    1. Select Fine-tune on the model card to see the available fine-tune options. Some foundation models support only the Managed compute option.

    Screenshot showing fine-tuning options for a foundation model in Foundry.

  5. Select Managed compute. This opens Basic settings.

Configure fine-tune settings

In this section, you go through the steps to configure fine-tuning for your model using managed compute.

  1. Provide a model name (for example, phi3mini-faq-v1). Select Next to go to Compute.

  2. Select a GPU VM size. Make sure you have sufficient quota for the chosen SKU.

    Screenshot showing settings for the compute to use for fine-tuning.

  3. Select Next to go to Training data. The task type might be preset (for example, Chat completion).

  4. Provide training data: upload a JSONL, CSV, or TSV file, or select a registered dataset. Balance the examples to reduce bias.

  5. Select Next to go to Validation data. Keep Automatic split or supply a separate dataset.

  6. Select Next to go to Task parameters. Adjust epochs, learning rate, and batch size. Start with conservative values and iterate based on validation metrics.

  7. Select Next to go to Review. Confirm the dataset counts and parameters.

  8. Select Submit to start the job.
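
As a reference for the training data step, the sketch below writes a tiny chat-style JSONL file and validates each record before submission. The messages/role/content shape shown here is an assumption for illustration; check the data requirements for your chosen model.

```python
import json

# One training example per line, in a common chat-completion JSONL shape
# (assumed here for illustration; confirm your model's expected schema).
examples = [
    {"messages": [
        {"role": "system", "content": "You answer product FAQs concisely."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings > Account > Reset password."},
    ]},
]

def validate_record(record: dict) -> bool:
    """Check that a record has a non-empty messages list with valid roles."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    return all(
        isinstance(m.get("content"), str)
        and m.get("role") in {"system", "user", "assistant"}
        for m in messages
    )

with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in examples:
        assert validate_record(record), f"bad record: {record}"
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Validating locally before upload catches malformed records early, instead of surfacing them as preprocessing errors in the job logs.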

Monitor and evaluate

  • Track job status in the fine-tuning jobs list.
  • Review logs for preprocessing or allocation issues.
  • After completion, view the generated evaluation metrics (if enabled), or run a separate evaluation comparing the base model with the fine-tuned model.
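
One lightweight way to run that comparison is a token-overlap score between each model's answers and reference answers on the same prompts. The sketch below is a minimal, model-agnostic example; the answer strings are hypothetical stand-ins for real model outputs.

```python
def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a model answer and a reference answer."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return 0.0
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def average_f1(answers: list[str], references: list[str]) -> float:
    """Mean token F1 across an evaluation set."""
    return sum(token_f1(a, r) for a, r in zip(answers, references)) / len(references)

# Hypothetical outputs collected from both models on the same eval prompts.
references = ["Open Settings > Account > Reset password."]
base_answers = ["Go to the website and click forgot password."]
tuned_answers = ["Open Settings > Account > Reset password."]

print(average_f1(base_answers, references))   # lower score
print(average_f1(tuned_answers, references))  # higher score for the tuned model
```

Token overlap is a crude proxy; for chat-style outputs, pair it with human review or an LLM-based grader before drawing conclusions.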

Deploy the fine-tuned model

Deploy from the job summary page. Use a descriptive deployment name, such as faq-v1. Record the model version and dataset hash for reproducibility, and add tracing to monitor real requests.
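
To smoke-test the deployment, you can send a chat-style request to the endpoint. The sketch below only builds the request; the endpoint URL, key, and body schema are placeholders. Copy the real scoring URI and key from your deployment details in the portal, and confirm the expected request format for your model.

```python
import json

# Hypothetical endpoint details; replace with the scoring URI and key
# shown for your deployment in the portal.
ENDPOINT_URL = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"

def build_request(prompt: str) -> tuple[dict, bytes]:
    """Build headers and a chat-style JSON body for the deployed model."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,  # low temperature suits factual FAQ answers
    }
    return headers, json.dumps(body).encode("utf-8")

headers, payload = build_request("How do I reset my password?")
# To send the request (requires network access and a live deployment):
# import urllib.request
# req = urllib.request.Request(ENDPOINT_URL, data=payload, headers=headers)
# print(urllib.request.urlopen(req).read().decode())
```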

Troubleshooting

| Issue | Cause | Action |
| --- | --- | --- |
| Stuck in Queued | Insufficient GPU capacity | Try an alternate SKU or region |
| Overfitting quickly | Too many epochs or a small dataset | Reduce epochs or expand the dataset |
| No metric improvement | Dataset noise or a misaligned objective | Refine labeling or metric selection |
| Higher latency after deployment | Larger base model or adapter overhead | Consider a smaller base model or tune batch size |
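
For the overfitting row, tracking per-epoch validation loss makes the problem concrete: when the loss stops improving for a few consecutive epochs, fewer epochs or more data is usually the fix. A minimal sketch, assuming you can read per-epoch validation losses from the job logs:

```python
def epochs_before_overfit(val_losses: list[float], patience: int = 2) -> int:
    """Return the 1-based epoch with the best validation loss, treating
    `patience` consecutive non-improving epochs as a sign of overfitting."""
    best_epoch, best_loss, bad_streak = 1, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, bad_streak = epoch, loss, 0
        else:
            bad_streak += 1
            if bad_streak >= patience:
                break  # loss has stalled; later epochs are likely overfitting
    return best_epoch

# Hypothetical per-epoch validation losses: improvement stalls after epoch 3.
print(epochs_before_overfit([1.9, 1.4, 1.1, 1.2, 1.3, 1.5]))  # 3
```

Rerunning the fine-tune job with the epoch count capped at the returned value is a simple, conservative response to early overfitting.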