Important
Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Learn how to fine-tune and deploy models using managed compute in Microsoft Foundry. Adjust training parameters (learning rate, batch size, epochs) to optimize performance.
Fine-tuning a pretrained model for a related task is more efficient than training a new model from scratch.
Use the fine-tune settings in the portal to configure data, compute, and hyperparameters. After training completes, you can evaluate and deploy the resulting model.
In this article, you learn how to:
- Select a foundation model.
- Configure compute and data splits.
- Tune hyperparameters safely.
- Submit and monitor a fine-tune job.
- Evaluate and deploy the fine-tuned model.
Prerequisites
Note
This document refers to the Microsoft Foundry (classic) portal only.
You must use a hub-based project for this feature. A Foundry project isn't supported. See How do I know which type of project I have? and Create a hub-based project.
- An Azure account with an active subscription. If you don't have one, create a free Azure account, which includes a free trial subscription.
- A hub-based project. If you don't have one, create a hub-based project.
- Azure role-based access control (Azure RBAC) is used to grant access to operations in the Foundry portal. To perform the steps in this article, your user account must be assigned the Owner or Contributor role for the Azure subscription. For more information on permissions, see Role-based access control in Foundry portal.
Fine-tune a foundation model using managed compute
Tip
Because you can customize the left pane in the Microsoft Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.
- Sign in to Microsoft Foundry. Make sure the New Foundry toggle is off. These steps refer to Foundry (classic).
- If you're not already in your project, select it.
- Select Fine-tuning from the left pane.
- Select Fine-tune a model and add the model that you want to fine-tune. This article uses Phi-3-mini-4k-instruct for illustration.
- Select Next to see the available fine-tune options. Some foundation models support only the Managed compute option.

  Alternatively, you can select Model catalog from the left pane of your project, find the model card of the foundation model that you want to fine-tune, and then select Fine-tune on the model card to see the available fine-tune options.
- Select Managed compute. This opens Basic settings.
Configure fine-tune settings
In this section, you go through the steps to configure fine-tuning for your model by using managed compute.

Provide a model name (for example, phi3mini-faq-v1). Select Next for Compute.

Select a GPU VM size. Ensure you have quota for the chosen SKU.
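If you want to check which GPU SKUs are visible to your project before selecting one, the Azure Machine Learning Python SDK (azure-ai-ml) can enumerate VM sizes. This is an optional sketch outside the portal flow: the subscription, resource group, and project names are placeholders for your own values, and the gpus attribute is an assumption about the VmSize entity in your SDK version.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder values -- replace with your own subscription, resource group,
# and project (workspace) name.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)

# List VM sizes visible to the workspace and keep only GPU SKUs.
# `gpus` is assumed here; check the VmSize entity in your SDK version
# if the field name differs.
for size in ml_client.compute.list_sizes():
    if getattr(size, "gpus", 0):
        print(size.name, getattr(size, "gpus"), "GPU(s)")
```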
Select Next for Training data. The task type might be preset (for example, Chat completion).
Provide training data by uploading a JSONL, CSV, or TSV file, or by selecting a registered dataset. Balance the examples to reduce bias.
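The exact file schema depends on the model and task type, so confirm it before uploading. As a minimal sketch, assuming a chat-completion task that accepts a messages-style JSONL format, the following standard-library script writes two hypothetical FAQ examples to train.jsonl:

```python
import json

# Hypothetical FAQ examples in a chat-completion "messages" format.
# Verify the exact schema expected by your chosen model and task type.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You answer product FAQ questions concisely."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Open Settings > Account > Reset password and follow the prompts."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You answer product FAQ questions concisely."},
            {"role": "user", "content": "Where can I download invoices?"},
            {"role": "assistant", "content": "Invoices are under Billing > Invoice history."},
        ]
    },
]

# Write one JSON object per line (JSONL).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```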
Select Next for Validation data. Keep the Automatic split option or supply a separate dataset.
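If you'd rather supply your own validation dataset than rely on the automatic split, a reproducible random split of the training file is enough. A minimal sketch, assuming the train.jsonl produced above and an illustrative 90/10 ratio:

```python
import random

random.seed(42)  # fixed seed so the split is reproducible

with open("train.jsonl", encoding="utf-8") as f:
    lines = f.readlines()

random.shuffle(lines)
split = int(len(lines) * 0.9)  # 90% train / 10% validation (illustrative)

with open("train_split.jsonl", "w", encoding="utf-8") as f:
    f.writelines(lines[:split])
with open("validation_split.jsonl", "w", encoding="utf-8") as f:
    f.writelines(lines[split:])
```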
Select Next for Task parameters. Adjust epochs, learning rate, and batch size. Start with conservative values and iterate based on validation metrics.
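As one illustration of what "conservative" can mean, the values below are common starting points for fine-tuning a small model. They're assumptions for this walkthrough, not the portal's defaults; the right values depend on your model and dataset size.

```python
# Illustrative starting values only -- not the portal's defaults.
task_parameters = {
    "epochs": 1,            # start low; add epochs only if validation loss keeps falling
    "learning_rate": 2e-5,  # a small learning rate reduces the risk of catastrophic forgetting
    "batch_size": 8,        # bounded by GPU memory on the chosen SKU
}

# Iterate: if validation loss plateaus early, try a slightly higher learning
# rate or another epoch; if it rises while training loss falls, you're overfitting.
print(task_parameters)
```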
Select Next for Review. Confirm counts and parameters.
Select Submit to start the job.
Monitor and evaluate
- Track job status in the fine-tuning jobs list.
- Review logs for preprocessing or allocation issues.
- After completion, view the generated evaluation metrics (if enabled), or run a separate evaluation that compares the base and fine-tuned models (see the sketch after this list).
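A lightweight way to run that comparison is to score both deployments on the same held-out examples. The sketch below assumes the validation JSONL layout from earlier and a hypothetical query_model callable (prompt in, response string out) for each deployment. Exact-match accuracy only suits short, deterministic answers, so substitute a metric that fits your task.

```python
import json

def exact_match_rate(query_model, path="validation_split.jsonl"):
    """Fraction of held-out examples where the model reproduces the reference answer.

    `query_model` is a hypothetical callable: prompt -> model response string.
    """
    hits = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            messages = json.loads(line)["messages"]
            prompt = messages[-2]["content"]     # the user turn, given [system, user, assistant]
            reference = messages[-1]["content"]  # the reference assistant answer
            if query_model(prompt).strip() == reference.strip():
                hits += 1
            total += 1
    return hits / total if total else 0.0

# Usage (query_base / query_finetuned are whatever clients you use to call
# each deployment):
# print("base:", exact_match_rate(query_base))
# print("fine-tuned:", exact_match_rate(query_finetuned))
```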
Deploy the fine-tuned model
Deploy from the job summary. Use a deployment name like faq-v1. Record the model version and the dataset hash for reproducibility, as in the sketch below. Add tracing to monitor real requests.
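For the dataset hash, a SHA-256 digest of the training file pins down exactly which data produced the model. A minimal standard-library sketch:

```python
import hashlib

def dataset_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Record this alongside the deployment name (for example, faq-v1)
# and the model version.
print(dataset_hash("train.jsonl"))
```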
Troubleshooting
| Issue | Cause | Action |
|---|---|---|
| Stuck in Queued | Insufficient GPU capacity | Try alternate SKU or region |
| Overfitting quickly | Too many epochs / small dataset | Reduce epochs or expand data |
| No metric improvement | Dataset noise / misaligned objective | Refine labeling or metric selection |
| Higher latency after deployment | Larger base model / adapter overhead | Consider a smaller base model or tune the batch size |
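For the overfitting row in particular, a quick offline check is to compare per-epoch training and validation losses exported from the job's logs. A small sketch, assuming you've collected the losses as two lists (the numbers below are made up):

```python
def first_overfit_epoch(train_losses, val_losses, tolerance=0.0):
    """Return the first epoch (1-based) where validation loss rises while
    training loss still falls, or None if that never happens."""
    for epoch in range(1, len(val_losses)):
        val_rising = val_losses[epoch] > val_losses[epoch - 1] + tolerance
        train_falling = train_losses[epoch] < train_losses[epoch - 1]
        if val_rising and train_falling:
            return epoch + 1
    return None

# Example with made-up losses: validation loss turns up after epoch 3.
train = [1.20, 0.85, 0.60, 0.45, 0.33]
val = [1.25, 0.95, 0.80, 0.86, 0.95]
print(first_overfit_epoch(train, val))  # -> 4
```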