How to run an evaluation in Azure DevOps (preview)

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Similar to the Azure AI evaluation GitHub Action, an Azure DevOps extension is available in the Azure DevOps Marketplace. The extension enables offline evaluation of AI agents within your CI/CD pipelines.


Prerequisites

  • A Foundry project or hub-based project. To learn more, see Create a project.
  • Install the Azure AI evaluation extension:
    • Go to the Azure DevOps Marketplace.
    • Search for Azure AI evaluation and install the extension into your Azure DevOps organization.
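
If you prefer to script the setup, the Azure DevOps CLI can also install Marketplace extensions into an organization. The following is a minimal sketch; the publisher and extension IDs are placeholders, so copy the real values from the extension's Marketplace listing.

# One-time setup: add the Azure DevOps extension to the Azure CLI.
az extension add --name azure-devops

# Install the Marketplace extension into your organization.
# <publisher-id> and <extension-id> are placeholders; take them from the extension's Marketplace page.
az devops extension install \
  --publisher-id <publisher-id> \
  --extension-id <extension-id> \
  --organization https://dev.azure.com/<your-organization>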

Set up YAML configuration file

  1. Create a new YAML file in your repository. You can use the sample YAML provided in the README or copy it from the GitHub repo.
  2. Configure the following inputs:
    • Set up the Azure CLI with a service connection and Azure login.
    • Foundry project connection string.
    • Dataset and evaluators:
      • Specify the evaluator names you want to use for this evaluation run.
      • Queries (required).
    • Agent IDs: retrieve the agent identifiers from Foundry.

See the following sample dataset:

{
  "name": "MyTestData",
  "evaluators": [
    "FluencyEvaluator",
    "ViolenceEvaluator"
  ],
  "data": [
    {
      "query": "Tell me about Tokyo?"
    },
    {
      "query": "Where is Italy?"
    }
  ]
}
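
Because the task reads this file at run time, it's worth confirming that the JSON parses before you commit it. A quick local check (the path matches the sample pipeline below; adjust it for your repository):

# Validate the dataset's JSON syntax locally; prints the parsed document or an error pointing at the offending line.
python -m json.tool tests/data/golden-dataset-medium.json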

A sample YAML file:


trigger:
- main

pool:
  vmImage: 'windows-latest'

steps:
# Read the service connection's identity values and expose them as pipeline variables.
- task: AzureCLI@2
  inputs:
    addSpnToEnvironment: true
    azureSubscription: ${{ variables.Service_Connection_Name }}
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      echo "##vso[task.setvariable variable=ARM_CLIENT_ID]$servicePrincipalId"
      echo "##vso[task.setvariable variable=ARM_ID_TOKEN]$idToken"
      echo "##vso[task.setvariable variable=ARM_TENANT_ID]$tenantId"

# Sign in with the federated token so later steps can call Azure services.
- bash: |
    az login --service-principal -u $(ARM_CLIENT_ID) --tenant $(ARM_TENANT_ID) --allow-no-subscriptions --federated-token $(ARM_ID_TOKEN)
  displayName: 'Login Azure'

- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.11'

# Run the evaluation against the dataset and the specified agents.
- task: AIAgentEvaluation@0
  inputs:
    azure-ai-project-endpoint: "<your-ai-project-endpoint>"
    deployment-name: "gpt-4o-mini"
    data-path: $(Build.SourcesDirectory)\tests\data\golden-dataset-medium.json
    agent-ids: "<your-ai-agent-ids>"

Set up a new pipeline and trigger an evaluation run

Commit and run the pipeline in Azure DevOps.
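
You can create and queue the pipeline in the Azure DevOps portal, or script it with the Azure DevOps CLI. The following is a minimal sketch that assumes the YAML file is saved as azure-pipelines.yml at the repository root and the repository is hosted in Azure Repos; the pipeline, organization, and project names are placeholders.

# Create the pipeline from the YAML definition (names and paths are examples; adjust for your setup).
az pipelines create \
  --name ai-agent-evaluation \
  --repository <your-repository-name> \
  --repository-type tfsgit \
  --branch main \
  --yml-path azure-pipelines.yml \
  --organization https://dev.azure.com/<your-organization> \
  --project <your-project>

# Queue a run manually; pushes to main also trigger it because of the trigger block in the YAML.
az pipelines run \
  --name ai-agent-evaluation \
  --organization https://dev.azure.com/<your-organization> \
  --project <your-project>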

View results

  • Select the run and go to the Azure AI Evaluation tab.
  • The results are shown in the following format:
    • The top section summarizes the two AI agent variants. Select the agent ID link to open the agent settings page in the Microsoft Foundry portal, or select the Evaluation Results link to view individual results in detail in the Foundry portal.
    • The second section includes evaluation scores and comparisons between variants, with statistical significance (for multiple agents) and confidence intervals (for a single agent).

Evaluation results and comparisons from multiple AI agents: Screenshot of multi-agent evaluation results in Azure DevOps.

Single agent evaluation result: Screenshot of single-agent evaluation results in Azure DevOps.