Run tests and view results

[This article is prerelease documentation and is subject to change.]

By using the results from the test set, you can optimize your agent's behavior and validate that your agent meets your business and quality requirements. You can also run test sets multiple times to compare results as you improve your agent.

Test results are available in Copilot Studio for 90 days. To save your test results for a longer period, export the results to a CSV file.

Important

This article contains Microsoft Copilot Studio preview documentation and is subject to change.

Preview features aren't meant for production use and may have restricted functionality. These features are available before an official release so that you can get early access and provide feedback.

If you're building a production-ready agent, see Microsoft Copilot Studio Overview.

Run a test set

After you create a test set, you can run or rerun it to compare results over time and iterations.

Important

Agent evaluations that use user authentication require access through the Microsoft Copilot Studio connector. If your admin turns off this connection, you can't run tests by using the evaluation tool. For more information, see Copilot Studio connectors and data groups.

  1. Go to your agent's Evaluations page.

  2. Run a test by doing one of the following actions:

  • At the end of creating or editing a test set, select Evaluate.

  • Find the test set in the Test sets list, then select the More icon > Evaluate test set.

  • Hover over a test result that uses the set you want to rerun, then select the More icon > Evaluate test set again.

If the user profile for the test set has broken connections, or the test set doesn't have a user profile, the Manage connections dialog appears. You don't have to use a user profile for testing. However, if you do use a profile, all the connections must be working. For information on fixing connections, see Manage user profiles and connections.

Screenshot showing the more menu icons that appear when you hover over test sets or evaluation results.

An evaluation can take a few minutes to run. An alert appears in Copilot Studio when the test results are ready to view.

Dive into test results

Each time you run an evaluation with a test set, Copilot Studio:

  1. Uses the connected user account to simulate conversations with the agent, sending each test case's question to the agent.

  2. Collects the agent's responses.

  3. Measures and analyzes the success of each response. Each test case receives a Pass or Fail, based on its criteria.

  4. Assigns a Pass rate score based on the Pass/Fail rate of the test set.

You can see the Pass rate of each test set run on your agent's Evaluations page, under Recent results. To see more test set runs, select See all.

Screenshot showing a list of previous evaluations.
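To make the scoring concrete, the Pass rate is the share of test cases that pass in a given run. The following Python sketch illustrates the arithmetic only; the result records are a hypothetical shape, not a product schema, and Copilot Studio computes this score for you.

```python
# Illustrative only: Copilot Studio computes the Pass rate for you.
# The result records below are a hypothetical shape, not a product schema.
results = [
    {"question": "How do I reset my password?", "outcome": "Pass"},
    {"question": "What are your support hours?", "outcome": "Fail"},
    {"question": "Where do I check my order status?", "outcome": "Pass"},
    {"question": "Can I change my shipping address?", "outcome": "Pass"},
]

passed = sum(1 for r in results if r["outcome"] == "Pass")
pass_rate = 100 * passed / len(results)

print(f"Pass rate: {pass_rate:.0f}% ({passed} of {len(results)} passed)")
# Pass rate: 75% (3 of 4 passed)
```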

See a detailed analysis for a test case

When you open a test result, you can see the details of the test run, a list of the queries used in the test, how the agent responded, and the Pass or Fail score.

Select a test case in the list to see a detailed assessment of each response.

Screenshot showing a list of test cases within a completed evaluation.

The assessment includes the expected and actual responses, the reasoning behind the test result, and the knowledge, topics, and tools the agent used to respond.

Select a cited knowledge source or topic to open it.

Screenshot showing the detailed result and evaluation of a test case.
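If you record these assessment details outside Copilot Studio for your own tracking, a simple record shape can keep them consistent. The following Python dataclass is a hypothetical model of the fields described above, not an official Copilot Studio schema:

```python
from dataclasses import dataclass, field

@dataclass
class TestCaseAssessment:
    """Hypothetical record of one test case's detailed assessment."""
    question: str
    expected_response: str
    actual_response: str
    outcome: str                  # "Pass" or "Fail"
    reasoning: str                # the explanation behind the test result
    knowledge_cited: list[str] = field(default_factory=list)
    topics_used: list[str] = field(default_factory=list)
    tools_used: list[str] = field(default_factory=list)
```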

Compare test results

To see how your agent's performance changes before and after you modify it, you can compare two runs of the same test set by using the Compare with tool.

To see a comparison, you need to run the same test set at least twice.

  1. On your agent's Evaluations page, under Recent results, open the test run you want to use as the base for the comparison.

  2. Select the Compare with dropdown, then select the time and date of the test run you want to compare with the currently open test results.

Screenshot showing the Compare with dropdown.

In the Test cases list, arrows show which test case results improved by changing from failing to passing, or declined by changing from passing to failing.

Select a test case to see more details. In the Evaluation summary pane, you can see a direct comparison of test scores, with the current test run's result on top.

Screenshot showing the compared results of two test sets.
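You can also make this comparison outside the product by diffing two exported result files. The sketch below assumes each export includes a question column and a Pass/Fail result column; the column names and file names here are assumptions, so match them to the headers in your own export.

```python
import csv

def load_outcomes(path: str) -> dict[str, str]:
    """Map each question to its Pass/Fail result.

    The 'Question' and 'Test result' column names are assumptions;
    match them to the headers in your exported CSV.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return {row["Question"]: row["Test result"] for row in csv.DictReader(f)}

base = load_outcomes("run_before_changes.csv")    # hypothetical file names
latest = load_outcomes("run_after_changes.csv")

for question in sorted(base.keys() & latest.keys()):
    before, after = base[question], latest[question]
    if (before, after) == ("Fail", "Pass"):
        print(f"Improved: {question}")
    elif (before, after) == ("Pass", "Fail"):
        print(f"Declined: {question}")
```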

Export test results

You can export test results to a CSV file. The file lists the question, expected response (if applicable), test method, passing score (if applicable), the agent's response, the test result, and analysis for each test case.

  1. Go to your agent's Evaluations page.

  2. Select the results you want to export.

  3. In the Evaluation summary pane, select the More icon > Export test results.

The test results download as a CSV file named after your test set, as in <your test set name>.csv.
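Because the export is a plain CSV file, you can open it in a spreadsheet or script against it for longer-term tracking. For example, this short Python sketch lists the failing cases and their analysis; the file and column names are placeholders, so check your actual export for the exact headers:

```python
import csv

# Placeholder file name: use the name of your exported test set.
with open("your test set name.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # "Test result", "Question", and "Analysis" are assumed headers;
        # check your export for the exact column names.
        if row.get("Test result") == "Fail":
            print(f"FAIL: {row.get('Question')}")
            print(f"  Analysis: {row.get('Analysis')}")
```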