Run tests and view results

[This article is prerelease documentation and is subject to change.]

By using the results from the test set, you can optimize your agent's behavior and validate that your agent meets your business and quality requirements. You can also run test sets multiple times to compare results as you improve your agent.

Test results are available in Copilot Studio for 90 days. To save your test results for a longer period, export the results to a CSV file.

Important

This article contains Microsoft Copilot Studio preview documentation and is subject to change.

Preview features aren't meant for production use and may have restricted functionality. These features are available before an official release so that you can get early access and provide feedback.

If you're building a production-ready agent, see Microsoft Copilot Studio Overview.

Run a test set

After you create a test set, you can run or rerun it to compare results over time and iterations.

Important

Agent evaluations that use user authentication require access through the Microsoft Copilot Studio connector. If your admin turns off this connection, you can't run tests by using the evaluation tool. For more information, see Copilot Studio connectors and data groups.

  1. Go to your agent's Evaluations page.

  2. Run a test by doing one of the following actions:

  • At the end of creating or editing a test set, select Evaluate.

  • Find the test set in the Test sets list, then select the More icon > Evaluate test set.

  • Hover over a test result that uses the set you want to rerun, then select the More icon > Evaluate test set again.

If the user profile for the test set has broken connections, or the test set doesn't have a user profile, the Manage connections dialog appears. You don't have to use a user profile for testing. However, if you do use a profile, all the connections must be working. For information on fixing connections, see Manage user profiles and connections.

Screenshot showing the more menu icons that appear when you hover over test sets or evaluation results.

An evaluation can take a few minutes to run. An alert appears in Copilot Studio when the test results are ready to view.

Dive into test results

Each time you run an evaluation with a test set, Copilot Studio:

  1. Uses the connected user account to simulate conversations with the agent, sending each test case's question to the agent.

  2. Collects the agent's responses.

  3. Measures and analyzes the success of each response. Each test case receives a Pass or Fail, based on its criteria.

  4. Assigns a Pass rate score based on the Pass/Fail rate of the test set.

You can see the Pass rate of each test set run on your agent's Evaluations page, under Recent results. To see more test set runs, select See all.

Screenshot showing a list of previous evaluations.
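To make the scoring concrete, the Pass rate is the share of test cases that pass in a given run. The following Python sketch illustrates the arithmetic only; the result records are a hypothetical shape, not a product schema, and Copilot Studio computes this score for you.

```python
# Illustrative only: Copilot Studio computes the Pass rate for you.
# The result records below are a hypothetical shape, not a product schema.
results = [
    {"question": "How do I reset my password?", "outcome": "Pass"},
    {"question": "What are your support hours?", "outcome": "Fail"},
    {"question": "Where do I check my order status?", "outcome": "Pass"},
    {"question": "Can I change my shipping address?", "outcome": "Pass"},
]

passed = sum(1 for r in results if r["outcome"] == "Pass")
pass_rate = 100 * passed / len(results)

print(f"Pass rate: {pass_rate:.0f}% ({passed} of {len(results)} passed)")
# Pass rate: 75% (3 of 4 passed)
```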

See a detailed analysis for a test case

When you open a test result, you can see the details of the test run, a list of the queries used in the test, how the agent responded, and the Pass or Fail score.

Select a test case in the list to see a detailed assessment of each response.

Screenshot showing a list of test cases within a completed evaluation.

The assessment includes the expected and actual responses, the reasoning behind the test result, and the knowledge, topics, and tools the agent used to respond.

Select a cited knowledge source or topic to open it.

Screenshot showing the detailed result and evaluation of a test case.
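If you record these assessment details outside Copilot Studio for your own tracking, a simple record shape can keep them consistent. The following Python dataclass is a hypothetical model of the fields described above, not an official Copilot Studio schema:

```python
from dataclasses import dataclass, field

@dataclass
class TestCaseAssessment:
    """Hypothetical record of one test case's detailed assessment."""
    question: str
    expected_response: str
    actual_response: str
    outcome: str                  # "Pass" or "Fail"
    reasoning: str                # the explanation behind the test result
    knowledge_cited: list[str] = field(default_factory=list)
    topics_used: list[str] = field(default_factory=list)
    tools_used: list[str] = field(default_factory=list)
```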

Compare test results

To see how your agent's performance changes before and after you modify it, you can compare two runs of the same test set by using the Compare with tool.

To see a comparison, you need to run the same test set at least twice.

  1. On your agent's Evaluations page, under Recent results, open the test run you want to use as the base for the comparison.

  2. Select the Compare with dropdown, then select the time and date of the test run you want to compare with the currently open test results.

Screenshot showing the Compare with dropdown.

In the Test cases list, arrows show which test case results improved by changing from failing to passing, or declined by changing from passing to failing.

Select a test case to see more details. In the Evaluation summary pane, you can see a direct comparison of test scores, with the current test run's result on top.

Screenshot showing the compared results of two test sets.
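You can also make this comparison outside the product by diffing two exported result files. The sketch below assumes each export includes a question column and a Pass/Fail result column; the column names and file names here are assumptions, so match them to the headers in your own export.

```python
import csv

def load_outcomes(path: str) -> dict[str, str]:
    """Map each question to its Pass/Fail result.

    The 'Question' and 'Test result' column names are assumptions;
    match them to the headers in your exported CSV.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return {row["Question"]: row["Test result"] for row in csv.DictReader(f)}

base = load_outcomes("run_before_changes.csv")    # hypothetical file names
latest = load_outcomes("run_after_changes.csv")

for question in sorted(base.keys() & latest.keys()):
    before, after = base[question], latest[question]
    if (before, after) == ("Fail", "Pass"):
        print(f"Improved: {question}")
    elif (before, after) == ("Pass", "Fail"):
        print(f"Declined: {question}")
```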

Export test results

You can export test results to a CSV file. The file lists the question, expected response (if applicable), test method, passing score (if applicable), the agent's response, the test result, and analysis for each test case.

  1. Go to your agent's Evaluations page.

  2. Select the results you want to export.

  3. In the Evaluation summary pane, select the More icon > Export test results.

The test results download as a CSV file named after your test set, as in <your test set name>.csv.
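Because the export is a plain CSV file, you can open it in a spreadsheet or script against it for longer-term tracking. For example, this short Python sketch lists the failing cases and their analysis; the file and column names are placeholders, so check your actual export for the exact headers:

```python
import csv

# Placeholder file name: use the name of your exported test set.
with open("your test set name.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # "Test result", "Question", and "Analysis" are assumed headers;
        # check your export for the exact column names.
        if row.get("Test result") == "Fail":
            print(f"FAIL: {row.get('Question')}")
            print(f"  Analysis: {row.get('Analysis')}")
```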