The Copilot Studio Kit provides a comprehensive interface for analyzing test results.
Test run details
The Agent Test Run interface shows the status of test runs.
| Status | Description |
|---|---|
| Run Status | Main process that runs each individual agent test against the agent configuration by using the Direct Line API, and creates a corresponding Agent Test Result record. |
| App Insights Enrichment Status | Runs only if Enrich With Azure Application Insights is enabled on the related Agent Configuration record. |
| Generated Answers Analysis | Runs only if Analyze Generated Answers is enabled on the related Agent Configuration record. |
| Dataverse Enrichment Status | Runs only if Enrich With Conversation Transcripts is enabled on the related Agent Configuration record. |
Learn more about Agent Configuration settings in Configure agents in Copilot Studio Kit.
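The conditions in the table above can be read as a simple mapping from configuration settings to the status steps that are expected to run. The following is a minimal sketch of that mapping, not the Kit's implementation. The `AgentConfiguration` class and its field names are hypothetical stand-ins for the settings named in the table, and the returned order follows the table rather than any confirmed execution order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentConfiguration:
    # Hypothetical field names that mirror the settings named in the table.
    enrich_with_app_insights: bool = False
    analyze_generated_answers: bool = False
    enrich_with_conversation_transcripts: bool = False


def expected_steps(config: AgentConfiguration) -> List[str]:
    """Return the status steps expected to run for a configuration."""
    # Assumption: the main run process is part of every test run.
    steps = ["Run Status"]
    if config.enrich_with_app_insights:
        steps.append("App Insights Enrichment Status")
    if config.analyze_generated_answers:
        steps.append("Generated Answers Analysis")
    if config.enrich_with_conversation_transcripts:
        steps.append("Dataverse Enrichment Status")
    return steps


if __name__ == "__main__":
    config = AgentConfiguration(enrich_with_conversation_transcripts=True)
    print(expected_steps(config))
```

A check like this can help explain why a status column stays empty: the corresponding setting is probably not enabled on the Agent Configuration record.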
The following image shows the Test Runs interface, where you can view details of the test run.
Aggregated results
After a cloud flow runs, the system calculates the aggregated results.
| Aggregated result | Description |
|---|---|
| # Tests | Number of test results. |
| Success Rate (%) | Percentage of test result records with a Success result compared to the total number of test results. |
| Average Latency (ms) | Average time, in milliseconds, that the agent takes to send its response after it receives the test utterance. |
| # Success | Number of test result records with a Success result. |
| # Failed | Number of test result records with a Failed result. |
| # Pending | Number of test result records with a Pending result. |
| # Unknown | Number of test result records with an Unknown result. |
| # Error | Number of test result records with an Error result. |
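To make the definitions in the table concrete, the following sketch recomputes the aggregated values from a list of test results. It's an illustration under stated assumptions, not the Kit's calculation: the `ResultRecord` class is hypothetical, results without a latency value are assumed to be excluded from the average, and no rounding is applied.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ResultRecord:
    # Hypothetical stand-in for an Agent Test Result record.
    result: str  # Success, Failed, Pending, Unknown, or Error
    latency_ms: Optional[float] = None


def aggregate(results: List[ResultRecord]) -> Dict[str, Optional[float]]:
    """Compute the aggregated values described in the table."""
    counts = Counter(r.result for r in results)
    total = len(results)
    # Assumption: records with no latency are left out of the average.
    latencies = [r.latency_ms for r in results if r.latency_ms is not None]
    return {
        "# Tests": total,
        "Success Rate (%)": 100.0 * counts["Success"] / total if total else 0.0,
        "Average Latency (ms)": sum(latencies) / len(latencies) if latencies else None,
        "# Success": counts["Success"],
        "# Failed": counts["Failed"],
        "# Pending": counts["Pending"],
        "# Unknown": counts["Unknown"],
        "# Error": counts["Error"],
    }


if __name__ == "__main__":
    sample = [
        ResultRecord("Success", 100.0),
        ResultRecord("Failed", 300.0),
        ResultRecord("Success", 200.0),
        ResultRecord("Pending"),
    ]
    for name, value in aggregate(sample).items():
        print(f"{name}: {value}")
```

Because the success rate is measured against the total number of test results, Pending, Unknown, and Error results lower it just as Failed results do.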
Detailed results
Analyze results after each step completes, because some results are available only after a step finishes. For example, Topic Match tests need Dataverse enrichment to complete, because only that step provides the name of the topic that was triggered.
You can edit the results view to adjust results individually.
Each result has a Result Reason section that's automatically populated with an explanation for the result. For AI-generated assessments, it recommends a human review: "AI-generated assessment of the response. Please review." Testers can use this attribute to add their own comments and notes on a test.
For each of the following test types, you can use the Results filter to view only the results of a specific type:
- Generative Answers Results
- Response Match Results
- Topic Match Results
- Attachment Results
Agent Test Result details
The Agent Test Result form provides details on each individual test execution. The system automatically creates these records.
| Column Name | Description |
|---|---|
| Conversation ID | ID of the conversation that the Direct Line API provides. |
| Agent Test Run | Test run that the record relates to. |
| Agent Test | Test that the record relates to. You can see the test details in a Quick View form. |
| Result | Outcome of the test: Success, Failed, Unknown, Error, or Pending. |
| Explanation | Autogenerated explanation of the result. |
| Latency (ms) | Time, in milliseconds, that the agent takes to send the message back after receiving the test utterance. |
| Message Sent | Timestamp of the message that the user sends. |
| Response Received | Timestamp of the message that the agent sends. |
| Response | Text message the agent sends. |
| App Insights Result | Generative answer results from Azure Application Insights (when Enrich With Azure Application Insights is enabled). |
| Triggered Topic ID | Unique identifier of the Chatbot Subcomponent record for the triggered topic in Dataverse (when Enrich With Conversation Transcripts is enabled). |
| Triggered Topic / Event | Name of the triggered topic (when Enrich With Conversation Transcripts is enabled). If multiple topics match, the value is IntentCandidates. For Conversational Boosting and Fallback, the value is UnknownIntent. |
| Recognized Intent Score | If intent recognition occurs, the score of the top intent. |
| Conversation Transcript | File attachment of the full conversation transcript JSON (when Enrich With Conversation Transcripts is enabled and Copy Full Transcript is set to yes). |
| Suggested Actions | When available, JSON of the suggested actions that the agent returns and associates with its response. |
| Attachments | When available, JSON of the attachments array that the agent returns and associates with its response. |
| Citations | For generated answers, JSON array of the citations that the agent uses to generate the answer (when Enrich With Conversation Transcripts is enabled). |
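The Latency (ms), Message Sent, and Response Received columns are related. As a rough sketch, and assuming the latency corresponds to the difference between the two timestamps, the value can be recomputed as follows. The `latency_ms` function and the ISO 8601 timestamp format are assumptions made for illustration. The format of exported values might differ.

```python
from datetime import datetime


def latency_ms(message_sent: str, response_received: str) -> float:
    """Return the elapsed time between two ISO 8601 timestamps, in milliseconds."""
    sent = datetime.fromisoformat(message_sent)
    received = datetime.fromisoformat(response_received)
    return (received - sent).total_seconds() * 1000.0


if __name__ == "__main__":
    # Illustrative timestamps only; these values don't come from a real test run.
    print(latency_ms("2024-05-01T10:00:00.000+00:00", "2024-05-01T10:00:01.250+00:00"))
```

Recomputing the value this way can be useful when you compare latencies across exported result records.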
Inspect the transcript
If you enable Enrich With Conversation Transcripts and set Copy Full Transcript to yes, the test result includes the full transcript. When you analyze a test result, go to the Transcript tab for a detailed transcript view in JSON format with an accompanying visualization.
Analyze multi-turn test results
The results view shows multi-turn tests along with other test types. You see their overall result (Success or Failed) in the Result column. Select the Conversation ID value to view details for the multi-turn test and a list of child tests that make up the test.
In the detailed view of Multiturn Test Results, you can see the results of individual child tests and drill down into their details. The result of a multi-turn test depends on the results of its child tests that are marked as critical. Noncritical child tests can fail, and the multi-turn test still continues to the next child test. If any critical child test fails, execution of that multi-turn test stops and the test is marked as Failed. If all the critical child tests pass, the result of the multi-turn test is Success.
Multi-turn tests can include noncritical child tests because they provide information to the generative orchestrator. The exact response to a noncritical child test doesn't matter. Only the results of the critical child tests that follow it count toward the overall result.
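The rule for deriving a multi-turn result from its child tests can be sketched as follows. This is an illustration of the logic described above, not the Kit's code. The `ChildTest` class and the `evaluate_multi_turn` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChildTest:
    # Hypothetical stand-in for one child test of a multi-turn test.
    name: str
    critical: bool
    passed: bool


def evaluate_multi_turn(children: List[ChildTest]) -> Tuple[str, List[str]]:
    """Return the overall result and the names of the child tests that ran."""
    executed = []
    for child in children:
        executed.append(child.name)
        # A failed critical child test stops execution and fails the whole test.
        if child.critical and not child.passed:
            return "Failed", executed
        # A failed noncritical child test is ignored, and execution continues.
    return "Success", executed


if __name__ == "__main__":
    children = [
        ChildTest("greeting", critical=False, passed=False),
        ChildTest("order status", critical=True, passed=True),
        ChildTest("payment", critical=True, passed=False),
        ChildTest("goodbye", critical=True, passed=True),
    ]
    print(evaluate_multi_turn(children))
```

In this example, the failed noncritical greeting doesn't affect the outcome, but the failed critical payment test stops the run before the final child test executes.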
The multi-turn test (and the Multiturn Test Result) can include any of the regular test types: Response Match, Attachments, Topic Match, and Generative Answers.
Where to get help
If you experience issues, review the troubleshooting guidance or raise a support request on GitHub.