
Analyze test results using Copilot Studio Kit

The Copilot Studio Kit provides a comprehensive interface for analyzing test results.

Test run details

The Agent Test Run interface shows the status of test runs.

  • Run Status: Main process that runs each individual agent test against the agent configuration by using the Direct Line API, and creates a corresponding Agent Test Result record.
  • App Insights Enrichment Status: Runs only if Enrich With Azure Application Insights is enabled on the related Agent Configuration record.
  • Generated Answers Analysis: Runs only if Analyze Generated Answers is enabled on the related Agent Configuration record.
  • Dataverse Enrichment Status: Runs only if Enrich With Conversation Transcripts is enabled on the related Agent Configuration record.
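
As background, the Run Status step exchanges messages with the agent through the Direct Line API. The following Python sketch shows roughly what such an exchange looks like; the secret, user ID, and utterance are placeholder values, and the kit's own test runner may implement the calls differently.

```python
import time

import requests

# Placeholder values; substitute your own Direct Line secret and test utterance.
DIRECT_LINE_SECRET = "<your-direct-line-secret>"
BASE_URL = "https://directline.botframework.com/v3/directline"
headers = {"Authorization": f"Bearer {DIRECT_LINE_SECRET}"}

# 1. Start a conversation.
conversation = requests.post(f"{BASE_URL}/conversations", headers=headers).json()
conversation_id = conversation["conversationId"]

# 2. Send the test utterance as a message activity.
activity = {
    "type": "message",
    "from": {"id": "test-user"},           # placeholder user ID
    "text": "What are your store hours?",  # placeholder test utterance
}
requests.post(
    f"{BASE_URL}/conversations/{conversation_id}/activities",
    headers=headers,
    json=activity,
)

# 3. Wait briefly, then read the activities and print the agent's replies.
#    A real test harness would poll with the watermark instead of sleeping.
time.sleep(3)
activities = requests.get(
    f"{BASE_URL}/conversations/{conversation_id}/activities",
    headers=headers,
).json()["activities"]
for a in activities:
    if a["type"] == "message" and a["from"]["id"] != "test-user":
        print(a.get("text", ""))
```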

Learn more about Agent Configuration settings in Configure agents in Copilot Studio Kit.

The following image shows the Test Runs interface, where you can view details of the test run.

Screenshot of the Test Runs interface in Copilot Studio Kit, showing details such as Run Status, Success Rate, Average Latency, and more.

Aggregated results

After a cloud flow runs, the system calculates the aggregated results.

  • # Tests: Number of test results.
  • Success Rate (%): Percentage of test result records with a Success result compared to the total number of test results.
  • Average Latency (ms): Average time, in milliseconds, for the agent to send the message after it receives the test utterance.
  • # Success: Number of test result records with a Success result.
  • # Failed: Number of test result records with a Failed result.
  • # Pending: Number of test result records with a Pending result.
  • # Unknown: Number of test result records with an Unknown result.
  • # Error: Number of test result records with an Error result.
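
To see how these aggregates relate to individual test results, the following sketch computes them from a small list of sample records. The field names, and the choice to exclude results without a response from the latency average, are illustrative assumptions rather than the kit's actual implementation.

```python
from statistics import mean

# Hypothetical per-test result records; the field names are illustrative only.
results = [
    {"result": "Success", "latency_ms": 820},
    {"result": "Failed", "latency_ms": 1310},
    {"result": "Success", "latency_ms": 640},
    {"result": "Error", "latency_ms": None},  # no response received
]

statuses = ("Success", "Failed", "Pending", "Unknown", "Error")
counts = {s: sum(1 for r in results if r["result"] == s) for s in statuses}
latencies = [r["latency_ms"] for r in results if r["latency_ms"] is not None]

aggregates = {
    "# Tests": len(results),
    "Success Rate (%)": round(100 * counts["Success"] / len(results), 1),
    "Average Latency (ms)": round(mean(latencies)) if latencies else None,
    **{f"# {s}": n for s, n in counts.items()},
}
print(aggregates)
```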

Detailed results

Analyze results after each step completes, because some results are only available once the steps finish. For example, Topic Match tests need the Dataverse enrichment step to complete, because only that step provides the name of the topic that was triggered.

You can edit individual results directly from the results view.

Each result has a Result Reason section that's automatically populated with an explanation for the result. For AI-generated assessments, it recommends a human review: "AI-generated assessment of the response. Please review." Testers can use this attribute to add their own comments and notes on a test.

Screenshot of an Agent Test Run record showing the Result Reason column on the right-hand side of the interface.

Use the Results filter to view only the results of a specific test type:

  • Generative Answers Results
  • Response Match Results
  • Topic Match Results
  • Attachment Results

Screenshot of the System View options available for Results.

Agent Test Result details

The Agent Test Result form provides details on each individual test execution. The system automatically creates these records.

  • Conversation ID: ID of the conversation that the Direct Line API provides.
  • Agent Test Run: Test run that the record relates to.
  • Agent Test: Test that the record relates to. You can see the test details in a Quick View form.
  • Result: Success, Failed, Unknown, Error, or Pending.
  • Explanation: Autogenerated explanation of the result.
  • Latency (ms): Time, in milliseconds, that the agent takes to send the message back after receiving the test utterance.
  • Message Sent: Timestamp of the message that the user sends.
  • Response Received: Timestamp of the message that the agent sends.
  • Response: Text message the agent sends.
  • App Insights Result: Generative answer results from Azure Application Insights (when Enrich With Azure Application Insights is enabled).
  • Triggered Topic ID: Unique identifier of the Chatbot Subcomponent record for the triggered topic in Dataverse (when Enrich With Conversation Transcripts is enabled).
  • Triggered Topic / Event: Name of the triggered topic (when Enrich With Conversation Transcripts is enabled). If multiple topics matched, IntentCandidates. For Conversational Boosting and Fallback, UnknownIntent.
  • Recognized Intent Score: If intent recognition occurs, the score of the top intent.
  • Conversation Transcript: File attachment of the full conversation transcript JSON (when Enrich With Conversation Transcripts is enabled and Copy Full Transcript is set to yes).
  • Suggested Actions: When available, JSON of the suggested actions that the agent returns and associates with its response.
  • Attachments: When available, JSON of the attachments array that the agent returns and associates with its response.
  • Citations: For generated answers, JSON array of the citations that the agent uses to generate the answer (when Enrich With Conversation Transcripts is enabled).
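
If you want to pull these records out of Dataverse for your own analysis, for example to recompute latency from the Message Sent and Response Received timestamps, a Dataverse Web API query might look like the following sketch. The environment URL, token, and table and column logical names are placeholders, not the kit's documented schema; check the actual schema in your environment before using them.

```python
from datetime import datetime

import requests

# Placeholder environment URL, access token, and logical names; these are
# illustrative assumptions, not the kit's documented schema.
ORG_URL = "https://yourorg.crm.dynamics.com"
ACCESS_TOKEN = "<access-token-from-microsoft-entra-id>"
TEST_RESULT_TABLE = "cat_copilottestresults"  # hypothetical entity set name

url = (
    f"{ORG_URL}/api/data/v9.2/{TEST_RESULT_TABLE}"
    "?$select=cat_result,cat_messagesent,cat_responsereceived"  # hypothetical columns
    "&$top=50"
)
response = requests.get(
    url,
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Accept": "application/json",
    },
)
response.raise_for_status()

# Recompute latency as the difference between the two timestamps.
for record in response.json()["value"]:
    sent = datetime.fromisoformat(record["cat_messagesent"].replace("Z", "+00:00"))
    received = datetime.fromisoformat(record["cat_responsereceived"].replace("Z", "+00:00"))
    latency_ms = (received - sent).total_seconds() * 1000
    print(record["cat_result"], round(latency_ms))
```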

Inspect the transcript

If you enable Enrich With Conversation Transcripts and set Copy Full Transcript to yes, the test result includes the full transcript. When you analyze a test result, go to the Transcript tab for a detailed transcript view in JSON format with an accompanying visualization.
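
If you prefer to inspect a downloaded transcript outside the kit, the following sketch walks the activities in the JSON file and prints each message turn. It assumes the transcript follows the Bot Framework activity schema (a list of activities with type, from, and text fields); the file name is a placeholder, and the access path might differ if the kit nests the activities differently.

```python
import json

# Path to a transcript downloaded from the Conversation Transcript attachment;
# the file name is a placeholder.
with open("conversation_transcript.json", encoding="utf-8") as f:
    transcript = json.load(f)

# Assumes a Bot Framework-style list of activities; adjust the access path
# if your transcript nests the activities differently.
activities = transcript.get("activities", transcript) if isinstance(transcript, dict) else transcript

for activity in activities:
    if activity.get("type") == "message":
        sender = activity.get("from", {}).get("role") or activity.get("from", {}).get("id", "unknown")
        print(f"{sender}: {activity.get('text', '')}")
```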

Screenshot of the Transcript analysis interface of an Agent Test Result.

Analyze multi-turn test results

The results view shows multi-turn tests along with other test types. You see their overall result (Success or Failed) in the Result column. Select the Conversation ID value to view details for the multi-turn test and a list of child tests that make up the test.

Screenshot of the Multiturn Test Results detail view of an Agent Test Result.

In the detailed view of Multiturn Test Results, you can see the results of individual child tests and drill down into their details. The result of a multi-turn test depends on the results of its child tests that are marked as critical. Noncritical child tests can fail, and the multi-turn test case continues to the next test case. If any critical child test fails, test execution for that multi-turn test stops and the test is marked as Failed. If all the critical child tests pass, the result of the multi-turn test is Success.

Multi-turn test cases can include noncritical tests because they provide information to the generative orchestrator. For such a test case, the exact response doesn't matter; only the critical tests that follow do.
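
The following sketch illustrates that aggregation logic; the child-test structure and field names are hypothetical, not the kit's internal implementation. Execution stops at the first failing critical child test, while noncritical failures are ignored.

```python
def evaluate_multiturn(child_tests):
    """Illustrative aggregation of child test results into a multi-turn result.

    Each child test is a dict with hypothetical keys 'name', 'critical',
    and 'passed'; these names are not the kit's actual schema.
    """
    for child in child_tests:
        if child["critical"] and not child["passed"]:
            # A critical child test failed: execution stops, the multi-turn test fails.
            return "Failed"
        # Noncritical failures are ignored and the conversation continues.
    return "Success"  # every critical child test passed


children = [
    {"name": "Provide order number", "critical": False, "passed": False},
    {"name": "Agent confirms order status", "critical": True, "passed": True},
    {"name": "Agent offers a return label", "critical": True, "passed": True},
]
print(evaluate_multiturn(children))  # Success
```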

The multi-turn test (and the Multiturn Test Result) can include any of the regular test types: Response match, Attachments, Topic Match, and Generative Answers.

Where to get help

If you experience issues, review the troubleshooting guidance or raise a support request on GitHub.