| Enabled for | Public preview | General availability |
|---|---|---|
| Admins, makers, marketers, or analysts, automatically | Sep 21, 2025 | - |
## Business value
The evaluation framework enhances agent validation by enabling automated testing workflows, minimizing manual effort, and providing clear execution results. It ensures consistent and reliable agent responses and lets Makers identify potential issues early in the development cycle. With run results and evaluation indicators, Makers can better assess test coverage, verify execution integrity, and improve overall agent performance, leading to faster deployment and increased reliability.
## Feature details
The evaluation framework in Copilot Studio introduces a structured and automated approach to testing AI agents, ensuring high-quality deployments and continuous improvement. It is built around three core workstreams:
**Initiating automated evaluation processes**

Makers can initiate automated evaluation tests seamlessly, either directly from the agent or through the test pane. This enables structured validation workflows, ensuring consistent and repeatable testing.
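Conceptually, initiating a run means submitting each test query to the agent and keeping a handle on the results for later inspection. The minimal sketch below illustrates only that flow; `AgentClient`, `EvaluationRun`, and `initiate_evaluation` are hypothetical names invented for this example and are not part of any Copilot Studio API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Hypothetical stand-in for the agent under test (not a Copilot Studio API):
# a callable that maps one test query to the agent's reply.
AgentClient = Callable[[str], str]

@dataclass
class EvaluationRun:
    """One automated evaluation run: when it started, what was asked, what came back."""
    started_at: datetime
    queries: list[str]
    responses: dict[str, str] = field(default_factory=dict)

def initiate_evaluation(agent: AgentClient, queries: list[str]) -> EvaluationRun:
    """Submit every test query to the agent and record its response."""
    run = EvaluationRun(started_at=datetime.now(timezone.utc), queries=queries)
    for query in queries:
        run.responses[query] = agent(query)
    return run
```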
**Advanced test query editing**

The evaluation framework allows Makers to refine and customize test queries to maximize validation accuracy (see the sketch following this list):

- Dynamically modify test queries to adapt to different testing needs
- Manually enter custom test questions for expanded scenario coverage
- Leverage AI-generated test queries to enhance evaluation depth
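As a hedged illustration of those three editing paths, the sketch below combines manually entered questions with AI-generated ones and applies a simple dynamic modification pass. `generate_queries` is a placeholder for whatever model-backed step produces suggested queries; none of these names come from Copilot Studio.

```python
def generate_queries(topic: str, count: int) -> list[str]:
    """Placeholder for AI-generated test queries; a real implementation
    would call a language model. Hypothetical, for illustration only."""
    return [f"({topic}) generated question #{i + 1}" for i in range(count)]

def build_test_set(manual: list[str], topic: str, generated_count: int = 5) -> list[str]:
    """Combine manually entered questions with AI-generated ones,
    dropping duplicates while preserving order."""
    combined = manual + generate_queries(topic, generated_count)
    seen: set[str] = set()
    return [q for q in combined if not (q in seen or seen.add(q))]

def adapt_queries(queries: list[str], suffix: str) -> list[str]:
    """Dynamically modify queries for a testing need,
    e.g. appending a constraint or rephrasing for an audience."""
    return [f"{q} {suffix}" for q in queries]
```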
**Automated test execution and results display**

The evaluation framework provides a structured and automated testing workflow, ensuring reliable execution and clear validation results (a sketch of the result structures follows this list):

- Execute automated tests to assess agent responses across multiple scenarios
- Provide an overall performance summary, helping users quickly gauge evaluation results
- Break down results by session to track execution details and agent behavior
- Provide detailed question-level feedback, including:
  - Evaluation of answers and correctness
  - Explanations for failed tests
  - Identification of the question source for better traceability
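The following sketch shows one plausible shape for these results, assuming invented field names (`passed`, `explanation`, `source`) rather than Copilot Studio's actual schema: question-level feedback records, an overall summary, and a per-session breakdown.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class QuestionResult:
    """Question-level feedback: a correctness verdict, an explanation
    for failures, and the question's source for traceability.
    Field names are hypothetical, not Copilot Studio's schema."""
    session_id: str
    question: str
    passed: bool
    explanation: str  # empty for passing tests
    source: str       # e.g. "manual" or "AI-generated"

def summarize(results: list[QuestionResult]) -> dict[str, float | int]:
    """Overall performance summary across the whole run."""
    passed = sum(r.passed for r in results)
    return {"total": len(results), "passed": passed,
            "pass_rate": passed / len(results) if results else 0.0}

def by_session(results: list[QuestionResult]) -> dict[str, list[QuestionResult]]:
    """Break results down by session to track execution details."""
    sessions: dict[str, list[QuestionResult]] = defaultdict(list)
    for r in results:
        sessions[r.session_id].append(r)
    return dict(sessions)
```

The pass rate here is a plain fraction of passing questions; the product's actual evaluation indicators may be richer.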
## Geographic areas
Visit the Explore Feature Geography report for Microsoft Azure areas where this feature is planned or available.
## Language availability
Visit the Explore Feature Language report for information on this feature's availability.