| Enabled for | Public preview | General availability |
|---|---|---|
| Admins, makers, marketers, or analysts, automatically | Sep 21, 2025 | - |
Business value
This feature is designed for makers using Copilot Studio Agent Evaluation to validate their agents before and after deployment. By letting makers specify the expected answer for each test case, the evaluation framework can accurately apply the different grader types (Exact, Partial, Similarity, and Compare Meaning) and produce clear, repeatable results. This capability saves time and resources by eliminating manual comparisons in spreadsheets or external tools and gives organizations greater confidence that agents behave as intended. It improves quality and compliance at scale, speeds up release cycles, and reduces the cost of fixing issues after go-live by catching gaps early in testing.
Feature details
The Add Expected Response capability allows makers to define, edit, and manage the expected outputs for each test case. This input directly connects to the grader framework, determining how agent responses are evaluated.
Key capabilities:

- Per-test case configuration
  - Makers enter the expected response when creating or editing a test case.
  - Both short, exact answers and longer, descriptive references are supported.
- Integration with grader families (illustrated in the sketch after this list)
  - Exact or partial match - requires exact text or key phrases from the reference to validate.
  - Similarity - compares semantic similarity against the reference.
  - Compare meaning (intent) - uses the reference answer to judge alignment of meaning.
  - AI Metrics - does not require a reference; provides quality signals instead.
- Validation and usability
  - Inline error handling if a grader that requires a reference is selected without an expected response.
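
To make the grader families more concrete, the following Python snippet is a minimal, hypothetical sketch of how an expected response can drive exact, partial, and similarity-style grading. It is not the Copilot Studio implementation or API; the function names, the character-level similarity stand-in, and the example test case are invented for illustration only.

```python
# Illustrative sketch only. These helpers are hypothetical and are not the
# Copilot Studio Agent Evaluation API; they show conceptually how a
# maker-supplied expected response can be compared against an agent response.
from difflib import SequenceMatcher


def exact_match(agent_response: str, expected: str) -> bool:
    # Pass only when the agent's answer matches the expected response verbatim.
    return agent_response.strip() == expected.strip()


def partial_match(agent_response: str, key_phrases: list[str]) -> bool:
    # Pass when every key phrase drawn from the expected response appears in the answer.
    text = agent_response.lower()
    return all(phrase.lower() in text for phrase in key_phrases)


def similarity_score(agent_response: str, expected: str) -> float:
    # Character-level stand-in for a similarity grader; a real grader would
    # compare semantic meaning (for example, via embeddings or an LLM judge).
    return SequenceMatcher(None, agent_response, expected).ratio()


# Hypothetical test case: the maker defines the expected response up front.
expected = "Refunds are processed within 5 business days."
agent_response = "Your refund will be processed within 5 business days."

print(exact_match(agent_response, expected))                 # False
print(partial_match(agent_response, ["5 business days"]))    # True
print(round(similarity_score(agent_response, expected), 2))  # e.g. 0.79
```

A compare-meaning (intent) grader is omitted from the sketch because it would require a language model to judge whether the two answers convey the same meaning, which is outside the scope of a few-line example.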
Geographic areas
Visit the Explore Feature Geography report to see the Microsoft Azure geographies where this feature is planned or available.
Language availability
Visit the Explore Feature Language report for information about the languages in which this feature is available.