Add expected response for agent evaluation test cases

Enabled for: Admins, makers, marketers, or analysts, automatically
Public preview: Sep 21, 2025
General availability: -

Business value

This feature is designed for makers using Copilot Studio Agent Evaluation to validate their agents before and after deployment. By letting makers specify the expected answer for each test case, the evaluation framework can accurately apply the different grader types (Exact, Partial, Similarity, and Compare Meaning) and produce clear, repeatable results. This capability saves time and resources by eliminating manual comparisons in spreadsheets or external tools and gives organizations greater confidence that agents behave as intended. It improves quality and compliance at scale, speeds up release cycles, and reduces the cost of fixing issues after go-live by catching gaps early in testing.

Feature details

The Add Expected Response capability allows makers to define, edit, and manage the expected outputs for each test case. This input directly connects to the grader framework, determining how agent responses are evaluated.

Key capabilities:

  • Per-test case configuration

    • Makers enter the expected response when creating or editing a test case.

    • Both short, exact answers and longer, descriptive references are supported.

  • Integration with grader families

    • Exact or partial match - requires the exact text, or key phrases from it, to appear in the response.

    • Similarity - compares the semantic similarity of the response against the reference answer.

    • Compare meaning (intent) - uses the reference answer to judge alignment of meaning.

    • AI Metrics - does not require a reference; provides quality signals instead.

  • Validation and usability

    • Inline error handling flags test cases where a grader that requires a reference is selected without an expected response.
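To make the grader families above concrete, here is a minimal sketch of how the first three reference-based checks could score a response. The function names, the thresholds, and the use of a character-level ratio for similarity are illustrative assumptions; Copilot Studio's actual Similarity and Compare Meaning graders evaluate semantic meaning, not surface text.

```python
from difflib import SequenceMatcher

def grade_exact(expected: str, actual: str) -> bool:
    """Exact match: pass only if the response equals the expected
    text (ignoring surrounding whitespace and letter case)."""
    return expected.strip().lower() == actual.strip().lower()

def grade_partial(expected_phrases: list[str], actual: str) -> bool:
    """Partial match: pass if every key phrase from the expected
    response appears somewhere in the agent's answer."""
    text = actual.lower()
    return all(phrase.lower() in text for phrase in expected_phrases)

def grade_similarity(expected: str, actual: str, threshold: float = 0.8) -> bool:
    """Similarity (illustrative stand-in): score the response against
    the reference and pass if the score clears a threshold. A simple
    character-level ratio is used here; the real grader compares
    semantic similarity rather than literal characters."""
    score = SequenceMatcher(None, expected.lower(), actual.lower()).ratio()
    return score >= threshold
```

For example, `grade_partial(["refund", "14 days"], "Refunds are issued within 14 days.")` passes because both key phrases occur in the answer, even though the full text differs from the reference.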

Geographic areas

Visit the Explore Feature Geography report for Microsoft Azure areas where this feature is planned or available.

Language availability

Visit the Explore Feature Language report for information on this feature's availability.

Additional resources

Create test cases to evaluate your agent (preview) (docs)