Testing is essential for ensuring your custom agents in Copilot Studio Kit respond and behave as expected. This article explains how to create, manage, and validate different types of tests, including multi-turn scenarios, how to perform bulk operations with Excel, and how to duplicate test sets.
Test types
You can create several types of tests to validate your agents.
| Test Type | Description |
|---|---|
| Response Match | The simplest test type. It compares the agent's response with the expected response using the selected comparison operator. By default, exact match ("equals") is used. Other available comparison operators are "Doesn't equal," "Contains," "Doesn't contain," "Begins with," "Doesn't begin with," "Ends with," and "Doesn't end with." |
| Attachments (Adaptive cards, etc.) | Compares the agent's attachments JSON response with the expected attachments JSON (full array of attachments). By default, exact match ("equals") is used. Other available comparison operators are "Doesn't equal," "Contains," and "Doesn't contain." A special comparison operator called "AI Validation" uses language models to validate the attachment based on validation instructions provided by the maker, similar to generative answers. |
| Topic Match | Only available when Dataverse enrichment (Enrich with Conversation Transcripts) is configured. When the Dataverse enrichment step completes, this test compares the expected topic name and the triggered topic name. Topic Match testing also supports multi-topic match with custom agents that have generative orchestration enabled. In multi-topic matching, the topics are comma-separated; for example: "Topic1,Topic2". |
| Generative Answers | Only available if AI Builder enrichment (Analyze Generated Answers) is configured. Uses a large language model to assess if the AI-generated answer is close to a sample answer or honors validation instructions. When Enrich With Azure Application Insights is configured, negative tests, such as moderation or no search results, can also be tested. |
| Multi-turn | Consists of one or more test cases of other types, such as response match, attachments, topic match, and generative answers. All child tests in a multi-turn test run within the same conversation context in the specified order. Use multi-turn tests to test a scenario end-to-end, and to test custom agents with generative orchestration. Learn more in Multi-turn testing. |
| Plan Validation | Allows the maker to validate that the dynamic plan of a custom agent includes the expected tools. This test type is meant for Copilot Studio custom agents that have generative orchestration enabled. Learn more in Plan Validation testing. |
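To make the comparison operators concrete, the following sketch shows how a Response Match evaluation could work. It's illustrative only: the operator names mirror the table above, but the function and its signature are hypothetical, not the Copilot Studio Kit implementation.

```typescript
// Hypothetical sketch: evaluating a Response Match comparison operator
// against an agent response. Not the Copilot Studio Kit implementation.
type ComparisonOperator =
  | "Equals" | "Doesn't equal"
  | "Contains" | "Doesn't contain"
  | "Begins with" | "Doesn't begin with"
  | "Ends with" | "Doesn't end with";

function responseMatches(actual: string, expected: string, op: ComparisonOperator): boolean {
  switch (op) {
    case "Equals": return actual === expected;
    case "Doesn't equal": return actual !== expected;
    case "Contains": return actual.includes(expected);
    case "Doesn't contain": return !actual.includes(expected);
    case "Begins with": return actual.startsWith(expected);
    case "Doesn't begin with": return !actual.startsWith(expected);
    case "Ends with": return actual.endsWith(expected);
    case "Doesn't end with": return !actual.endsWith(expected);
    default: throw new Error(`Unknown operator: ${op}`);
  }
}

// A "Contains" test passes if the expected text appears anywhere in the response.
responseMatches("Hello! How can I help you today?", "How can I help", "Contains"); // true
```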
Create a new test set
Use test sets to group multiple tests together. When you run tests, select a test set to run all tests in that set.
- Access the Copilot Studio Kit application.
- Go to Test Sets.
- Create a new Agent Test Set record.
- Enter a Name.
- Select Save.
Create a new test
After you create a test set, you can add tests to it. From the Tests subgrid, select + New Agent Test.
The following table describes the fields.
| Column name | Required | Description |
|---|---|---|
| Name | Yes | Name of the test. This name can be an internal reference ID, such as TST-001. |
| Agent Test Set | Yes | Parent test set for the test. |
| Test Type | Yes | One of the available Test types. |
| Send startConversation Event | No | If enabled, the agent receives the startConversation event so that it proactively starts the conversation, and the test utterance is sent afterward. This setting is typically required when the Conversation Start topic includes logic that must run before responding to the user or test utterance. |
| Expected Position of the Response Message | No | Don't set a value if you're unsure. This option lets you capture a specific agent response when it sends multiple messages. For example, if the agent first says "Hello" and then "How can I help you?", and you want to test the second message, set the value to 1. The order is 0-based, so the first message is indexed as 0, the second message as 1, and so on. |
| Test Utterance | Yes | The message that you want to send to the agent as part of the test. |
| Expected Response | Depends | Mandatory for the Response Match test type. Expected response from the agent. For a Generative Answers test, set a sample answer or your own validation instructions for the large language model. |
| External Variables JSON | No | JSON record for any external or contextual value you want to pass to the agent as part of the test. For example: { "Language": "fr" } |
| Seconds Before Getting Answer | No | Number of seconds to wait before evaluating the response from the bot. In most cases, you can leave this value empty, but it's useful in situations where the agent calls an API and the response might take longer than usual. |
| Expected Generative Answers Outcome | Depends | Mandatory for the Generative Answers test type. Should be either Answered or Not Answered. When Azure Application Insights enrichment is enabled, you can choose Moderated or No Search Results. |
| Expected Topic Name | Depends | Mandatory for the Topic Match test type. Name of the topic that you expect to be triggered. Multi-topic match is supported for custom agents that have generative orchestration enabled. For multi-topic match, use a comma-separated list; for example: "Topic1,Topic2". Don't add extra white space. Multi-topic matching ensures that the expected topics are among the topics in the plan. |
| Expected Attachments JSON | Depends | Mandatory for Attachments (Adaptive Cards, etc.) test type. Full attachments JSON array that you expect from the agent response. |
| Expected Tools | Depends | Mandatory for Plan Validation test type. Comma-separated list of expected tools (tools, actions, and connected agents). Don't add extra white space. Order isn't relevant. Example: "Weather,Climate change" |
| Pass Threshold % | Depends | Mandatory for Plan Validation test type. The percentage of expected tools that must be in the dynamic plan for the test to pass. If the percentage is 100, all expected tools need to be in the dynamic plan for the test to succeed. Extra tools in the dynamic plan don't affect the test result. |
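The following object shows how several of these fields fit together for a single Topic Match test. The property names loosely mirror the column names above for illustration; they aren't an actual Copilot Studio Kit schema or API payload.

```typescript
// Illustrative only: a Topic Match test expressed as a plain object whose keys
// loosely mirror the table columns above. Not a real Copilot Studio Kit schema.
const topicMatchTest = {
  name: "TST-002",                              // internal reference ID
  agentTestSet: "Weather agent - smoke tests",  // parent test set (hypothetical name)
  testType: "Topic Match",
  testUtterance: "What's the weather in Paris, and how is the climate changing?",
  expectedTopicName: "Weather,Climate change",  // multi-topic: comma-separated, no extra white space
  externalVariablesJson: { Language: "fr" },    // contextual values passed to the agent with the test
  expectedPositionOfResponse: 1,                // 0-based: capture the agent's second message
};
```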
Multi-turn testing
For the Multi-turn test type, you can specify one or more child tests of regular types. Each child test has an order and criticality. The order defines the execution order within the same conversation context (within the multi-turn test case). The criticality defines whether the child test case must pass for the multi-turn test execution to continue.
Any child tests that require post-testing evaluation, such as Topic Match or Generative Answers, are left in a pending state, and test execution continues regardless of their criticality. If any critical test fails, execution of the multi-turn test is halted and its result is deemed failed. If all critical child test cases succeed, the multi-turn test also succeeds.
Use noncritical child test cases to "feed" information to custom agents with generative orchestration. You can also use these test cases when the response doesn't matter and you want to build up to critical tests.
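The execution rules above can be summarized in a short sketch. This is a simplified model with assumed names and a synchronous `run` callback, intended only to illustrate the ordering and criticality behavior; it isn't the kit's code.

```typescript
// Simplified model of multi-turn execution: child tests run in order within one
// conversation; a failing *critical* child halts the run; tests that need
// post-run enrichment stay "Pending" and never halt execution.
type ChildResult = "Passed" | "Failed" | "Pending";

interface ChildTest {
  order: number;
  critical: boolean;
  run: () => ChildResult; // runs the child test within the shared conversation
}

function runMultiTurn(children: ChildTest[]): "Passed" | "Failed" {
  for (const child of [...children].sort((a, b) => a.order - b.order)) {
    const result = child.run();
    if (result === "Failed" && child.critical) {
      return "Failed"; // a critical failure stops the multi-turn test
    }
    // Noncritical failures and Pending results don't stop execution.
  }
  return "Passed"; // all critical children succeeded (Pending ones resolve later)
}
```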
Plan Validation testing
Plan Validation focuses on tool correctness. Instead of evaluating what the agent says, this test type checks whether the expected tools were included in the agent's dynamic plan.
When defining a Plan Validation test, you specify:
- A test utterance
- A comma-separated list of expected tools to include in the dynamic plan
- A pass threshold, which represents how much deviation to tolerate from the list
This test uses conversation transcripts and is evaluated after the actual test run as an enrichment activity.
Note the following:
- Expected tools: You can include tools, actions, and connected agents in the comma-separated list. No extra white space is allowed, and order doesn't matter.
- Pass Threshold %: The pass threshold specifies the required portion of expected tools that need to be in the dynamic plan for the test to succeed.
Plan validation is a deterministic test: it calculates the deviation of the actual tools from the expected tools and compares it to the pass threshold. If the deviation is within the threshold, the test passes; otherwise, it fails.
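A minimal sketch of that calculation, assuming the pass criterion is simply the percentage of expected tools found in the dynamic plan compared against the threshold (function and parameter names are illustrative):

```typescript
// Assumed pass criterion: percentage of expected tools present in the dynamic
// plan must be at least the pass threshold. Extra tools in the plan are ignored.
function planValidationPasses(
  expectedTools: string,        // comma-separated, no extra white space, e.g. "Weather,Climate change"
  actualPlanTools: string[],    // tools observed in the dynamic plan
  passThresholdPercent: number
): boolean {
  const expected = expectedTools.split(",");
  const found = expected.filter((tool) => actualPlanTools.includes(tool)).length;
  const matchedPercent = (found / expected.length) * 100;
  return matchedPercent >= passThresholdPercent;
}

// Example: 1 of 2 expected tools in the plan = 50%, so a 100% threshold fails
// and a 50% threshold passes.
planValidationPasses("Weather,Climate change", ["Weather", "Search"], 100); // false
planValidationPasses("Weather,Climate change", ["Weather", "Search"], 50);  // true
```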
Learn more: Orchestrate agent behavior with generative AI.
Use Excel to bulk create or update tests
After creating a test set, you can use Excel to bulk create or update tests.
- From your test set record, switch the subgrid view from Tests to Export/Import View.
- Select Export Agent Tests in Excel Online.
- Add and modify tests as required.
- Select Save.
If you're importing multi-turn child tests, you must first create or import the actual parent multi-turn test. Then, import the child test cases.
Learn more about Excel import and export in Power Apps model-driven apps.
Duplicate tests and test sets
You can duplicate both test sets and individual tests.
To duplicate a single test case, open the agent test record and select Duplicate Test Case. This action is useful when you create variants of a test case, such as changing the location, time, or amount.
To duplicate an entire test set, open the test set record and select Duplicate Test Set from the command bar. This action creates a copy of the test set and all its child tests.