These frequently asked questions (FAQ) describe the AI impact of the analytics features in Copilot Studio.
How is generative AI used for analytics?
Copilot Studio uses AI to measure the quality of generative answer responses and to create clusters, which are used to provide insights into agent performance.
The generative answers feature uses knowledge sources of your choosing to generate a response, and it also collects any feedback you provide. Analytics uses large language models (LLMs) to classify the chat messages between users and agents into levels that indicate the quality of generative answer responses. Copilot Studio compiles these indicators to give makers a summary of an agent's overall performance.
Clustering uses LLMs to sort users' messages into groups based on shared subjects and provide each group with a descriptive name. Copilot Studio uses the names of these clusters to provide different types of insights you can use to improve your agent.
Quality of responses for generative answers
What is the intended use of quality of response analytics?
Makers use quality of response analytics to gain insights into agent usage and performance, and then take actions to improve the agent. Currently, analytics can be used to understand whether the quality of an agent's generative answers meets the maker's expectations.
In addition to overall quality, quality of response analytics identifies areas where an agent performs poorly or fails to meet the maker's intended goals. Based on those findings, the maker can pinpoint where generative answers fall short and take steps to improve their quality.
When poor performance is identified, best practices can help improve quality. For example, after identifying a knowledge source that performs poorly, a maker can edit it or split it into multiple, more focused sources.
What data is used to create analytics for quality of response?
Quality of response analytics are calculated using a sample of generative answer responses. The calculation requires the user query, the agent response, and the relevant knowledge sources that the generative model used for the answer.
Quality of response analytics uses that information to evaluate whether the generative answer quality is good and, if not, why the quality is poor. For example, quality of response analytics can identify responses that are incomplete, irrelevant, or not fully grounded.
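Copilot Studio doesn't publish the exact evaluation rubric, but conceptually the check can be pictured as an LLM classification over the query, the response, and the retrieved knowledge sources. The following minimal sketch is illustrative only; the labels and the call_llm helper are hypothetical stand-ins, not the product's implementation.

```python
# Illustrative sketch only: the labels and the call_llm helper are hypothetical,
# not the actual Copilot Studio implementation.
from dataclasses import dataclass

LABELS = ["good", "incomplete", "irrelevant", "not_grounded"]

@dataclass
class GenerativeAnswerSample:
    user_query: str
    agent_response: str
    knowledge_sources: list[str]  # text of the sources used for grounding

def build_prompt(sample: GenerativeAnswerSample) -> str:
    """Assemble a classification prompt from the query, response, and sources."""
    sources = "\n---\n".join(sample.knowledge_sources)
    return (
        "Classify the agent response as one of: " + ", ".join(LABELS) + ".\n"
        f"User query:\n{sample.user_query}\n\n"
        f"Agent response:\n{sample.agent_response}\n\n"
        f"Knowledge sources:\n{sources}\n"
        "Answer with a single label."
    )

def classify_quality(sample: GenerativeAnswerSample, call_llm) -> str:
    """call_llm is a stand-in for whatever model endpoint performs the rating."""
    label = call_llm(build_prompt(sample)).strip().lower()
    return label if label in LABELS else "good"  # fall back conservatively
```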
What are the limitations of quality of response analytics, and how can users minimize the impact of limitations?
Quality of response analytics aren't calculated using all generative responses. Instead, analytics measures a sample of user-agent sessions. Agents with fewer than a minimum number of successful generative answers don't receive a quality of response summary.
In some cases, analytics don't evaluate an individual response accurately. At an aggregated level, however, the results should be accurate in most cases.
Quality of response analytics don't provide a breakdown of the specific queries that led to low quality responses. They also don't provide a breakdown of the common knowledge sources or topics that were used when low quality responses occurred.
Analytics aren't calculated for answers that use general knowledge.
One of the metrics that quality of response analytics assesses is answer completeness, which evaluates how complete the response is relative to the retrieved documents.
If a relevant document that contains additional information for the given question isn't retrieved, the completeness metric doesn't take that document into account.
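As a rough mental model only (the product's actual metric isn't published and is likely model-based), completeness can be thought of as the share of key points from the retrieved documents that the response covers. Points in documents that were never retrieved simply don't enter the score, which is the limitation described above.

```python
# Hypothetical illustration of a completeness-style score; the real metric
# used by Copilot Studio isn't published and is likely model-based.
def completeness(response: str, retrieved_key_points: list[str]) -> float:
    """Fraction of key points from *retrieved* documents that appear in the response.

    Key points from relevant documents that weren't retrieved are invisible
    to the metric.
    """
    if not retrieved_key_points:
        return 1.0  # nothing retrieved to compare against
    covered = sum(1 for point in retrieved_key_points if point.lower() in response.lower())
    return covered / len(retrieved_key_points)

# Example: two of three retrieved key points are covered, giving a score of about 0.67.
score = completeness(
    "You can reset your password from the account page within 24 hours.",
    ["reset your password", "account page", "contact support"],
)
```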
What protections are in place within Copilot Studio for responsible AI?
Users of agents don't see analytics results; they're available to agent makers and admins only.
Makers and admins can use quality of response analytics only to see the percentage of good quality responses and predefined reasons for poor performance.
We tested analytics for quality of responses thoroughly during development to ensure good performance. However, in rare cases, quality of response assessments may be inaccurate.
Themes of user questions
What is the intended use of Themes?
This feature automatically analyzes large sets of user queries and groups them into high-level topics called themes. Each theme represents a single high-level subject users asked about. Themes provide an unsupervised, data-driven view of user content. This view helps teams understand what users care about most without the manual step of reviewing thousands of queries.
What data is used to create clusters?
The Themes feature uses user queries that trigger generative answers. Themes analyzes all queries from the past seven days to generate new suggested themes.
Themes uses semantic similarity to group queries. A language model is then used to generate the title and description for each cluster. Feedback from makers (such as thumbs up/down) is also collected to improve clustering quality.
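Conceptually, that pipeline embeds queries, groups nearby ones, and then asks a language model to name each group. The sketch below is an illustration under those assumptions: TF-IDF and k-means stand in for whatever embedding and clustering methods Copilot Studio actually uses, and name_cluster is a hypothetical LLM call.

```python
# Sketch of a similarity-based grouping step; Copilot Studio's actual pipeline
# isn't published, and the embedding/clustering choices here are placeholders.
from collections import defaultdict
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def group_queries(queries: list[str], n_themes: int) -> dict[int, list[str]]:
    """Group user queries by textual similarity into candidate themes."""
    vectors = TfidfVectorizer().fit_transform(queries)
    labels = KMeans(n_clusters=n_themes, random_state=0).fit_predict(vectors)
    clusters: dict[int, list[str]] = defaultdict(list)
    for query, label in zip(queries, labels):
        clusters[label].append(query)
    return clusters

def name_cluster(cluster_queries: list[str], call_llm) -> str:
    """Hypothetical LLM step: summarize a cluster into a short theme title."""
    prompt = (
        "Give a short title for the shared topic of these questions:\n"
        + "\n".join(cluster_queries)
    )
    return call_llm(prompt)
```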
What are the limitations of clustering for Themes, and how can users mitigate these limitations?
Successful clustering into themes depends on query volume. If there are not enough queries or if the queries are too unrelated to one another, Copilot Studio might cluster queries into themes that are overly broad or overly narrow.
Themes can occasionally split similar topics or merge unrelated ones.
Shifting language in queries might affect consistency of clusters over time.
Makers can review themes regularly and provide feedback to improve naming quality.
What responsible AI protections for Themes are in place within Copilot Studio?
Themes are only visible to makers and admins. Content moderation is applied when generating names and descriptions to reduce the risk of harmful or inappropriate outputs.