FAQ for analytics

These frequently asked questions (FAQ) describe the AI impact of the analytics features in Copilot Studio.

How is generative AI used for analytics?

Copilot Studio uses AI to measure the quality of generative answer responses and to create clusters, which are used to provide insights into agent performance.

The generative answers feature uses knowledge sources of your choosing to generate responses, and it collects any feedback you provide. Analytics uses large language models (LLMs) to classify the chat messages between users and agents into levels that indicate the quality of generative answer responses. Copilot Studio compiles these indicators to give makers a summary of an agent's overall performance.
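This FAQ doesn't publish the prompts or labels Copilot Studio uses, but the general pattern (classify each exchange with an LLM, then aggregate the labels into a summary) can be sketched. In this minimal Python sketch, `classify_with_llm` is a hypothetical stand-in for the LLM call, with a placeholder heuristic so the example runs end to end:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Exchange:
    user_message: str
    agent_response: str

def classify_with_llm(exchange: Exchange) -> str:
    """Hypothetical stand-in for an LLM call that labels one user-agent
    exchange as 'good' or 'poor'; the placeholder heuristic below just
    lets the sketch run."""
    return "good" if len(exchange.agent_response) > 20 else "poor"

def summarize(exchanges: list[Exchange]) -> dict[str, float]:
    """Aggregate per-exchange labels into an agent-level summary,
    mirroring the 'compile indicators into a summary' step."""
    labels = Counter(classify_with_llm(e) for e in exchanges)
    total = sum(labels.values())
    return {label: count / total for label, count in labels.items()}

sample = [
    Exchange("How do I reset my password?",
             "Open Settings > Security and choose Reset password."),
    Exchange("What are your hours?", "I don't know."),
]
print(summarize(sample))  # e.g. {'good': 0.5, 'poor': 0.5}
```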

Clustering uses LLMs to sort users' messages into groups based on shared subjects and provide each group with a descriptive name. Copilot Studio uses the names of these clusters to provide different types of insights you can use to improve your agent.
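As a rough illustration of this kind of clustering (not Copilot Studio's actual implementation), the sketch below groups messages greedily by word overlap and then derives a name for each group. Both the similarity measure and the naming step are toy stand-ins; a real pipeline would use semantic embeddings and an LLM-written name:

```python
def words(text: str) -> set[str]:
    """Toy stand-in for a semantic embedding: a bag of lowercase words."""
    return set(text.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap as a crude similarity proxy."""
    return len(a & b) / len(a | b)

def cluster(messages: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single-pass clustering: attach each message to the first
    group whose seed message is similar enough, else start a new group."""
    groups: list[list[str]] = []
    for msg in messages:
        for group in groups:
            if similarity(words(msg), words(group[0])) >= threshold:
                group.append(msg)
                break
        else:
            groups.append([msg])
    return groups

def name_cluster(group: list[str]) -> str:
    """Hypothetical stand-in for the LLM step that writes a descriptive
    name for each cluster; here, just the longest word shared by all."""
    shared = set.intersection(*(words(m) for m in group))
    return max(shared, key=len) if shared else "miscellaneous"

msgs = ["reset my password", "password reset help", "store opening hours"]
for group in cluster(msgs):
    print(name_cluster(group), "->", group)
```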

Quality of responses for generative answers

What is the intended use of quality of response analytics?

Makers use quality of response analytics to discover insights into agent usage and performance, and then act to improve the agent. Currently, analytics can be used to understand whether the quality of an agent's generative answers meets the maker's expectations.

In addition to overall quality, quality of response analytics identifies areas where an agent performs poorly or fails to meet the maker's intended goals. Makers can pinpoint the areas where generative answers perform poorly and take steps to improve their quality.

When poor performance is identified, best practices can help improve quality. For example, after identifying a knowledge source that performs poorly, a maker can edit it or split it into multiple, more focused sources.

What data is used to create analytics for quality of response?

Quality of response analytics are calculated using a sample of generative answer responses. The calculation requires the user query, the agent response, and the relevant knowledge sources that the generative model used to produce the answer.

Quality of response analytics uses that information to evaluate whether the quality of a generative answer is good and, if not, why it's poor. For example, quality of response analytics can identify responses that are incomplete, irrelevant, or not fully grounded.
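A minimal sketch of that evaluation shape, assuming a simple record per sampled answer: the hypothetical `evaluate` function returns a quality label and, for poor answers, one of the reason categories this FAQ names. A real evaluator would prompt an LLM with all three inputs; the placeholder below only checks grounding overlap so the example runs:

```python
from dataclasses import dataclass, field

# Reason categories named in this FAQ.
REASONS = ("incomplete", "irrelevant", "not fully grounded")

@dataclass
class AnswerSample:
    user_query: str
    agent_response: str
    knowledge_sources: list[str] = field(default_factory=list)

def evaluate(sample: AnswerSample) -> dict:
    """Hypothetical judge: label the answer good or poor and, if poor,
    attach a reason. This placeholder only checks whether the response
    text overlaps a knowledge source; a real judge would be an LLM."""
    grounded = any(sample.agent_response.lower() in src.lower()
                   for src in sample.knowledge_sources)
    if not grounded:
        return {"quality": "poor", "reason": "not fully grounded"}
    return {"quality": "good", "reason": None}

s = AnswerSample(
    user_query="What is the refund window?",
    agent_response="Refunds are accepted within 30 days.",
    knowledge_sources=["Policy: Refunds are accepted within 30 days."],
)
print(evaluate(s))  # {'quality': 'good', 'reason': None}
```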

What are the limitations of quality of response analytics, and how can users minimize their impact?

  • Quality of response analytics aren't calculated using all generative answer responses. Instead, analytics measures a sample of user-agent sessions. Agents with fewer than a minimum number of successful generative answers don't receive a quality of response summary.

  • In some cases, analytics don't evaluate an individual response accurately. At an aggregated level, however, the results should be accurate in most cases.

  • Quality of response analytics don't provide a breakdown of the specific queries that led to low-quality responses. They also don't provide a breakdown of the common knowledge sources or topics that were in use when low-quality responses occurred.

  • Analytics aren't calculated for answers that use generative knowledge.

  • Answer completeness is one of the metrics used to assess response quality. This metric measures how fully the response addresses the content in the retrieved document.

    If the system doesn't retrieve a relevant document with additional information for the question, it doesn't evaluate the completeness metric for that document. A toy illustration of this logic follows this list.
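As a toy illustration of the completeness logic in the last bullet (the actual metric is LLM-based, not word overlap), the sketch below scores how much of the retrieved document the response covers and skips the metric when no relevant document was retrieved:

```python
import re

def completeness(response: str, retrieved_doc: str | None) -> float | None:
    """Toy completeness proxy: the fraction of the retrieved document's
    terms that the response covers. The real metric is LLM-based."""
    if retrieved_doc is None:
        return None  # no relevant document retrieved: metric not evaluated
    doc_terms = set(re.findall(r"\w+", retrieved_doc.lower()))
    resp_terms = set(re.findall(r"\w+", response.lower()))
    return len(doc_terms & resp_terms) / len(doc_terms)

print(completeness("Returns accepted within 30 days.",
                   "Returns accepted within 30 days, receipt required."))  # ~0.71
print(completeness("Returns accepted within 30 days.", None))              # None
```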

What protections are in place for quality of response analytics within Copilot Studio for responsible AI?

Users of agents don't see analytics results; they're available to agent makers and admins only.

Makers and admins can only use quality of response analytics to see the percentage of good quality responses and any predefined reasons for poor performance.

We tested analytics for quality of responses thoroughly during development to ensure good performance. However, on rare occasions, quality of response assessments might be inaccurate.

Sentiment analysis for conversational sessions

What is the intended use of sentiment analysis?

Makers use sentiment analysis to understand the level of user satisfaction in conversation sessions based on an AI analysis of user messages to the agent. Makers can understand the overall sentiment of the session (positive, negative, or neutral), investigate the reasons, and take measures to address it.

What data is used to define sentiment in a conversational session?

Copilot Studio calculates sentiment based on user messages to the agent for a sample set of conversational sessions.

Sentiment analytics uses that information to evaluate whether user satisfaction during the session is positive, negative, or neutral. For example, a user might use words and a tone that indicate frustration or dissatisfaction with the agent. In that case, the session is classified as negative.
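A minimal sketch of that classification shape, where `classify_sentiment` is a hypothetical stand-in for the LLM call and the cue lists are purely illustrative:

```python
def classify_sentiment(user_messages: list[str]) -> str:
    """Hypothetical stand-in for the LLM call that labels a whole session
    positive, negative, or neutral from the user's messages only.
    The cue lists are illustrative, not the real model's criteria."""
    negative_cues = {"frustrated", "useless", "wrong", "annoying"}
    positive_cues = {"thanks", "great", "perfect", "helpful"}
    tokens = set(" ".join(user_messages).lower().split())
    if tokens & negative_cues:
        return "negative"
    if tokens & positive_cues:
        return "positive"
    return "neutral"

session = ["How do I export a report?",
           "That answer is wrong and this is getting annoying"]
print(classify_sentiment(session))  # negative
```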

What are the limitations of sentiment analysis, and how can users mitigate these limitations?

Sentiment analytics aren't calculated using all conversational sessions. Instead, analytics measures a sample of user-agent sessions.

Sentiment analysis currently depends on generative answers: an agent must have a minimum number of daily successful generative answers to receive a sentiment score.

To calculate sentiment for a session, there must be at least two user messages. Additionally, due to current technical constraints, sentiment analysis isn't performed on sessions that exceed a total of 26 messages (including both user and agent messages).
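Taken together, these constraints amount to a simple per-session eligibility check. The sketch below encodes the two thresholds this FAQ states; the message representation is an assumption made for illustration:

```python
def is_scoreable(messages: list[tuple[str, str]]) -> bool:
    """Check whether a session qualifies for sentiment analysis under the
    constraints this FAQ describes: at least two user messages and no
    more than 26 total messages (user and agent combined). Each message
    is a (sender, text) tuple; this shape is assumed for illustration."""
    user_turns = sum(1 for sender, _ in messages if sender == "user")
    return user_turns >= 2 and len(messages) <= 26

session = [("user", "Hi"), ("agent", "Hello!"), ("user", "Where's my order?")]
print(is_scoreable(session))  # True: two user messages, three total
```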

Sentiment analysis doesn't provide a breakdown of the specific user messages that led to the sentiment score.

What protections are in place for sentiment analysis within Copilot Studio for responsible AI?

Users of agents don't see analytics results; they're available to agent makers and admins only.

Makers and admins can only use sentiment analysis to see the breakdown of sentiment across all sessions.

We tested sentiment analysis thoroughly during development to ensure good performance. However, on rare occasions, sentiment assessments might be inaccurate.

Themes of user questions

What is the intended use of Themes?

This feature automatically analyzes large sets of user queries and groups them into high-level topics called themes. Each theme represents a single high-level subject that users asked about. Themes provide an unsupervised, data-driven view of user questions, which helps teams understand what users care about most without manually reviewing thousands of queries.

What data is used to create clusters?

The Themes feature uses user queries that trigger generative answers. Themes analyzes all queries from the past seven days to generate new suggested themes.

Themes uses semantic similarity to group queries. A language model is then used to generate the title and description for each cluster. Feedback from makers (such as thumbs up/down) is also collected to improve clustering quality.
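A rough sketch of that pipeline (filter to the seven-day window, group by similarity, then title each group) might look like the following; the grouping and titling functions are placeholders for the semantic-similarity and LLM steps:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def recent_queries(log: list[tuple[datetime, str]], days: int = 7) -> list[str]:
    """Keep only queries inside the seven-day lookback window."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return [text for when, text in log if when >= cutoff]

def group_by_similarity(queries: list[str]) -> list[list[str]]:
    """Placeholder for semantic-similarity grouping: queries sharing
    any word with a bucket's first query join that bucket."""
    buckets: list[list[str]] = []
    for q in queries:
        tokens = set(q.lower().split())
        for b in buckets:
            if tokens & set(b[0].lower().split()):
                b.append(q)
                break
        else:
            buckets.append([q])
    return buckets

def title_theme(queries: list[str]) -> str:
    """Placeholder for the LLM call that writes each theme's title and
    description: here, just the most frequent word across the group."""
    counts = Counter(w for q in queries for w in q.lower().split())
    return counts.most_common(1)[0][0]

now = datetime.now(timezone.utc)
log = [
    (now, "reset password"),
    (now, "password expired"),
    (now - timedelta(days=10), "update billing address"),  # outside the window
]
for theme in group_by_similarity(recent_queries(log)):
    print(title_theme(theme), "->", theme)
```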

What are the limitations of clustering for Themes, and how can users mitigate these limitations?

Successful clustering into themes depends on query volume. If there aren't enough queries, or if the queries are too unrelated to one another, Copilot Studio might cluster them into themes that are overly broad or overly narrow.

Themes can occasionally split similar topics or merge unrelated ones.

Shifting language in queries might affect consistency of clusters over time.

Makers can review themes regularly and provide feedback to improve naming quality.

What protections are in place for Themes within Copilot Studio for responsible AI?

Themes are only visible to makers and admins. Content moderation is applied when generating names and descriptions to reduce the risk of harmful or inappropriate outputs.