Best practices for improving conversational agent performance

Avoid performance issues in conversational agents by understanding common failure points and following best practices.

Quotas and limits

Understand the quotas and limits that apply to your agents, such as RPM (requests per minute, where a request is a message sent to an agent) and the number of Power Platform requests allowed in a 24-hour period.

These quotas apply to your agents in addition to the capacity constraints of your Microsoft Copilot Studio plan.
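The service enforces these quotas. If you send messages to an agent programmatically, for example from a custom client, a client-side guard can help you stay under an RPM quota instead of running into throttling. The following is a minimal Python sketch of a sliding-window limiter. `SlidingWindowLimiter` and its parameters are hypothetical and not part of any Copilot Studio API. The limit used in the example is a placeholder, not the actual quota for any plan.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any rolling `window_s`-second window.

    Hypothetical helper for illustration. Take the limit you pass in from the
    documented quota for your plan, not from this example.
    """

    def __init__(self, max_requests: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of requests still inside the window

    def _prune(self, now: float) -> None:
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits under the limit."""
        now = self.clock()
        self._prune(now)
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False

    def seconds_until_free(self) -> float:
        """Return how long to wait before the next request would be allowed."""
        now = self.clock()
        self._prune(now)
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self._sent[0])


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock so the example is deterministic
    limiter = SlidingWindowLimiter(max_requests=3, window_s=60.0, clock=lambda: fake_now[0])
    print([limiter.try_acquire() for _ in range(4)])  # [True, True, True, False]
    print(limiter.seconds_until_free())  # 60.0
    fake_now[0] = 60.0
    print(limiter.try_acquire())  # True: the earlier requests have left the window
```

A sliding window is used here instead of a fixed per-minute counter. This prevents a burst that straddles a minute boundary from briefly doubling the effective rate.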

Optimize your agent for performance

To optimize your agent for performance, consider the following best practices:

Place API calls and connector invocations strategically in your conversation flows so that users don't have to wait for several calls to complete one after another.
Where applicable, cache retrieved information in variables instead of making repeated API calls or flow invocations.
Cloud flows invoked from Copilot Studio agents can introduce latency. Consider using direct connector calls or the Send HTTP Request node instead.
Understand the performance and complexity tradeoffs of classic natural language understanding (NLU) versus generative orchestration in Copilot Studio. NLU models work well for specific intents but struggle with complex queries. Generative AI models handle a wider range of inputs but can introduce latency.
Turn on express mode.
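In Copilot Studio, caching usually means saving a result to a variable and checking that variable before calling out again. The same idea appears below as a minimal Python sketch: a conversation-scoped cache with a time-to-live, so that data that can change (such as an order status) is eventually refreshed. `ConversationCache`, `fetch_order_status`, and the 300-second TTL are hypothetical choices for illustration only.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ConversationCache:
    """Per-conversation cache: reuse a fetched value until it is older than ttl_s."""

    def __init__(self, ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (fetched_at, value)

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self.ttl_s:
            return cached[1]  # cache hit: no API call, no extra wait for the user
        value = fetch()  # cache miss or expired entry: make the call once
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    calls = 0

    def fetch_order_status() -> Dict[str, str]:
        """Hypothetical stand-in for a connector or flow call."""
        global calls
        calls += 1
        return {"order": "A-100", "status": "shipped"}

    cache = ConversationCache(ttl_s=300.0)
    for _ in range(3):  # for example, three turns that all need the same data
        cache.get_or_fetch("order:A-100", fetch_order_status)
    print(calls)  # 1
```

Choose the TTL based on how stale the data may safely be. For data that doesn't change during a conversation, you can drop the expiry check entirely.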

Optimize your cloud flow for performance

If your agent calls a Power Automate flow, ensure that cloud flows are optimized.

Make sure you understand the throttling and capacity limits within Power Automate and Power Platform. Adhering to these limits improves flow scalability and performance.
Learn about troubleshooting slow-running flows.
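When a call is throttled, many HTTP APIs respond with status 429 and may include a `Retry-After` header. A common client-side pattern is to retry with capped exponential backoff and jitter, and to honor `Retry-After` when it is present. The minimal Python sketch below assumes a hypothetical `send` callable that returns a status code and a headers dictionary. To keep the example simple, it parses `Retry-After` only in whole seconds (not the HTTP-date form) and treats header names as case-sensitive.

```python
import random
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]  # (HTTP status, headers): hypothetical shape


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry a throttled call (HTTP 429) with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers  # success, or an error that retrying won't fix
        if attempt == max_attempts - 1:
            break  # out of attempts: surface the last throttled response
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # the server told us how long to wait
        else:
            # Full jitter: random delay between 0 and the capped exponential bound.
            delay = random.uniform(0.0, min(max_delay_s, base_delay_s * 2 ** attempt))
        sleep(delay)
    return status, headers


if __name__ == "__main__":
    # Simulated service: throttled twice, then succeeds.
    replies = iter([(429, {"Retry-After": "1"}), (429, {}), (200, {})])
    status, _ = call_with_backoff(
        lambda: next(replies),
        sleep=lambda s: print(f"waiting {s:.2f}s"),  # print instead of sleeping
    )
    print(status)  # 200
```

Jitter spreads retries from many concurrent conversations over time, so they don't all hit the service again at the same moment.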

Last updated on 2026-01-20