Edit

Share via


Intervention points

Agentic AI expands both capability and attack surface. As soon as an agent can call external tools, write to databases, or trigger downstream processes, malfunctions or malicious attacks can lead to steering it off course, leaking sensitive data, or executing harmful actions. Relying solely on guardrails applied to models can leave these vectors exposed. To close this gap Microsoft Foundry allows guardrails to be applied directly to agents and allows the individual controls within those guardrails to be applied to four different intervention points:

Intervention Point Description Example Control at this Intervention Point
User input A query sent from a user to a model or agent. Sometimes referred to as "prompt." Some controls at this intervention point require the inclusion of document embedding by the user to take effect. Risk: User input attacks
Action: Annotate and block

When this control is specified in an agent's or model's guardrail, the user's input is scanned by a classification model that detects jailbreak attacks. If an attack is detected, the user's input is blocked from being sent to the model, halting the model.
Tool call (Preview) The next action the agent is proposing to take, as generated by its underlying model. The tool call consists of which tool is called and the arguments it's called with, including data being sent to the tool. Risk: Hate (High)
Action: Annotate and block

When this control is specified, every time the agent is about to execute a tool call, the proposed content being sent to the tool is scanned for hateful content. If any is detected, the tool call won't be executed, and the agent stops functioning until there is another user input.
Tool response (Preview) The content sent back by a tool, internal to an agent's orchestration and before the content is to the agent's memory or given back to the end user. Risk: Indirect attack
Action: Annotate and block

When this control is specified, the full payload sent back from each tool to this agent is scanned for attempted indirect prompt injection attacks. If detected, the agent stops operating immediately, and prevents the malicious content from being saved by the agent and from maliciously steering the agent off-track.
Output The final content sent back to the end user in response to their query. Risk: Protected Material for Text
Action: Annotate only

When this control is specified, the final content meant to be displayed to the user is scanned for certain types of copyrighted text. If detected, there is a flag in the annotation response for the API used to call this model or agent.

Important

Only certain types of tools are subject to controls at the tool call and tool response intervention points. Currently, Azure AI Search, Azure Functions, OpenAPI, Sharepoint Grounding, Fabric Data Agent, Bing Grounding, Bing Custom Search, and Browser Automation support moderation.