This article explains how to deploy and use the latest Claude models in Foundry, including Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5, and Claude Opus 4.1. Anthropic's flagship product is Claude, a frontier AI model useful for complex tasks such as coding, agents, financial analysis, research, and office tasks. Claude delivers exceptional performance while maintaining high safety standards.
Available Claude models
Foundry supports Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5, and Claude Opus 4.1 models through global standard deployment. These models have key capabilities that include:
- Extended thinking: Extended thinking gives Claude enhanced reasoning capabilities for complex tasks.
- Image and text input: Strong vision capabilities that enable the models to process images and return text outputs for analyzing and understanding charts, graphs, technical diagrams, reports, and other visual assets.
- Code generation: Advanced code generation, analysis, and debugging, with particularly strong performance in Claude Sonnet 4.5 and Claude Opus 4.1.
For more details about the model capabilities, see capabilities of Claude models.
Claude Opus 4.5 (preview)
Claude Opus 4.5 is Anthropic's most intelligent model, and an industry leader across coding, agents, computer use, and enterprise workflows. With a 200K token context window and 64K max output, Opus 4.5 is ideal for production code, sophisticated agents, office tasks, financial analysis, cybersecurity, and computer use.
Claude Sonnet 4.5 (preview)
Claude Sonnet 4.5 is a highly capable model designed for building real-world agents and handling complex, long-horizon tasks. It offers a strong balance of speed and cost for high-volume use cases. Sonnet 4.5 also provides advanced accuracy for computer use, enabling developers to direct Claude to use computers the way people do.
Claude Haiku 4.5 (preview)
Claude Haiku 4.5 delivers near-frontier performance for a wide range of use cases. It stands out as one of the best coding and agent models, with the right speed and cost to power free products and scaled sub-agents.
Claude Opus 4.1 (preview)
Claude Opus 4.1 is an industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.
Prerequisites
- An Azure subscription with a valid payment method. If you don't have an Azure subscription, create a paid Azure account to begin.
- Access to Microsoft Foundry with appropriate permissions to create and manage resources.
- A Microsoft Foundry project created in one of the supported regions: East US 2 or Sweden Central.
- Foundry Models from partners and community require access to Azure Marketplace to create subscriptions. Ensure you have the permissions required to subscribe to model offerings.
Deploy Claude models
Claude models in Foundry are available for global standard deployment. To deploy a Claude model, follow the instructions in Add and configure models to Microsoft Foundry Models.
After deployment, you can use the Foundry playground to interactively test the model.
Work with Claude models
After deployment, you have several options for interacting with Claude models to generate text responses:
Use the Anthropic SDKs and the following Claude APIs:
- Messages API to send a structured list of input messages with text and/or image content, and the model generates the next message in the conversation.
- Token Count API to count the number of tokens in a message.
- Files API to upload and manage files to use with the Claude API without having to re-upload content with each request.
- Skills API to create custom skills for Claude AI.
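As a hedged sketch of the Token Count API (the `FOUNDRY_API_KEY` environment variable and the resource URL are placeholders, not names from this article), you might count a request's input tokens with the Anthropic Python SDK before sending it. The same request body shape works for both the Token Count API and the Messages API:

```python
import os

# Placeholder deployment details -- replace with your own values
base_url = "https://<resource-name>.services.ai.azure.com/anthropic"
deployment_name = "claude-sonnet-4-5"

# Build the request body once; it can be reused for the
# Token Count API and the subsequent Messages API call
request_body = {
    "model": deployment_name,
    "messages": [
        {"role": "user", "content": "Summarize this report in three bullet points."}
    ],
}

# Only contact the service when an API key is configured
if os.environ.get("FOUNDRY_API_KEY"):
    from anthropic import AnthropicFoundry

    client = AnthropicFoundry(
        api_key=os.environ["FOUNDRY_API_KEY"],
        base_url=base_url,
    )
    token_count = client.messages.count_tokens(**request_body)
    print(token_count.input_tokens)
```

Counting tokens up front is useful for staying within the model's context window and for estimating cost before committing to a full Messages API call.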
Use the Responses API to generate text responses with Claude models in Microsoft Foundry. For multi-language code samples that demonstrate this usage, see Use Claude Models with OpenAI Responses API in Microsoft Foundry.
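As an illustrative sketch only (the endpoint path and `FOUNDRY_API_KEY` variable are assumptions; the linked multi-language samples are authoritative), a Responses API call with the OpenAI Python SDK might look like this:

```python
import os

# Assumed OpenAI-compatible endpoint path -- confirm against the linked samples
base_url = "https://<resource-name>.services.ai.azure.com/openai/v1"
deployment_name = "claude-sonnet-4-5"

request_kwargs = {
    "model": deployment_name,
    "input": "Write a one-line docstring for a function that reverses a string.",
}

# Only contact the service when an API key is configured
if os.environ.get("FOUNDRY_API_KEY"):
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key=os.environ["FOUNDRY_API_KEY"])
    response = client.responses.create(**request_kwargs)
    print(response.output_text)
```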
Use the Messages API to work with Claude models
The following examples show how to use the Messages API to send requests to Claude Sonnet 4.5, by using both Microsoft Entra ID authentication and API key authentication methods. To work with your deployed model, you need these items:
- Your base URL, which is of the form `https://<resource-name>.services.ai.azure.com/anthropic`.
- Your target URI from your deployment details, which is of the form `https://<resource-name>.services.ai.azure.com/anthropic/v1/messages`.
- Microsoft Entra ID for keyless authentication, or your deployment's API key for API key authentication.
- The deployment name you chose during deployment creation. This name can be different from the model ID.
Use Microsoft Entra ID authentication
For Messages API endpoints, use your base URL with Microsoft Entra ID authentication.
1. Install the Azure Identity client library. You need this library to use `DefaultAzureCredential`, which finds the best available credential for its running environment:

   ```
   pip install azure-identity
   ```

2. Set the client ID, tenant ID, and client secret of the Microsoft Entra ID application as the environment variables `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, and `AZURE_CLIENT_SECRET`:

   ```
   export AZURE_CLIENT_ID="<AZURE_CLIENT_ID>"
   export AZURE_TENANT_ID="<AZURE_TENANT_ID>"
   export AZURE_CLIENT_SECRET="<AZURE_CLIENT_SECRET>"
   ```

3. Install the Anthropic SDK by using pip (requires Python >= 3.8):

   ```
   pip install -U "anthropic"
   ```

4. Run a basic code sample. This sample completes the following tasks:

   - Creates a client with the Anthropic SDK, using Microsoft Entra ID authentication.
   - Makes a basic, synchronous call to the Messages API.

   ```python
   from anthropic import AnthropicFoundry
   from azure.identity import DefaultAzureCredential, get_bearer_token_provider

   base_url = "https://<resource-name>.services.ai.azure.com/anthropic"  # Replace <resource-name> with your resource name
   deployment_name = "claude-sonnet-4-5"  # Replace with your deployment name

   # Create a token provider for Microsoft Entra ID authentication
   token_provider = get_bearer_token_provider(
       DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
   )

   # Create the client with Microsoft Entra ID authentication
   client = AnthropicFoundry(
       azure_ad_token_provider=token_provider,
       base_url=base_url,
   )

   # Send a request
   message = client.messages.create(
       model=deployment_name,
       messages=[
           {"role": "user", "content": "What is the capital/major city of France?"}
       ],
       max_tokens=1024,
   )
   print(message.content)
   ```
Use API key authentication
For Messages API endpoints, use your base URL and API key to authenticate against the service.
1. Install the Anthropic SDK by using pip (requires Python >= 3.8):

   ```
   pip install -U "anthropic"
   ```

2. Run a basic code sample. This sample completes the following tasks:

   - Creates a client with the Anthropic SDK by passing your API key to the SDK's configuration.
   - Makes a basic, synchronous call to the Messages API.

   ```python
   from anthropic import AnthropicFoundry

   base_url = "https://<resource-name>.services.ai.azure.com/anthropic"  # Replace <resource-name> with your resource name
   deployment_name = "claude-sonnet-4-5"  # Replace with your deployment name
   api_key = "YOUR_API_KEY"  # Replace YOUR_API_KEY with your API key

   # Create the client with API key authentication
   client = AnthropicFoundry(
       api_key=api_key,
       base_url=base_url,
   )

   # Send a request
   message = client.messages.create(
       model=deployment_name,
       messages=[
           {"role": "user", "content": "What is the capital/major city of France?"}
       ],
       max_tokens=1024,
   )
   print(message.content)
   ```
Agent support
- Foundry Agent Service supports Claude models.
- Microsoft Agent Framework supports creating agents that use Claude models.
- You can build custom AI agents with the Claude Agent SDK.
Claude advanced features and capabilities
Claude in Foundry Models supports advanced features and capabilities. Core capabilities enhance Claude's fundamental abilities for processing, analyzing, and generating content across various formats and use cases. Tools enable Claude to interact with external systems, execute code, and perform automated tasks through various tool interfaces.
Some of the Core capabilities that Foundry supports are:
- 1-million-token context window: An extended context window for processing very large inputs, such as entire codebases or long document sets.
- Agent skills: Extend Claude's capabilities with Skills.
- Citations: Ground Claude's responses in source documents.
- Context editing: Automatically manage conversation context with configurable strategies.
- Extended thinking: Enhanced reasoning capabilities for complex tasks.
- PDF support: Process and analyze text and visual content from PDF documents.
- Prompt caching: Provide Claude with more background knowledge and example outputs to reduce costs and latency.
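As a hedged sketch of prompt caching (a Messages API feature; confirm availability in Foundry against the feature docs), a request can mark a long system prompt as a cache breakpoint with `cache_control`, so later requests that share the same prefix can reuse the cached tokens:

```python
# A sketch of a Messages API request body that marks a long system
# prompt as cacheable so later requests can reuse it
long_reference_text = "..."  # imagine a large knowledge-base document here

request_body = {
    "model": "claude-sonnet-4-5",  # your deployment name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": long_reference_text,
            # Mark this block as a cache breakpoint; subsequent requests
            # with an identical prefix can hit the cache
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Answer using the reference text."}
    ],
}
```

Only the user turn changes between requests, so the large system block is paid for once and then served from the cache at reduced cost and latency.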
Some of the Tools that Foundry supports are:
- MCP connector: Connect to remote MCP servers directly from the Messages API without a separate MCP client.
- Memory: Store and retrieve information across conversations. Build knowledge bases over time, maintain project context, and learn from past interactions.
- Web fetch: Retrieve full content from specified web pages and PDF documents for in-depth analysis.
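As an illustrative sketch of the MCP connector (the server URL and name are hypothetical examples, not values from this article), a Messages API request can reference remote MCP servers directly via an `mcp_servers` field, with no separate MCP client:

```python
# A sketch of a Messages API request body that attaches a remote
# MCP server, letting the model call that server's tools directly
request_body = {
    "model": "claude-sonnet-4-5",  # your deployment name
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "List the open issues in my tracker."}
    ],
    "mcp_servers": [
        {
            "type": "url",
            "url": "https://example.com/mcp",  # hypothetical remote MCP server
            "name": "issue-tracker",           # hypothetical server name
        }
    ],
}
```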
For a full list of the supported capabilities and tools, see Claude's features overview.
API quotas and limits
Claude models in Foundry have the following rate limits, measured in Tokens Per Minute (TPM) and Requests Per Minute (RPM):
| Model | Deployment Type | Default RPM | Default TPM | Enterprise and MCA-E RPM | Enterprise and MCA-E TPM |
|---|---|---|---|---|---|
| claude-haiku-4-5 | GlobalStandard | 1,000 | 1,000,000 | 4,000 | 4,000,000 |
| claude-opus-4-1 | GlobalStandard | 1,000 | 1,000,000 | 2,000 | 2,000,000 |
| claude-sonnet-4-5 | GlobalStandard | 1,000 | 1,000,000 | 4,000 | 2,000,000 |
| claude-opus-4-5 | GlobalStandard | 1,000 | 1,000,000 | 2,000 | 2,000,000 |
To increase your quota beyond the default limits, submit a request through the quota increase request form.
Rate limit best practices
To optimize your usage and avoid rate limiting:
- Implement retry logic: Handle 429 responses with exponential backoff
- Batch requests: Combine multiple prompts when possible
- Monitor usage: Track your token consumption and request patterns
- Use appropriate models: Choose the right Claude model for your use case
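The retry practice above can be sketched as a small helper that retries on HTTP 429 with exponential backoff and jitter. The helper and the simulated endpoint below are illustrative, not part of any SDK:

```python
import random
import time

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a request on HTTP 429 with exponential backoff and jitter."""
    for attempt in range(max_retries):
        status, body = send_request()
        if status != 429:
            return body
        # Sleep base_delay, 2*base_delay, 4*base_delay, ... plus jitter
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
        time.sleep(delay)
    raise RuntimeError("Rate limited after all retries")

# Simulated endpoint: rate-limited twice, then succeeds
responses = iter([(429, None), (429, None), (200, "ok")])

def fake_request():
    return next(responses)

print(call_with_backoff(fake_request, base_delay=0.01))  # prints "ok"
```

In production, you would wrap the SDK call and catch its rate-limit exception instead of inspecting a status tuple; the backoff schedule is the part that carries over.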
Responsible AI considerations
When using Claude models in Foundry, consider these responsible AI practices:
Configure AI content safety during model inference, as Foundry doesn't provide built-in content filtering for Claude models at deployment time. To learn how to create and use content filters, see Configure content filtering for Foundry Models.
Ensure your applications comply with Anthropic's Acceptable Use Policy. Also, see details of safety evaluations for Claude Opus 4.5, Claude Haiku 4.5, Claude Opus 4.1, and Claude Sonnet 4.5.
Best practices
Follow these best practices when working with Claude models in Foundry:
Model selection
Choose the appropriate Claude model based on your specific requirements:
- Claude Opus 4.5: For best performance across coding, agents, computer use, and enterprise workflows
- Claude Sonnet 4.5: For balanced performance and capabilities, production workflows
- Claude Haiku 4.5: For speed and cost optimization, high-volume processing
- Claude Opus 4.1: For complex reasoning and enterprise applications
Prompt engineering
- Clear instructions: Provide specific and detailed prompts
- Context management: Effectively use the available context window
- Role definitions: Use system messages to define the assistant's role and behavior
- Structured prompts: Use consistent formatting for better results
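The practices above can be illustrated with a hypothetical Messages API payload (the prompt text and deployment name are examples, not values from this article) that combines a system-message role definition with clear, consistently formatted instructions:

```python
# A hypothetical request applying the prompt-engineering practices:
# a system role definition, clear instructions, consistent structure
system_prompt = (
    "You are a code reviewer. Respond with a numbered list of issues, "
    "each with a one-line fix suggestion."
)

request_body = {
    "model": "claude-sonnet-4-5",  # your deployment name
    "max_tokens": 1024,
    "system": system_prompt,  # role definition via the system message
    "messages": [
        {
            "role": "user",
            # Clear, specific instructions with consistent formatting
            "content": "Review this function:\n\ndef add(a, b): return a - b",
        }
    ],
}
```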
Cost optimization
- Token management: Monitor and optimize token usage
- Model selection: Use the most cost-effective model for your use case
- Caching: Implement explicit prompt caching where appropriate
- Request batching: Combine multiple requests when possible