Deploy and use Claude models in Microsoft Foundry (preview)

This article explains how to deploy and use the latest Claude models in Foundry, including Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5, and Claude Opus 4.1. Anthropic's flagship product is Claude, a frontier AI model useful for complex tasks such as coding, agents, financial analysis, research, and office tasks. Claude delivers exceptional performance while maintaining high safety standards.

Available Claude models

Foundry supports Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5, and Claude Opus 4.1 models through global standard deployment. These models have key capabilities that include:

  • Extended thinking: Extended thinking gives Claude enhanced reasoning capabilities for complex tasks.
  • Image and text input: Strong vision capabilities that enable the models to process images and return text outputs for analyzing and understanding charts, graphs, technical diagrams, reports, and other visual assets.
  • Code generation: Advanced capabilities for generating, analyzing, and debugging code for Claude Sonnet 4.5 and Claude Opus 4.1.

For more details about the model capabilities, see capabilities of Claude models.

Claude Opus 4.5 (preview)

Claude Opus 4.5 is Anthropic's most intelligent model, and an industry leader across coding, agents, computer use, and enterprise workflows. With a 200K token context window and 64K max output, Opus 4.5 is ideal for production code, sophisticated agents, office tasks, financial analysis, cybersecurity, and computer use.

Claude Sonnet 4.5 (preview)

Claude Sonnet 4.5 is a highly capable model designed for building real-world agents and handling complex, long-horizon tasks. It offers a strong balance of speed and cost for high-volume use cases. Sonnet 4.5 also provides advanced accuracy for computer use, enabling developers to direct Claude to use computers the way people do.

Claude Haiku 4.5 (preview)

Claude Haiku 4.5 delivers near-frontier performance for a wide range of use cases. It stands out as one of the best coding and agent models, with the right speed and cost to power free products and scaled sub-agents.

Claude Opus 4.1 (preview)

Claude Opus 4.1 is an industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.

Prerequisites

Deploy Claude models

Claude models in Foundry are available for global standard deployment. To deploy a Claude model, follow the instructions in Add and configure models to Microsoft Foundry Models.

After deployment, you can use the Foundry playground to interactively test the model.

Work with Claude models

Once deployed, you can interact with Claude models through the Messages API to generate text responses.

Use the Messages API to work with Claude models

The following examples show how to use the Messages API to send requests to Claude Sonnet 4.5, by using both Microsoft Entra ID authentication and API key authentication methods. To work with your deployed model, you need these items:

  • Your base URL, which is of the form https://<resource-name>.services.ai.azure.com/anthropic.
  • Your target URI from your deployment details, which is of the form https://<resource-name>.services.ai.azure.com/anthropic/v1/messages.
  • Microsoft Entra ID credentials for keyless authentication, or your deployment's API key for API key authentication.
  • Deployment name you chose during deployment creation. This name can be different from the model ID.
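
As a small illustration of how the two URL forms above relate, the following sketch builds both from a resource name. The helper name `foundry_endpoints` and the value `my-resource` are placeholders introduced here, not part of any SDK.

```python
def foundry_endpoints(resource_name: str) -> dict:
    """Build the base URL and target URI for a Foundry resource name."""
    base_url = f"https://{resource_name}.services.ai.azure.com/anthropic"
    # The target URI is the base URL followed by the Messages API path.
    return {"base_url": base_url, "target_uri": f"{base_url}/v1/messages"}


endpoints = foundry_endpoints("my-resource")  # "my-resource" is a placeholder
print(endpoints["base_url"])
print(endpoints["target_uri"])
```

The SDK samples in this article take the base URL; the target URI is the full address that a raw HTTP client would call.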

Use Microsoft Entra ID authentication

For Messages API endpoints, use your base URL with Microsoft Entra ID authentication.

  1. Install the Azure Identity client library: You need to install this library to use the DefaultAzureCredential. Authorization is easiest when you use DefaultAzureCredential, as it finds the best credential to use in its running environment.

    pip install azure-identity
    

    Set the values of the client ID, tenant ID, and client secret of the Microsoft Entra ID application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.

    export AZURE_CLIENT_ID="<AZURE_CLIENT_ID>"
    export AZURE_TENANT_ID="<AZURE_TENANT_ID>"
    export AZURE_CLIENT_SECRET="<AZURE_CLIENT_SECRET>"
    
  2. Install dependencies: Install the Anthropic SDK by using pip (requires: Python >=3.8).

    pip install -U "anthropic"
    
  3. Run a basic code sample: This sample completes the following tasks:

    1. Creates a client with the Anthropic SDK, using Microsoft Entra ID authentication.
    2. Makes a basic call to the Messages API. The call is synchronous.
    from anthropic import AnthropicFoundry
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    
    baseURL = "https://<resource-name>.services.ai.azure.com/anthropic" # Your base URL. Replace <resource-name> with your resource name
    deploymentName = "claude-sonnet-4-5" # Replace with your deployment name
    
    # Create token provider for Entra ID authentication
    tokenProvider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    
    # Create client with Entra ID authentication
    client = AnthropicFoundry(
        azure_ad_token_provider=tokenProvider,
        base_url=baseURL
    )
    
    # Send request
    message = client.messages.create(
        model=deploymentName,
        messages=[
            {"role": "user", "content": "What is the capital/major city of France?"}
        ],
        max_tokens=1024,
    )
    
    print(message.content)
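
In the Anthropic SDK, `message.content` is a list of content blocks rather than a plain string, so printing it shows the block objects. The following is a minimal sketch of pulling out just the text. The helper name `text_from_content` is introduced here for illustration, and `SimpleNamespace` stands in for a real response so the sketch runs without calling the service.

```python
from types import SimpleNamespace


def text_from_content(content_blocks) -> str:
    """Join the text of every block whose type is "text"."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in for message.content so the sketch runs without a network call.
sample_content = [
    SimpleNamespace(type="text", text="The capital of France is Paris."),
]
print(text_from_content(sample_content))
```

With a real response, the equivalent call would be `text_from_content(message.content)`.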
    

Use API key authentication

For Messages API endpoints, use your base URL and API key to authenticate against the service.

  1. Install dependencies: Install the Anthropic SDK by using pip (requires: Python >=3.8):

    pip install -U "anthropic"
    
  2. Run a basic code sample: This sample completes the following tasks:

    1. Creates a client with the Anthropic SDK by passing your API key to the SDK's configuration. This authentication method lets you interact seamlessly with the service.
    2. Makes a basic call to the Messages API. The call is synchronous.
    from anthropic import AnthropicFoundry
    
    baseURL = "https://<resource-name>.services.ai.azure.com/anthropic" # Your base URL. Replace <resource-name> with your resource name
    deploymentName = "claude-sonnet-4-5" # Replace with your deployment name
    apiKey = "YOUR_API_KEY" # Replace YOUR_API_KEY with your API key
    
    # Create client with API key authentication
    client = AnthropicFoundry(
        api_key=apiKey,
        base_url=baseURL
    )
    
    # Send request
    message = client.messages.create(
        model=deploymentName,
        messages=[
            {"role": "user", "content": "What is the capital/major city of France?"}
        ],
        max_tokens=1024,
    )
    
    print(message.content)
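
The sample above hardcodes the API key for brevity. One common alternative is to read the settings from environment variables; the sketch below assumes that approach. The variable names `FOUNDRY_BASE_URL`, `FOUNDRY_API_KEY`, and `FOUNDRY_DEPLOYMENT_NAME` and the helper `load_settings` are hypothetical and are not defined by Foundry or the Anthropic SDK.

```python
import os

# Hypothetical variable names chosen for this sketch.
REQUIRED = ("FOUNDRY_BASE_URL", "FOUNDRY_API_KEY", "FOUNDRY_DEPLOYMENT_NAME")


def load_settings(env=os.environ) -> dict:
    """Return the required settings, or fail early if any are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


# A sample mapping stands in for os.environ so the sketch runs anywhere.
sample_env = {
    "FOUNDRY_BASE_URL": "https://<resource-name>.services.ai.azure.com/anthropic",
    "FOUNDRY_API_KEY": "YOUR_API_KEY",
    "FOUNDRY_DEPLOYMENT_NAME": "claude-sonnet-4-5",
}
settings = load_settings(sample_env)
print(settings["FOUNDRY_DEPLOYMENT_NAME"])
```

The returned values can then be passed to `AnthropicFoundry(api_key=..., base_url=...)` as in the sample above.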
    

Agent support

Claude advanced features and capabilities

Claude in Foundry Models supports advanced features and capabilities. Core capabilities enhance Claude's fundamental abilities for processing, analyzing, and generating content across various formats and use cases. Tools enable Claude to interact with external systems, execute code, and perform automated tasks through various tool interfaces.

Some of the Core capabilities that Foundry supports are:

  • 1 million token context window: An extended context window.
  • Agent skills: Extend Claude's capabilities with Skills.
  • Citations: Ground Claude's responses in source documents.
  • Context editing: Automatically manage conversation context with configurable strategies.
  • Extended thinking: Enhanced reasoning capabilities for complex tasks.
  • PDF support: Process and analyze text and visual content from PDF documents.
  • Prompt caching: Provide Claude with more background knowledge and example outputs to reduce costs and latency.
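
As one example from this list, extended thinking is requested in the Anthropic Messages API through a `thinking` parameter with a token budget. The sketch below only assembles the request arguments and assumes Foundry accepts the same parameter shape. The helper name `thinking_request` and the budget values are illustrative choices.

```python
def thinking_request(deployment_name: str, prompt: str,
                     budget_tokens: int = 2048, max_tokens: int = 4096) -> dict:
    """Assemble keyword arguments for a Messages API call with extended thinking."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must be larger than budget_tokens")
    return {
        "model": deployment_name,
        "max_tokens": max_tokens,
        # The thinking budget counts toward max_tokens.
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }


request = thinking_request("claude-sonnet-4-5", "Is 1,000,003 a prime number?")
print(request["thinking"])
```

With a client created as in the authentication samples, the call would be `client.messages.create(**request)`. The response content then typically contains blocks of type `thinking` followed by blocks of type `text`.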

Some of the Tools that Foundry supports are:

  • MCP connector: Connect to remote MCP servers directly from the Messages API without a separate MCP client.
  • Memory: Store and retrieve information across conversations. Build knowledge bases over time, maintain project context, and learn from past interactions.
  • Web fetch: Retrieve full content from specified web pages and PDF documents for in-depth analysis.

For a full list of the supported capabilities and tools, see Claude's features overview.

API quotas and limits

Claude models in Foundry have the following rate limits, measured in Tokens Per Minute (TPM) and Requests Per Minute (RPM):

Model              Deployment Type  Default RPM  Default TPM  Enterprise and MCA-E RPM  Enterprise and MCA-E TPM
claude-haiku-4-5   GlobalStandard   1,000        1,000,000    4,000                     4,000,000
claude-opus-4-1    GlobalStandard   1,000        1,000,000    2,000                     2,000,000
claude-sonnet-4-5  GlobalStandard   1,000        1,000,000    4,000                     2,000,000
claude-opus-4-5    GlobalStandard   1,000        1,000,000    2,000                     2,000,000

To increase your quota beyond the default limits, submit a request through the quota increase request form.
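
Because both limits apply at once, whichever is reached first caps throughput. The following back-of-the-envelope sketch estimates the sustained request rate for a given average request size. It assumes that every token of a request counts against TPM, which is a simplification, and the helper name is introduced here for illustration.

```python
def sustained_requests_per_minute(rpm_limit: int, tpm_limit: int,
                                  avg_tokens_per_request: int) -> int:
    """Return the request rate allowed by the tighter of the two limits."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    return min(rpm_limit, tpm_limit // avg_tokens_per_request)


# Default limits from the table: 1,000 RPM and 1,000,000 TPM.
large = sustained_requests_per_minute(1_000, 1_000_000, 4_000)
small = sustained_requests_per_minute(1_000, 1_000_000, 500)
print(large)  # 250: the token limit is reached first
print(small)  # 1000: the request limit is reached first
```

At the default limits, the two caps meet at an average of 1,000 tokens per request; larger requests are bounded by TPM and smaller ones by RPM.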

Rate limit best practices

To optimize your usage and avoid rate limiting:

  • Implement retry logic: Handle 429 responses with exponential backoff
  • Batch requests: Combine multiple prompts when possible
  • Monitor usage: Track your token consumption and request patterns
  • Use appropriate models: Choose the right Claude model for your use case
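
The retry advice above can be sketched as a small wrapper that retries with exponential backoff and jitter. This is a generic illustration, not the SDK's own retry mechanism: `RateLimited` is a hypothetical stand-in for a 429 response, and the delay values are arbitrary. With the Anthropic Python SDK, the exception to catch would be `anthropic.RateLimitError`, and the client also retries some failures itself according to its `max_retries` setting.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send(); on RateLimited, wait 1s, 2s, 4s, ... (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries


# Demonstration: fail twice, then succeed. Delays are recorded instead of slept.
attempts = {"count": 0}


def flaky_send():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


delays = []
result = call_with_backoff(flaky_send, sleep=delays.append)
print(result, len(delays))
```

In real use, `send` would be a function that wraps the `client.messages.create(...)` call.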

Responsible AI considerations

When using Claude models in Foundry, consider these responsible AI practices:

Best practices

Follow these best practices when working with Claude models in Foundry:

Model selection

Choose the appropriate Claude model based on your specific requirements:

  • Claude Opus 4.5: For best performance across coding, agents, computer use, and enterprise workflows
  • Claude Sonnet 4.5: For balanced performance and capabilities in production workflows
  • Claude Haiku 4.5: For speed and cost optimization in high-volume processing
  • Claude Opus 4.1: For complex reasoning and enterprise applications

Prompt engineering

  • Clear instructions: Provide specific and detailed prompts
  • Context management: Effectively use the available context window
  • Role definitions: Use system messages to define the assistant's role and behavior
  • Structured prompts: Use consistent formatting for better results
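
To illustrate the role-definition point: in the Anthropic Messages API, the system prompt is passed as a top-level `system` parameter, not as a message with a `system` role. The sketch below assembles such a request. The helper name `build_request` and the prompt text are illustrative.

```python
def build_request(deployment_name: str, system_prompt: str, user_text: str,
                  max_tokens: int = 1024) -> dict:
    """Assemble keyword arguments for client.messages.create()."""
    return {
        "model": deployment_name,
        "system": system_prompt,  # top-level field that defines role and behavior
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }


request = build_request(
    "claude-sonnet-4-5",  # replace with your deployment name
    "You are a concise assistant that answers in one sentence.",
    "What is the capital of France?",
)
print(request["system"])
```

With a client created as in the authentication samples, the request is sent with `client.messages.create(**request)`.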

Cost optimization

  • Token management: Monitor and optimize token usage
  • Model selection: Use the most cost-effective model for your use case
  • Caching: Implement explicit prompt caching where appropriate
  • Request batching: Combine multiple requests when possible
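
As a sketch of explicit prompt caching, the Anthropic Messages API lets you mark a content block with `cache_control` so that the prefix up to and including that block can be reused by later requests. The example below assumes Foundry accepts the same request shape. The helper name `cached_system_request` and the sample text are illustrative.

```python
def cached_system_request(deployment_name: str, reference_text: str,
                          question: str, max_tokens: int = 1024) -> dict:
    """Assemble a request whose large, stable system text is marked cacheable."""
    return {
        "model": deployment_name,
        "system": [
            {"type": "text", "text": "Answer using only the reference material."},
            {
                "type": "text",
                "text": reference_text,
                # Marks the end of the prefix that may be cached and reused.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
    }


request = cached_system_request(
    "claude-sonnet-4-5",  # replace with your deployment name
    "<long reference document>",
    "Summarize section 2.",
)
print(request["system"][1]["cache_control"])
```

Caching pays off when the same long prefix is sent repeatedly. Very short prefixes may fall below the minimum cacheable length described in Anthropic's documentation, in which case the request is processed without caching.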