Use Foundry Local native chat completions API

Important

  • Foundry Local is available in preview. Public preview releases provide early access to features that are in active development.
  • Features, approaches, and processes can change or have limited capabilities before general availability (GA).

The native chat completions API lets you interact directly with Foundry Local's inference capabilities without starting a REST web server, which streamlines your application architecture by reducing dependencies and complexity. The native chat completions API uses the same input (request) and output (response) shapes as the OpenAI SDK, so existing applications remain compatible and the patterns stay familiar to developers.
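
For example, requests are built from OpenAI-style role/content messages, and streamed responses are read through OpenAI-style choices. A minimal sketch (the ChatMessage and chunk types are the same ones used in the full sample later in this article):

// Requests use OpenAI-style role/content messages...
var messages = new List<ChatMessage>
{
    new ChatMessage { Role = "user", Content = "Hello!" }
};

// ...and each streamed chunk is read the way an OpenAI SDK response is read:
// chunk.Choices[0].Message.Content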

This article explains how to use the native chat completions API in the Foundry Local SDK.

Prerequisites

  • The .NET SDK and dotnet CLI. The project in this article targets net9.0.
  • A Windows development machine; the sample project targets net9.0-windows10.0.26100.

Samples repository

The sample in this article can be found in the Foundry Local C# SDK Samples GitHub repository.

Set up project

Use Foundry Local in your C# project by following these Windows-specific instructions:

  1. Create a new C# project and navigate into it:
    dotnet new console -n app-name
    cd app-name
    
  2. Open the app-name.csproj file and replace its contents with the following:
    <Project Sdk="Microsoft.NET.Sdk">
    
      <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
        <RootNamespace>app-name</RootNamespace>
        <ImplicitUsings>enable</ImplicitUsings>
        <Nullable>enable</Nullable>
        <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
        <WindowsPackageType>None</WindowsPackageType>
        <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
      </PropertyGroup>
    
      <ItemGroup>
        <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" Version="0.8.2.1" />
        <PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.10" />
        <PackageReference Include="OpenAI" Version="2.5.0" />
      </ItemGroup>
    
    </Project>
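
    The net9.0-windows10.0.26100 target framework ties the project to Windows, which matches the WinML (Windows Machine Learning) package referenced above; this appears to be the Windows-specific variant of the Foundry Local SDK.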
    
  3. Create a nuget.config file in the project root with the following content so that the packages restore correctly:
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <packageSources>
        <clear />
        <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
        <add key="ORT" value="https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/nuget/v3/index.json" />
      </packageSources>
      <packageSourceMapping>
        <packageSource key="nuget.org">
          <package pattern="*" />
        </packageSource>
        <packageSource key="ORT">
          <package pattern="*Foundry*" />
        </packageSource>
      </packageSourceMapping>
    </configuration>
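
    With both files saved, you can optionally run a restore to confirm the package sources resolve. Per the source mapping above, the Foundry packages come from the ORT feed and everything else comes from nuget.org:

    dotnet restore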
    

Use native chat completions API

The following example demonstrates how to use the native chat completions API in Foundry Local. The code performs the following steps:

  1. Initializes a FoundryLocalManager instance with a Configuration.

  2. Gets a Model object from the model catalog using an alias.

    Note

    Foundry Local selects the best variant for the model automatically based on the available hardware of the host machine.

  3. Downloads and loads the model variant.

  4. Uses the native chat completions API to generate a response.

  5. Unloads the model.

Copy and paste the following code into a C# file named Program.cs:

using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels; // OpenAI-compatible request model types (such as ChatMessage) used by the native chat client
using Microsoft.Extensions.Logging;

// A default (non-cancelable) token; a timeout-based alternative is sketched after this listing.
CancellationToken ct = new CancellationToken();

var config = new Configuration
{
    AppName = "app-name",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Debug
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    // Only the minimum level is set here; no logging provider is attached, so attach
    // one (for example, builder.AddConsole() from the Microsoft.Extensions.Logging.Console
    // package) if you want to see the SDK's debug output.
    builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Debug);
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance.
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Get the model catalog
var catalog = await mgr.GetCatalogAsync();

// List available models
Console.WriteLine("Available models for your hardware:");
var models = await catalog.ListModelsAsync();
foreach (var availableModel in models)
{
    foreach (var variant in availableModel.Variants)
    {
        Console.WriteLine($"  - Alias: {variant.Alias} (Id: {variant.Id})");
    }
}

// Get a model using an alias
var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");

// Check whether the model is already cached locally
Console.WriteLine($"Is model cached: {await model.IsCachedAsync()}");

// Print the models already in the local cache
var cachedModels = await catalog.GetCachedModelsAsync();
Console.WriteLine("Cached models:");
foreach (var cachedModel in cachedModels)
{
    Console.WriteLine($"- {cachedModel.Alias} ({cachedModel.Id})");
}

// Download the model (the method skips download if already cached)
await model.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading model: {progress:F2}%");
    if (progress >= 100f)
    {
        Console.WriteLine();
    }
});

// Load the model
await model.LoadAsync();

// Get a chat client
var chatClient = await model.GetChatClientAsync();

// Create a chat message
List<ChatMessage> messages = new()
{
    new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
};


// Stream the chat completion and print each token as it arrives
var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
    Console.Write(chunk.Choices[0].Message.Content);
    Console.Out.Flush();
}
Console.WriteLine();

// Tidy up - unload the model
await model.UnloadAsync();
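
The sample above creates a default CancellationToken that can never be canceled. To bound how long generation can run, you can instead pass a token from a CancellationTokenSource and unload the model even when the stream is cut short. A minimal sketch reusing the model, chatClient, and messages objects from the listing above (the 60-second timeout is an arbitrary value for illustration):

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60)); // cancel generation after 60 seconds

try
{
    // Same streaming call as above, but with a cancelable token.
    await foreach (var chunk in chatClient.CompleteChatStreamingAsync(messages, cts.Token))
    {
        Console.Write(chunk.Choices[0].Message.Content);
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine();
    Console.WriteLine("Generation canceled by timeout.");
}
finally
{
    // Unload even if the stream was canceled midway.
    await model.UnloadAsync();
}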

Run the code using the command that matches your machine's architecture.

For x64 Windows:

dotnet run -r:win-x64

For arm64 Windows:

dotnet run -r:win-arm64