Important
- Foundry Local is available in preview. Public preview releases provide early access to features that are in active deployment.
- Features, approaches, and processes can change or have limited capabilities before general availability (GA).
The native chat completions API allows you to interact directly with Foundry Local's inference capabilities without needing to start a REST web server. The native API streamlines your application architecture by reducing dependencies and complexity. The native chat completions API uses the same input (request) and output (response) as the OpenAI SDK to ensure compatibility with existing applications and familiarity for developers.
This article explains how to use the native chat completions API in the Foundry Local SDK.
Prerequisites
- .NET 8.0 SDK or later installed.
Samples repository
The sample in this article can be found in the Foundry Local C# SDK Samples GitHub repository.
Set up project
Use Foundry Local in your C# project by following these Windows-specific or Cross-Platform (macOS/Linux/Windows) instructions:
- Create a new C# project and navigate into it:

  dotnet new console -n app-name
  cd app-name

- Open and edit the app-name.csproj file so it contains:

  <Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
      <OutputType>Exe</OutputType>
      <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
      <RootNamespace>app-name</RootNamespace>
      <ImplicitUsings>enable</ImplicitUsings>
      <Nullable>enable</Nullable>
      <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
      <WindowsPackageType>None</WindowsPackageType>
      <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
    </PropertyGroup>
    <ItemGroup>
      <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" Version="0.8.2.1" />
      <PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.10" />
      <PackageReference Include="OpenAI" Version="2.5.0" />
    </ItemGroup>
  </Project>

- Create a nuget.config file in the project root with the following content so that the packages restore correctly:

  <?xml version="1.0" encoding="utf-8"?>
  <configuration>
    <packageSources>
      <clear />
      <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
      <add key="ORT" value="https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/nuget/v3/index.json" />
    </packageSources>
    <packageSourceMapping>
      <packageSource key="nuget.org">
        <package pattern="*" />
      </packageSource>
      <packageSource key="ORT">
        <package pattern="*Foundry*" />
      </packageSource>
    </packageSourceMapping>
  </configuration>
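With the project file and nuget.config in place, you can optionally run a restore to confirm that both package sources resolve before building. This is just a quick sanity check; the package versions shown above are the ones this article was written against and may change over time.

  dotnet restore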
Use native chat completions API
The following example demonstrates how to use the native chat completions API in Foundry Local. The code includes the following steps:
1. Initializes a FoundryLocalManager instance with a Configuration.
2. Gets a Model object from the model catalog using an alias.

   Note: Foundry Local selects the best variant for the model automatically based on the available hardware of the host machine.

3. Downloads and loads the model variant.
4. Uses the native chat completions API to generate a response.
5. Unloads the model.
Copy and paste the following code into a C# file named Program.cs:
using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
using Microsoft.Extensions.Logging;
CancellationToken ct = new CancellationToken();
var config = new Configuration
{
AppName = "app-name",
LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Debug
};
using var loggerFactory = LoggerFactory.Create(builder =>
{
builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Debug);
});
var logger = loggerFactory.CreateLogger<Program>();
// Initialize the singleton instance.
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;
// Get the model catalog
var catalog = await mgr.GetCatalogAsync();
// List available models
Console.WriteLine("Available models for your hardware:");
var models = await catalog.ListModelsAsync();
foreach (var availableModel in models)
{
foreach (var variant in availableModel.Variants)
{
Console.WriteLine($" - Alias: {variant.Alias} (Id: {string.Join(", ", variant.Id)})");
}
}
// Get a model using an alias
var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");
// Check whether the model is already cached locally
Console.WriteLine($"Is model cached: {await model.IsCachedAsync()}");
// Print the models already in the local cache
var cachedModels = await catalog.GetCachedModelsAsync();
Console.WriteLine("Cached models:");
foreach (var cachedModel in cachedModels)
{
Console.WriteLine($"- {cachedModel.Alias} ({cachedModel.Id})");
}
// Download the model (the method skips download if already cached)
await model.DownloadAsync(progress =>
{
Console.Write($"\rDownloading model: {progress:F2}%");
if (progress >= 100f)
{
Console.WriteLine();
}
});
// Load the model
await model.LoadAsync();
// Get a chat client
var chatClient = await model.GetChatClientAsync();
// Create a chat message
List<ChatMessage> messages = new()
{
new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
};
var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
Console.Write(chunk.Choices[0].Message.Content);
Console.Out.Flush();
}
Console.WriteLine();
// Tidy up - unload the model
await model.UnloadAsync();
Run the code using the command that matches your Windows architecture:
For x64 Windows, use the following command:
dotnet run -r:win-x64
For arm64 Windows, use the following command:
dotnet run -r:win-arm64
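The sample streams the reply straight to the console. If you want to keep the reply for a follow-up turn, you can accumulate the streamed chunks and append them to the message list before requesting another completion. The following is a minimal sketch that reuses the same types and calls as the sample above and belongs before the call to UnloadAsync; the "assistant" role string follows the OpenAI chat convention, and the follow-up question is only an illustrative placeholder.

  // Collect the streamed chunks into a single reply string.
  var reply = new System.Text.StringBuilder();
  await foreach (var chunk in chatClient.CompleteChatStreamingAsync(messages, ct))
  {
      reply.Append(chunk.Choices[0].Message.Content);
  }

  // Append the assistant reply and a follow-up question, then request another completion.
  messages.Add(new ChatMessage { Role = "assistant", Content = reply.ToString() });
  messages.Add(new ChatMessage { Role = "user", Content = "Summarize that in one sentence." });

  await foreach (var chunk in chatClient.CompleteChatStreamingAsync(messages, ct))
  {
      Console.Write(chunk.Choices[0].Message.Content);
  }
  Console.WriteLine();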