Important
- Foundry Local is available in preview. Public preview releases give you early access to features that are in active development.
- Features, approaches, and processes can change or have constrained capabilities before general availability (GA).
Foundry Local integrates with other SDKs, such as OpenAI, Azure OpenAI, and LangChain, through a local REST server. This article shows you how to connect your app to a local AI model by using popular SDKs.
Prerequisites
- Python 3.9 or later installed. You can download Python from the official Python website.
Install pip packages
Install the following Python packages:
pip install openai
pip install foundry-local-sdk
Tip
We recommend that you use a virtual environment to avoid package conflicts. You can create a virtual environment with venv or conda.
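For example, to create and activate a virtual environment with venv before installing the packages (the environment name .venv is just an example):
python -m venv .venv
source .venv/bin/activate
On Windows, run .venv\Scripts\activate instead of the source command.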
Use the OpenAI SDK with Foundry Local
The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.
Copy and paste the following code into a Python file named app.py:
import openai
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "qwen2.5-0.5b"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
base_url=manager.endpoint,
api_key=manager.api_key # API key is not required for local usage
)
# Set the model to use and generate a response
response = client.chat.completions.create(
model=manager.get_model_info(alias).id,
messages=[{"role": "user", "content": "What is the golden ratio?"}]
)
print(response.choices[0].message.content)
Run the code using the following command:
python app.py
Streaming responses
If you want to receive streaming responses, you can modify the code as follows:
import openai
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "qwen2.5-0.5b"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
base_url=manager.endpoint,
api_key=manager.api_key # API key is not required for local usage
)
# Set the model to use and generate a streaming response
stream = client.chat.completions.create(
model=manager.get_model_info(alias).id,
messages=[{"role": "user", "content": "What is the golden ratio?"}],
stream=True
)
# Print the streaming response
for chunk in stream:
if chunk.choices[0].delta.content is not None:
print(chunk.choices[0].delta.content, end="", flush=True)
You can run the code with the same command as before:
python app.py
Use requests with Foundry Local
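If you prefer to call the REST API directly instead of using the OpenAI SDK, you can use the requests library. The following example starts the Foundry Local service, loads a model, and sends a chat completions request to the local endpoint: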
# Install with: pip install requests
import requests
import json
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "qwen2.5-0.5b"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
url = manager.endpoint + "/chat/completions"
payload = {
"model": manager.get_model_info(alias).id,
"messages": [
{"role": "user", "content": "What is the golden ratio?"}
]
}
headers = {
"Content-Type": "application/json"
}
response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json()["choices"][0]["message"]["content"])
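The requests example above returns the complete response in a single call. If you also want streaming with requests, the following is a minimal sketch. It assumes the service emits server-sent events (lines prefixed with data: and terminated by [DONE]), matching the Fetch API streaming example later in this article:
import json
import requests
from foundry_local import FoundryLocalManager

alias = "qwen2.5-0.5b"
manager = FoundryLocalManager(alias)

url = manager.endpoint + "/chat/completions"
payload = {
    "model": manager.get_model_info(alias).id,
    "messages": [{"role": "user", "content": "What is the golden ratio?"}],
    "stream": True,
}

# Read the response incrementally and parse each server-sent event line.
with requests.post(url, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue
        text = line.decode("utf-8")
        if not text.startswith("data: "):
            continue
        data = text[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content:
            print(content, end="", flush=True)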
Prerequisites
- Foundry Local installed and running. For installation instructions, see Get started with Foundry Local.
- Node.js 18 or later installed.
Install Node.js packages
Install the following Node.js packages:
npm install openai
npm install foundry-local-sdk
The Foundry Local SDK lets you manage the Foundry Local service and models.
Use the OpenAI SDK with Foundry Local
The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.
Copy and paste the following code into a JavaScript file named app.js:
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager();
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias);
console.log("Model Info:", modelInfo);
const openai = new OpenAI({
baseURL: foundryLocalManager.endpoint,
apiKey: foundryLocalManager.apiKey,
});
async function generateText() {
const response = await openai.chat.completions.create({
model: modelInfo.id,
messages: [
{
role: "user",
content: "What is the golden ratio?",
},
],
});
console.log(response.choices[0].message.content);
}
generateText();
Run the code using the following command:
node app.js
Streaming responses
If you want to receive streaming responses, you can modify the code as follows:
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager();
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias);
console.log("Model Info:", modelInfo);
const openai = new OpenAI({
baseURL: foundryLocalManager.endpoint,
apiKey: foundryLocalManager.apiKey,
});
async function streamCompletion() {
const stream = await openai.chat.completions.create({
model: modelInfo.id,
messages: [{ role: "user", content: "What is the golden ratio?" }],
stream: true,
});
for await (const chunk of stream) {
if (chunk.choices[0]?.delta?.content) {
process.stdout.write(chunk.choices[0].delta.content);
}
}
}
streamCompletion();
Run the code using the following command:
node app.js
Use the Fetch API with Foundry Local
If you prefer to use an HTTP client like fetch, you can do so as follows:
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager();
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias);
console.log("Model Info:", modelInfo);
async function queryModel() {
const response = await fetch(
foundryLocalManager.endpoint + "/chat/completions",
{
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({
model: modelInfo.id,
messages: [{ role: "user", content: "What is the golden ratio?" }],
}),
}
);
const data = await response.json();
console.log(data.choices[0].message.content);
}
queryModel();
Streaming responses
If you want to receive streaming responses using the Fetch API, you can modify the code as follows:
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager();
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias);
console.log("Model Info:", modelInfo);
async function streamWithFetch() {
const response = await fetch(
foundryLocalManager.endpoint + "/chat/completions",
{
method: "POST",
headers: {
"Content-Type": "application/json",
Accept: "text/event-stream",
},
body: JSON.stringify({
model: modelInfo.id,
messages: [{ role: "user", content: "what is the golden ratio?" }],
stream: true,
}),
}
);
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split("\n").filter((line) => line.trim() !== "");
for (const line of lines) {
if (line.startsWith("data: ")) {
const data = line.substring(6);
if (data === "[DONE]") continue;
try {
const json = JSON.parse(data);
const content = json.choices[0]?.delta?.content || "";
if (content) {
// Print to console without line breaks, similar to process.stdout.write
process.stdout.write(content);
}
} catch (e) {
console.error("Error parsing JSON:", e);
}
}
}
}
}
// Call the function to start streaming
streamWithFetch();
Prerequisites
- .NET 8.0 SDK or later installed.
Sample repository
The samples in this article are available in the Foundry Local C# SDK samples GitHub repository.
Set up the project
To use Foundry Local in a C# project, follow these instructions, which are specific to Windows or cross-platform (macOS/Linux/Windows):
- Create a new C# project and navigate to it:

  dotnet new console -n app-name
  cd app-name

- Open the app-name.csproj file and edit its contents to:

  <Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
      <OutputType>Exe</OutputType>
      <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
      <RootNamespace>app-name</RootNamespace>
      <ImplicitUsings>enable</ImplicitUsings>
      <Nullable>enable</Nullable>
      <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
      <WindowsPackageType>None</WindowsPackageType>
      <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
    </PropertyGroup>
    <ItemGroup>
      <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" Version="0.8.2.1" />
      <PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.10" />
      <PackageReference Include="OpenAI" Version="2.5.0" />
    </ItemGroup>
  </Project>

- Create a nuget.config file in the project root with the following content to ensure packages restore correctly:

  <?xml version="1.0" encoding="utf-8"?>
  <configuration>
    <packageSources>
      <clear />
      <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
      <add key="ORT" value="https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/nuget/v3/index.json" />
    </packageSources>
    <packageSourceMapping>
      <packageSource key="nuget.org">
        <package pattern="*" />
      </packageSource>
      <packageSource key="ORT">
        <package pattern="*Foundry*" />
      </packageSource>
    </packageSourceMapping>
  </configuration>
Use the OpenAI SDK with Foundry Local
The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code completes the following steps:
- Initializes the FoundryLocalManager singleton with a Configuration instance that includes the web service configuration. The web service is an OpenAI-compatible interface.
- Gets a Model object from the model catalog by using an alias.
Note
Foundry Local automatically selects the best variant of the model based on the host's available hardware.
- Downloads and loads the model variant.
- Starts the web service.
- Uses the OpenAI SDK to call the local Foundry web service.
- Tidies up by stopping the web service and unloading the model.
Copy and paste the following code into a C# file named Program.cs:
using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging;
using OpenAI;
using System.ClientModel;
var config = new Configuration
{
AppName = "app-name",
LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information,
Web = new Configuration.WebService
{
Urls = "http://127.0.0.1:55588"
}
};
using var loggerFactory = LoggerFactory.Create(builder =>
{
builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();
// Initialize the singleton instance.
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;
// Get the model catalog
var catalog = await mgr.GetCatalogAsync();
// Get a model using an alias
var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");
// Download the model (the method skips download if already cached)
await model.DownloadAsync(progress =>
{
Console.Write($"\rDownloading model: {progress:F2}%");
if (progress >= 100f)
{
Console.WriteLine();
}
});
// Load the model
await model.LoadAsync();
// Start the web service
await mgr.StartWebServiceAsync();
// <<<<<< OPEN AI SDK USAGE >>>>>>
// Use the OpenAI SDK to call the local Foundry web service
ApiKeyCredential key = new ApiKeyCredential("notneeded");
OpenAIClient client = new OpenAIClient(key, new OpenAIClientOptions
{
Endpoint = new Uri(config.Web.Urls + "/v1"),
});
var chatClient = client.GetChatClient(model.Id);
var completionUpdates = chatClient.CompleteChatStreaming("Why is the sky blue?");
Console.Write($"[ASSISTANT]: ");
foreach (var completionUpdate in completionUpdates)
{
if (completionUpdate.ContentUpdate.Count > 0)
{
Console.Write(completionUpdate.ContentUpdate[0].Text);
}
}
Console.WriteLine();
// <<<<<< END OPEN AI SDK USAGE >>>>>>
// Tidy up
// Stop the web service and unload model
await mgr.StopWebServiceAsync();
await model.UnloadAsync();
Run the code using the following commands:
For x64 Windows, use the following command:
dotnet run -r:win-x64
For arm64 Windows, use the following command:
dotnet run -r:win-arm64
Prerequisites
- Foundry Local installed and running. For installation instructions, see Get started with Foundry Local.
- Rust and Cargo installed.
Create a project
Create a new Rust project and navigate to it:
cargo new hello-foundry-local
cd hello-foundry-local
Install crates
Install the following Rust crates using Cargo:
cargo add foundry-local anyhow env_logger serde_json
cargo add reqwest --features json
cargo add tokio --features full
Update the main.rs file
The following example demonstrates how to run inference by sending a request to the Foundry Local service. The code initializes the Foundry Local service, loads a model, and generates a response using the reqwest library.
Copy and paste the following code into a Rust file named main.rs:
use foundry_local::FoundryLocalManager;
use anyhow::Result;
#[tokio::main]
async fn main() -> Result<()> {
// Create a FoundryLocalManager instance with default options
let mut manager = FoundryLocalManager::builder()
.alias_or_model_id("qwen2.5-0.5b") // Specify the model to use
.bootstrap(true) // Start the service if not running
.build()
.await?;
// Use the OpenAI compatible API to interact with the model
let client = reqwest::Client::new();
let endpoint = manager.endpoint()?;
let response = client.post(format!("{}/chat/completions", endpoint))
.header("Content-Type", "application/json")
.header("Authorization", format!("Bearer {}", manager.api_key()))
.json(&serde_json::json!({
"model": manager.get_model_info("qwen2.5-0.5b", true).await?.id,
"messages": [{"role": "user", "content": "What is the golden ratio?"}],
}))
.send()
.await?;
let result = response.json::<serde_json::Value>().await?;
println!("{}", result["choices"][0]["message"]["content"]);
Ok(())
}
Run the code using the following command:
cargo run