
GPT Realtime API for speech and audio

The Azure OpenAI GPT Realtime API for speech and audio is part of the GPT-4o model family. It supports low-latency, speech-in, speech-out conversational interactions.

You can use the Realtime API over WebRTC or WebSocket to send audio input to the model and receive audio responses in real time.

Follow the instructions in this article to get started with the Realtime API over WebSockets. Use the Realtime API over WebSockets in server-to-server scenarios where low latency isn't a requirement.

Tip

In most cases, we recommend using the Realtime API over WebRTC for real-time audio streaming in client applications such as web applications or mobile apps. WebRTC is designed for low-latency, real-time audio streaming and is the best choice for most use cases.

Supported models

The GPT realtime models are available for global deployments:

  • gpt-4o-realtime-preview (version 2024-12-17)
  • gpt-4o-mini-realtime-preview (version 2024-12-17)
  • gpt-realtime (version 2025-08-28)
  • gpt-realtime-mini (version 2025-10-06)

For more information, see the models and versions documentation.

API support

Support for the Realtime API was first added in API version 2024-10-01-preview (retired). Use version 2025-08-28 to access the latest Realtime API features. We recommend choosing a generally available API version, without the "-preview" suffix, whenever possible.

Caution

Preview and generally available (GA) models require different endpoint formats. All examples in this article use GA models and the GA endpoint format, and they don't use the api-version parameter, which applies only to the preview endpoint format. See the details about endpoint formats in this article.
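
For reference, here's a minimal sketch of how the samples later in this article derive the GA base URL from the resource endpoint. The resource name is a placeholder, and no api-version parameter is used:

    # GA (v1) endpoint format used by the samples in this article (placeholder resource name).
    endpoint = "https://your-resource.openai.azure.com"
    base_url = endpoint.rstrip("/") + "/openai/v1"
    print(base_url)  # https://your-resource.openai.azure.com/openai/v1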

Prerequisites

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

  • Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
  • Assign the Cognitive Services OpenAI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment. Alternatively, you can assign the role with the Azure CLI, as shown after this list.
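
If you prefer the command line, the role assignment can also be done with the Azure CLI. The assignee and scope values below are placeholders for your own user and Azure OpenAI resource:

    az role assignment create \
      --role "Cognitive Services OpenAI User" \
      --assignee "<your-user-principal-name>" \
      --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<resource-name>"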

Deploy a model for real-time audio

To deploy the gpt-realtime model in the Microsoft Foundry portal:

  1. Go to the Foundry portal and create or select your project.
  2. Select your model deployments:
    1. For Azure OpenAI resources, select Deployments from the Shared resources section in the left pane.
    2. For Foundry resources, select Models + endpoints under My assets in the left pane.
  3. Select + Deploy model > Deploy base model to open the deployment window.
  4. Search for and select the gpt-realtime model, and then select Confirm.
  5. Review the deployment details, and then select Deploy.
  6. Follow the wizard to finish deploying the model.

Now that you have a deployment of the gpt-realtime model, you can interact with it in the Audio playground in the Foundry portal or with the Realtime API.

Setup

  1. Create a new folder realtime-audio-quickstart-js and go to the quickstart folder with the following command:

    mkdir realtime-audio-quickstart-js && cd realtime-audio-quickstart-js
    
  2. Create the package.json file with the following command:

    npm init -y
    
  3. Update the package.json to ECMAScript with the following command:

    npm pkg set type=module
    
  4. Install the OpenAI client library for JavaScript with the following command:

    npm install openai
    
  5. Install the dependent packages used by the OpenAI client library for JavaScript with the following command:

    npm install ws
    
  6. For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with the following command:

    npm install @azure/identity
    

Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:

Variable name                   Value
AZURE_OPENAI_ENDPOINT           You can find this value in the Keys and Endpoint section when examining your resource from the Azure portal.
AZURE_OPENAI_DEPLOYMENT_NAME    This value corresponds to the custom name you chose for your deployment when you deployed a model. You can find this value under Resource Management > Model Deployments in the Azure portal.

Learn more about keyless authentication and setting environment variables.

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.
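
For example, in a bash shell you can set the two environment variables for the current session as shown below. The values are placeholders for your own resource endpoint and deployment name:

    export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
    export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-realtime"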

Text in, audio out

  1. Create the index.js file with the following code:

    import OpenAI from 'openai';
    import { OpenAIRealtimeWS } from 'openai/realtime/ws';
    import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
    import { OpenAIRealtimeError } from 'openai/realtime/internal-base';
    
    let isCreated = false;
    let isConfigured = false;
    let responseDone = false;
    
    // Set this to false if you want to continue receiving events after an error is received.
    const throwOnError = true;
    
    async function main() {
        // The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
        // environment variable or replace the default value below.
        // You can find it in the Microsoft Foundry portal in the Overview page of your Azure OpenAI resource.
        // Example: https://{your-resource}.openai.azure.com
        const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
        const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';
    
        // The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
        // environment variable or replace the default value below.
        // You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
        // Example: gpt-realtime
        const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';
    
        // Keyless authentication
        const credential = new DefaultAzureCredential();
        const scope = 'https://cognitiveservices.azure.com/.default';
        const azureADTokenProvider = getBearerTokenProvider(credential, scope);
        const token = await azureADTokenProvider();
    
        // The APIs are compatible with the OpenAI client library.
        // You can use the OpenAI client library to access the Azure OpenAI APIs.
        // Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
        const openAIClient = new OpenAI({
            baseURL: baseUrl,
            apiKey: token,
        });
        const realtimeClient = await OpenAIRealtimeWS.create(openAIClient, {
            model: deploymentName
        });
    
        realtimeClient.on('error', (receivedError) => receiveError(receivedError));
        realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
        realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
        realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
        realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
        realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));
    
        console.log('Waiting for events...');
        while (!isCreated) {
            console.log('Waiting for session.created event...');
            await new Promise((resolve) => setTimeout(resolve, 100));
        }
    
        // After the session is created, configure it to enable audio input and output.
        const sessionConfig = {
            'type': 'realtime',
            'instructions': 'You are a helpful assistant. You respond by voice and text.',
            'output_modalities': ['audio'],
            'audio': {
                'input': {
                    'transcription': {
                        'model': 'whisper-1'
                    },
                    'format': {
                        'type': 'audio/pcm',
                        'rate': 24000,
                    },
                    'turn_detection': {
                        'type': 'server_vad',
                        'threshold': 0.5,
                        'prefix_padding_ms': 300,
                        'silence_duration_ms': 200,
                        'create_response': true
                    }
                },
                'output': {
                    'voice': 'alloy',
                    'format': {
                        'type': 'audio/pcm',
                        'rate': 24000,
                    }
                }
            }
        };
    
        realtimeClient.send({
            'type': 'session.update',
            'session': sessionConfig
        });
        while (!isConfigured) {
            console.log('Waiting for session.updated event...');
            await new Promise((resolve) => setTimeout(resolve, 100));
        }
    
        // After the session is configured, data can be sent to the session.    
        realtimeClient.send({
            'type': 'conversation.item.create',
            'item': {
                'type': 'message',
                'role': 'user',
                'content': [{
                    type: 'input_text',
                    text: 'Please assist the user.'
                }
                ]
            }
        });
    
        realtimeClient.send({
            type: 'response.create'
        });
    
    
    
        // While waiting for the session to finish, the events can be handled in the event handlers.
        // In this example, we just wait for the first response.done event.
        while (!responseDone) {
            console.log('Waiting for response.done event...');
            await new Promise((resolve) => setTimeout(resolve, 100));
        }
    
        console.log('The sample completed successfully.');
        realtimeClient.close();
    }
    
    function receiveError(err) {
        if (err instanceof OpenAIRealtimeError) {
            console.error('Received an error event.');
            console.error(`Message: ${err.cause.message}`);
            console.error(`Stack: ${err.cause.stack}`);
        }
    
        if (throwOnError) {
            throw err;
        }
    }
    
    function receiveEvent(event) {
        console.log(`Received an event: ${event.type}`);
    
        switch (event.type) {
            case 'session.created':
                console.log(`Session ID: ${event.session.id}`);
                isCreated = true;
                break;
            case 'session.updated':
                console.log(`Session ID: ${event.session.id}`);
                isConfigured = true;
                break;
            case 'response.output_audio_transcript.delta':
                console.log(`Transcript delta: ${event.delta}`);
                break;
            case 'response.output_audio.delta':
                let audioBuffer = Buffer.from(event.delta, 'base64');
                console.log(`Audio delta length: ${audioBuffer.length} bytes`);
                break;
            case 'response.done':
                console.log(`Response ID: ${event.response.id}`);
                console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
                responseDone = true;
                break;
            default:
                console.warn(`Unhandled event type: ${event.type}`);
        }
    }
    
    main().catch((err) => {
        console.error('The sample encountered an error:', err);
    });
    export {
        main
    };
    
  2. Sign in to Azure with the following command:

    az login
    
  3. Run the JavaScript file.

    node index.js
    

After a few moments, you get a response.

Output

The script gets a response from the model and prints the transcript and audio data received.

The output looks similar to the following:

Waiting for events...
Waiting for session.created event...
Received an event: session.created
Session ID: sess_CQx8YO3vKxD9FaPxrbQ9R
Waiting for session.updated event...
Received an event: session.updated
Session ID: sess_CQx8YO3vKxD9FaPxrbQ9R
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Received an event: response.output_audio_transcript.delta
Transcript delta: Sure
Received an event: response.output_audio_transcript.delta
Transcript delta: ,
Received an event: response.output_audio_transcript.delta
Transcript delta:  I
Waiting for response.done event...
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 4800 bytes
Received an event: response.output_audio.delta
Audio delta length: 7200 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta: 'm
Received an event: response.output_audio_transcript.delta
Transcript delta:  here
Received an event: response.output_audio_transcript.delta
Transcript delta:  to
Received an event: response.output_audio_transcript.delta
Transcript delta:  help
Received an event: response.output_audio_transcript.delta
Transcript delta: .
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  What
Received an event: response.output_audio_transcript.delta
Transcript delta:  do
Received an event: response.output_audio_transcript.delta
Transcript delta:  you
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  need
Received an event: response.output_audio_transcript.delta
Transcript delta:  assistance
Received an event: response.output_audio_transcript.delta
Transcript delta:  with
Received an event: response.output_audio_transcript.delta
Transcript delta: ?
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 28800 bytes
Received an event: response.done
Response ID: resp_CQx8YwQCszDqSUXRutxP9
The final response is: Sure, I'm here to help. What do you need assistance with?
The sample completed successfully.

Prerequisites

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

  • Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
  • Assign the Cognitive Services OpenAI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Deploy a model for real-time audio

To deploy the gpt-realtime model in the Microsoft Foundry portal:

  1. Go to the Foundry portal and create or select your project.
  2. Select your model deployments:
    1. For Azure OpenAI resources, select Deployments from the Shared resources section in the left pane.
    2. For Foundry resources, select Models + endpoints under My assets in the left pane.
  3. Select + Deploy model > Deploy base model to open the deployment window.
  4. Search for and select the gpt-realtime model, and then select Confirm.
  5. Review the deployment details, and then select Deploy.
  6. Follow the wizard to finish deploying the model.

Now that you have a deployment of the gpt-realtime model, you can interact with it in the Audio playground in the Foundry portal or with the Realtime API.

Setup

  1. Create a new folder realtime-audio-quickstart-py and go to the quickstart folder with the following command:

    mkdir realtime-audio-quickstart-py && cd realtime-audio-quickstart-py
    
  2. Create a virtual environment. If you already have Python 3.10 or higher installed, you can create a virtual environment with the following commands:

    py -3 -m venv .venv
    .venv\scripts\activate
    

    Activating the Python environment means that when you run python or pip from the command line, you use the Python interpreter contained in the .venv folder of your application. You can use the deactivate command to exit the Python virtual environment, and you can reactivate it later when needed.

    Tip

    We recommend that you create and activate a new Python environment to install the packages you need for this tutorial. Don't install packages into your global Python installation. You should always use a virtual or conda environment when installing Python packages; otherwise, you can break your global installation of Python.

  3. Install the OpenAI Python client library with:

    pip install openai[realtime]
    

    Note

    This library is maintained by OpenAI. Refer to the release history to track the latest updates to the library.

  4. For the recommended keyless authentication with Microsoft Entra ID, install the azure-identity package with:

    pip install azure-identity
    

Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:

Variable name                   Value
AZURE_OPENAI_ENDPOINT           You can find this value in the Keys and Endpoint section when examining your resource from the Azure portal.
AZURE_OPENAI_DEPLOYMENT_NAME    This value corresponds to the custom name you chose for your deployment when you deployed a model. You can find this value under Resource Management > Model Deployments in the Azure portal.

Learn more about keyless authentication and setting environment variables.

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.

Text in, audio out

  1. Create the text-in-audio-out.py file with the following code:

    import os
    import base64
    import asyncio
    from openai import AsyncOpenAI
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    
    async def main() -> None:
        """
        When prompted for user input, type a message and hit enter to send it to the model.
        Enter "q" to quit the conversation.
        """
    
        credential = DefaultAzureCredential()
        token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
        token = token_provider()
    
        # The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
        # environment variable.
        # You can find it in the Microsoft Foundry portal in the Overview page of your Azure OpenAI resource.
        # Example: https://{your-resource}.openai.azure.com
        endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
    
        # The deployment name of the model you want to use is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
        # environment variable.
        # You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
        # Example: gpt-realtime
        deployment_name = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]
    
        base_url = endpoint.replace("https://", "wss://").rstrip("/") + "/openai/v1"
    
        # The APIs are compatible with the OpenAI client library.
        # You can use the OpenAI client library to access the Azure OpenAI APIs.
        # Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
        client = AsyncOpenAI(
            websocket_base_url=base_url,
            api_key=token
        )
        async with client.realtime.connect(
            model=deployment_name,
        ) as connection:
            # after the connection is created, configure the session.
            await connection.session.update(session={
                "type": "realtime",
                "instructions": "You are a helpful assistant. You respond by voice and text.",
                "output_modalities": ["audio"],
                "audio": {
                    "input": {
                        "transcription": {
                            "model": "whisper-1",
                        },
                        "format": {
                            "type": "audio/pcm",
                            "rate": 24000,
                        },
                        "turn_detection": {
                            "type": "server_vad",
                            "threshold": 0.5,
                            "prefix_padding_ms": 300,
                            "silence_duration_ms": 200,
                            "create_response": True,
                        }
                    },
                    "output": {
                        "voice": "alloy",
                        "format": {
                            "type": "audio/pcm",
                            "rate": 24000,
                        }
                    }
                }
            })
    
            # After the session is configured, data can be sent to the session.
            while True:
                user_input = input("Enter a message: ")
                if user_input == "q":
                    print("Stopping the conversation.")
                    break
    
                await connection.conversation.item.create(
                    item={
                        "type": "message",
                        "role": "user",
                        "content": [{"type": "input_text", "text": user_input}],
                    }
                )
                await connection.response.create()
                async for event in connection:
                    if event.type == "response.output_text.delta":
                        print(event.delta, flush=True, end="")
                    elif event.type == "session.created":
                        print(f"Session ID: {event.session.id}")
                    elif event.type == "response.output_audio.delta":
                        audio_data = base64.b64decode(event.delta)
                        print(f"Received {len(audio_data)} bytes of audio data.")
                    elif event.type == "response.output_audio_transcript.delta":
                        print(f"Received text delta: {event.delta}")
                    elif event.type == "response.output_text.done":
                        print()
                    elif event.type == "error":
                        print("Received an error event.")
                        print(f"Error code: {event.error.code}")
                        print(f"Error Event ID: {event.error.event_id}")
                        print(f"Error message: {event.error.message}")
                    elif event.type == "response.done":
                        break
    
        print("Conversation ended.")
        credential.close()
    
    asyncio.run(main())
    
  2. Sign in to Azure with the following command:

    az login
    
  3. Run the Python file.

    python text-in-audio-out.py
    
  4. When prompted for user input, type a message and press Enter to send it to the model. Enter "q" to quit the conversation.

After a few moments, you get a response.

Output

The script gets a response from the model and prints the transcript and audio data received. A short sketch showing how to save the received audio to a file follows the example output below.

The output looks similar to the following:

Enter a message: How are you today?
Session ID: sess_CgAuonaqdlSNNDTdqBagI
Received text delta: I'm
Received text delta:  doing
Received text delta:  well
Received text delta: ,
Received 4800 bytes of audio data.
Received 7200 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta:  thank
Received text delta:  you
Received text delta:  for
Received text delta:  asking
Received text delta: !
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta:  How
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta:  about
Received text delta:  you
Received text delta: —
Received text delta: how
Received text delta:  are
Received text delta:  you
Received text delta:  feeling
Received text delta:  today
Received text delta: ?
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 24000 bytes of audio data.
Enter a message: q
Stopping the conversation.
Conversation ended.
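
The sample only reports the number of audio bytes received. As an extension, here's a minimal sketch, not part of the original sample, of how you could buffer the base64-decoded audio deltas and write them to a playable WAV file with Python's standard library. The session configuration above requests 16-bit mono PCM at 24 kHz; the file name and helper name are illustrative:

    import base64
    import wave

    audio_chunks = []

    # In the event loop, collect the decoded bytes instead of only printing their length:
    #     elif event.type == "response.output_audio.delta":
    #         audio_chunks.append(base64.b64decode(event.delta))

    def save_wav(path: str, chunks: list[bytes], sample_rate: int = 24000) -> None:
        """Write the collected 16-bit mono PCM audio to a WAV file."""
        with wave.open(path, "wb") as wav_file:
            wav_file.setnchannels(1)          # mono
            wav_file.setsampwidth(2)          # 16-bit samples
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(b"".join(chunks))

    # For example, after the conversation ends:
    #     save_wav("response.wav", audio_chunks)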

Prerequisites

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

  • Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
  • Assign the Cognitive Services OpenAI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Deploy a model for real-time audio

To deploy the gpt-realtime model in the Microsoft Foundry portal:

  1. Go to the Foundry portal and create or select your project.
  2. Select your model deployments:
    1. For Azure OpenAI resources, select Deployments from the Shared resources section in the left pane.
    2. For Foundry resources, select Models + endpoints under My assets in the left pane.
  3. Select + Deploy model > Deploy base model to open the deployment window.
  4. Search for and select the gpt-realtime model, and then select Confirm.
  5. Review the deployment details, and then select Deploy.
  6. Follow the wizard to finish deploying the model.

Now that you have a deployment of the gpt-realtime model, you can interact with it in the Audio playground in the Foundry portal or with the Realtime API.

Setup

  1. Create a new folder realtime-audio-quickstart-ts and go to the quickstart folder with the following command:

    mkdir realtime-audio-quickstart-ts && cd realtime-audio-quickstart-ts
    
  2. Create the package.json file with the following command:

    npm init -y
    
  3. Update the package.json to ECMAScript with the following command:

    npm pkg set type=module
    
  4. Install the OpenAI client library for JavaScript with the following command:

    npm install openai
    
  5. Install the dependent packages used by the OpenAI client library for JavaScript with the following command:

    npm install ws
    
  6. For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with the following command:

    npm install @azure/identity
    

Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:

Variable name                   Value
AZURE_OPENAI_ENDPOINT           You can find this value in the Keys and Endpoint section when examining your resource from the Azure portal.
AZURE_OPENAI_DEPLOYMENT_NAME    This value corresponds to the custom name you chose for your deployment when you deployed a model. You can find this value under Resource Management > Model Deployments in the Azure portal.

Learn more about keyless authentication and setting environment variables.

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.

Text in, audio out

  1. Create the index.ts file with the following code:

    import OpenAI from 'openai';
    import { OpenAIRealtimeWS } from 'openai/realtime/ws';
    import { OpenAIRealtimeError } from 'openai/realtime/internal-base';
    import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";
    import { RealtimeSessionCreateRequest } from 'openai/resources/realtime/realtime';
    
    let isCreated = false;
    let isConfigured = false;
    let responseDone = false;
    
    // Set this to false if you want to continue receiving events after an error is received.
    const throwOnError = true;
    
    async function main(): Promise<void> {
        // The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
        // environment variable or replace the default value below.
        // You can find it in the Microsoft Foundry portal in the Overview page of your Azure OpenAI resource.
        // Example: https://{your-resource}.openai.azure.com
        const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
        const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';
    
        // The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
        // environment variable or replace the default value below.
        // You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
        // Example: gpt-realtime
        const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';
    
        // Keyless authentication
        const credential = new DefaultAzureCredential();
        const scope = "https://cognitiveservices.azure.com/.default";
        const azureADTokenProvider = getBearerTokenProvider(credential, scope);
        const token = await azureADTokenProvider();
    
        // The APIs are compatible with the OpenAI client library.
        // You can use the OpenAI client library to access the Azure OpenAI APIs.
        // Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
        const openAIClient = new OpenAI({
            baseURL: baseUrl,
            apiKey: token,
        });
        const realtimeClient = await OpenAIRealtimeWS.create(openAIClient, { model: deploymentName });
    
        realtimeClient.on('error', (receivedError) => receiveError(receivedError));
        realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
        realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
        realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
        realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
        realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));
    
        console.log('Waiting for events...');
        while (!isCreated) {
            console.log('Waiting for session.created event...');
            await new Promise((resolve) => setTimeout(resolve, 100));
        }
    
        // After the session is created, configure it to enable audio input and output.
        const sessionConfig: RealtimeSessionCreateRequest = {
            'type': 'realtime',
            'instructions': 'You are a helpful assistant. You respond by voice and text.',
            'output_modalities': ['audio'],
            'audio': {
                'input': {
                    'transcription': {
                        'model': 'whisper-1'
                    },
                    'format': {
                        'type': 'audio/pcm',
                        'rate': 24000,
                    },
                    'turn_detection': {
                        'type': 'server_vad',
                        'threshold': 0.5,
                        'prefix_padding_ms': 300,
                        'silence_duration_ms': 200,
                        'create_response': true
                    }
                },
                'output': {
                    'voice': 'alloy',
                    'format': {
                        'type': 'audio/pcm',
                        'rate': 24000,
                    }
                }
            }
        };
    
        realtimeClient.send({ 'type': 'session.update', 'session': sessionConfig });
    
        while (!isConfigured) {
            console.log('Waiting for session.updated event...');
            await new Promise((resolve) => setTimeout(resolve, 100));
        }
    
        // After the session is configured, data can be sent to the session.
        realtimeClient.send({
            'type': 'conversation.item.create',
            'item': {
                'type': 'message',
                'role': 'user',
                'content': [{ type: 'input_text', text: 'Please assist the user.' }]
            }
        });
    
        realtimeClient.send({ type: 'response.create' });
    
        // While waiting for the session to finish, the events can be handled in the event handlers.
        // In this example, we just wait for the first response.done event. 
        while (!responseDone) {
            console.log('Waiting for response.done event...');
            await new Promise((resolve) => setTimeout(resolve, 100));
        }
    
        console.log('The sample completed successfully.');
        realtimeClient.close();
    }
    
    function receiveError(errorEvent: OpenAIRealtimeError): void {
        if (errorEvent instanceof OpenAIRealtimeError) {
            console.error('Received an error event.');
            console.error(`Message: ${errorEvent.message}`);
            console.error(`Stack: ${errorEvent.stack}`);
        }
    
        if (throwOnError) {
            throw errorEvent;
        }
    }
    
    function receiveEvent(event: any): void {
        console.log(`Received an event: ${event.type}`);
    
        switch (event.type) {
            case 'session.created':
                console.log(`Session ID: ${event.session.id}`);
                isCreated = true;
                break;
            case 'session.updated':
                console.log(`Session ID: ${event.session.id}`);
                isConfigured = true;
                break;
            case 'response.output_audio_transcript.delta':
                console.log(`Transcript delta: ${event.delta}`);
                break;
            case 'response.output_audio.delta':
                let audioBuffer = Buffer.from(event.delta, 'base64');
                console.log(`Audio delta length: ${audioBuffer.length} bytes`);
                break;
            case 'response.done':
                console.log(`Response ID: ${event.response.id}`);
                console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
                responseDone = true;
                break;
            default:
                console.warn(`Unhandled event type: ${event.type}`);
        }
    }
    
    main().catch((err) => {
        console.error("The sample encountered an error:", err);
    });
    
    export { main };    
    
  2. Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

    {
        "compilerOptions": {
          "module": "NodeNext",
          "target": "ES2022", // Supports top-level await
          "moduleResolution": "NodeNext",
          "skipLibCheck": true, // Avoid type errors from node_modules
          "strict": true // Enable strict type-checking options
        },
        "include": ["*.ts"]
    }
    
  3. Install type definitions for Node.

    npm i --save-dev @types/node
    
  4. Transpile from TypeScript to JavaScript.

    tsc
    
  5. Sign in to Azure with the following command:

    az login
    
  6. Run the code with the following command:

    node index.js
    

After a few moments, you get a response.

Output

The script gets a response from the model and prints the transcript and audio data received.

The output looks similar to the following:

Waiting for events...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Received an event: session.created
Session ID: sess_CWQkREiv3jlU3gk48bm0a
Waiting for session.updated event...
Waiting for session.updated event...
Received an event: session.updated
Session ID: sess_CWQkREiv3jlU3gk48bm0a
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Received an event: response.output_audio_transcript.delta
Transcript delta: Sure
Received an event: response.output_audio_transcript.delta
Transcript delta: ,
Received an event: response.output_audio_transcript.delta
Transcript delta:  I'm
Received an event: response.output_audio_transcript.delta
Transcript delta:  here
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 4800 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 7200 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  to
Received an event: response.output_audio_transcript.delta
Transcript delta:  help
Received an event: response.output_audio_transcript.delta
Transcript delta: .
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  What
Received an event: response.output_audio_transcript.delta
Transcript delta:  would
Received an event: response.output_audio_transcript.delta
Transcript delta:  you
Received an event: response.output_audio_transcript.delta
Transcript delta:  like
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  to
Received an event: response.output_audio_transcript.delta
Transcript delta:  do
Received an event: response.output_audio_transcript.delta
Transcript delta:  or
Received an event: response.output_audio_transcript.delta
Transcript delta:  know
Received an event: response.output_audio_transcript.delta
Transcript delta:  about
Received an event: response.output_audio_transcript.delta
Transcript delta: ?
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 24000 bytes
Received an event: response.done
Response ID: resp_CWQkRBrCcCjtHgIEapA92
The final response is: Sure, I'm here to help. What would you like to do or know about?
The sample completed successfully.

Deploy a model for real-time audio

To deploy the gpt-realtime model in the Microsoft Foundry portal:

  1. Go to the Foundry portal and create or select your project.
  2. Select your model deployments:
    1. For Azure OpenAI resources, select Deployments from the Shared resources section in the left pane.
    2. For Foundry resources, select Models + endpoints under My assets in the left pane.
  3. Select + Deploy model > Deploy base model to open the deployment window.
  4. Search for and select the gpt-realtime model, and then select Confirm.
  5. Review the deployment details, and then select Deploy.
  6. Follow the wizard to finish deploying the model.

Now that you have a deployment of the gpt-realtime model, you can interact with it in the Audio playground in the Foundry portal or with the Realtime API.

Use GPT real-time audio

To chat with your deployed gpt-realtime model in the Microsoft Foundry real-time audio playground, follow these steps:

  1. Go to the Foundry portal and select the project that contains your deployed gpt-realtime model.

  2. Select Playgrounds from the left pane.

  3. Select Audio playground > Try the Audio playground.

    Note

    The Chat playground doesn't support the gpt-realtime model. Use the Audio playground as described in this section.

  4. Select your deployed gpt-realtime model from the deployment dropdown list.

  5. (Optional) Edit the contents of the Give the model instructions and context text box. Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, tell it what it should and shouldn't answer, and tell it how to format responses.

  6. (Optional) Change settings such as threshold, prefix padding, and silence duration. These settings correspond to the server voice activity detection (server_vad) turn detection fields used in the session configuration earlier in this article; see the sketch after this list.

  7. Select Start listening to start the session. You can speak into your microphone to start the chat.

  8. You can interrupt the chat at any time by speaking. You can end the chat by selecting the Stop listening button.
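
For reference, here's a minimal sketch of the turn detection fields that the playground's threshold, prefix padding, and silence duration settings map to. The values mirror the session configuration used in the code samples in this article and are only a starting point, not recommended defaults:

    # Server VAD turn detection fields sent via session.update (values are illustrative).
    turn_detection = {
        "type": "server_vad",
        "threshold": 0.5,            # speech detection threshold
        "prefix_padding_ms": 300,    # audio retained before detected speech
        "silence_duration_ms": 200,  # silence required to end a turn
        "create_response": True,     # automatically create a response after each turn
    }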