The Azure OpenAI GPT real-time API for speech and audio is part of the GPT-4o model family, which supports low-latency, speech-in, speech-out conversational interactions.
You can use the Realtime API via WebRTC, SIP, or WebSocket to send audio input to the model and receive audio responses in real time. Follow the instructions in this article to get started with the Realtime API via WebRTC.
In most cases, use the WebRTC API for real-time audio streaming. The WebRTC API is a web standard that enables real-time communication (RTC) between browsers and mobile applications. Here are some reasons why WebRTC is preferred for real-time audio streaming:
- Lower latency: WebRTC is designed to minimize delay, making it well suited for audio and video communication where low latency is critical to maintaining quality and synchronization.
- Media handling: WebRTC has built-in support for audio and video codecs, providing optimized handling of media streams.
- Error correction: WebRTC includes mechanisms for handling packet loss and jitter, which are essential for maintaining audio stream quality over unpredictable networks.
- Peer-to-peer communication: WebRTC allows direct communication between clients, reducing the need for a central server to relay audio data and further lowering latency.
Use the Realtime API via WebSockets if you need to:
- Stream audio data from a server to a client.
- Send and receive data in real time between a client and a server.
WebSockets aren't recommended for real-time audio streaming because they have higher latency than WebRTC.
Supported models
The GPT real-time models are available for global deployments in the East US 2 and Sweden Central regions.
- gpt-4o-mini-realtime-preview (version 2024-12-17)
- gpt-4o-realtime-preview (version 2024-12-17)
- gpt-realtime (version 2025-08-28)
- gpt-realtime-mini (version 2025-10-06)
You should use API version 2025-08-28 in the URL for the Realtime API. The API version is included in the session URL.
For more information about supported models, see the models and versions documentation.
Important
The GA protocol for WebRTC is now available.
You can still use the beta protocol, but we recommend starting with the GA protocol. If you're a current customer, plan to migrate to the GA protocol.
This article describes how to use WebRTC with the GA protocol. We keep the legacy protocol documentation here.
Prerequisites
Before you can use GPT real-time audio, you need:
- An Azure subscription. Create one for free.
- A Microsoft Foundry or Azure OpenAI resource created in one of the supported regions. For more information, see Create a resource and deploy a model with Azure OpenAI.
- A deployment of the gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview, gpt-realtime, or gpt-realtime-mini model in a supported region, as described in the Supported models section of this article. You can deploy the model from the Foundry model catalog or from your project in the Microsoft Foundry portal: select Build in the upper-right menu, select the Models tab in the left pane, and then select Deploy base model. Search for the model you want, and then select Deploy on the model page.
Set up WebRTC
To use WebRTC, you need two pieces of code:
- Your web browser app
- A service that the web browser uses to retrieve an ephemeral token
More options:
- You can use the same service that retrieves the ephemeral token to proxy the web browser's session negotiation via the Session Description Protocol (SDP). This approach is more secure because the web browser never has access to the ephemeral token.
- You can use a query parameter to filter the messages that go to the web browser.
- You can create an observer WebSocket connection to listen to or record the session.
Steps
Step 1: Set up a service to get an ephemeral token
The key to generating an ephemeral token is the following REST API endpoint:
url = https://{your azure resource}.openai.azure.com/openai/v1/realtime/client_secrets
Use this URL with an API key or a Microsoft Entra ID token. The request retrieves an ephemeral token and sets the session configuration that you want the web browser to use, including the prompt instructions and the output voice.
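To see the shape of the exchange before building a full service, here's a minimal sketch of the raw request, assuming API key authentication via the api-key header (the environment variable names are illustrative; the full service below uses Microsoft Entra ID instead). The ephemeral token comes back in the value field of the JSON response:

import os
import requests

# Minimal sketch: request an ephemeral token with an API key.
resource = os.environ["AZURE_RESOURCE"]       # e.g., 'your-azure-resource' (assumed variable)
api_key = os.environ["AZURE_OPENAI_API_KEY"]  # assumed environment variable

url = f"https://{resource}.openai.azure.com/openai/v1/realtime/client_secrets"
session_config = {
    "session": {
        "type": "realtime",
        "model": "<your model deployment name>",
        "instructions": "You are a helpful assistant.",
        "audio": {"output": {"voice": "marin"}},
    }
}

response = requests.post(
    url,
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=session_config,
    timeout=30,
)
response.raise_for_status()

# The ephemeral token is returned in the 'value' field.
print("Ephemeral token:", response.json()["value"][:8], "...")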
Here's some example Python code for a token service. The web browser application can call this service's /token endpoint to retrieve an ephemeral token. This sample code uses DefaultAzureCredential to authenticate to the Realtime API that generates the ephemeral token.
from flask import Flask, jsonify
import os
import requests
import time
import threading
from azure.identity import DefaultAzureCredential
app = Flask(__name__)
# Session configuration
session_config = {
"session": {
"type": "realtime",
"model": "<your model deployment name>",
"instructions": "You are a helpful assistant.",
"audio": {
"output": {
"voice": "marin",
},
},
},
}
# Get configuration from environment variables
azure_resource = os.getenv('AZURE_RESOURCE') # e.g., 'your-azure-resource'
# Token caching variables
cached_token = None
token_expiry = 0
token_lock = threading.Lock()
def get_bearer_token(resource_scope: str) -> str:
"""Get a bearer token using DefaultAzureCredential with caching."""
global cached_token, token_expiry
current_time = time.time()
# Check if we have a valid cached token (with 5 minute buffer before expiry)
with token_lock:
if cached_token and current_time < (token_expiry - 300):
return cached_token
# Get a new token
try:
credential = DefaultAzureCredential()
token = credential.get_token(resource_scope)
with token_lock:
cached_token = token.token
token_expiry = token.expires_on
print(f"Acquired new bearer token, expires at: {time.ctime(token_expiry)}")
return cached_token
except Exception as e:
print(f"Failed to acquire bearer token: {e}")
raise
@app.route('/token', methods=['GET'])
def get_token():
"""
An endpoint which returns the contents of a REST API request to the protected endpoint.
Uses DefaultAzureCredential for authentication with token caching.
"""
try:
# Get bearer token using DefaultAzureCredential
bearer_token = get_bearer_token("https://cognitiveservices.azure.com/.default")
# Construct the Azure OpenAI endpoint URL
url = f"https://{azure_resource}.openai.azure.com/openai/v1/realtime/client_secrets"
headers = {
"Authorization": f"Bearer {bearer_token}",
"Content-Type": "application/json",
}
# Make the request to Azure OpenAI
response = requests.post(
url,
headers=headers,
json=session_config,
timeout=30
)
# Check if the request was successful
if response.status_code != 200:
print(f"Request failed with status {response.status_code}: {response.reason}")
print(f"Response headers: {dict(response.headers)}")
print(f"Response content: {response.text}")
response.raise_for_status()
# Parse the JSON response and extract the ephemeral token
data = response.json()
ephemeral_token = data.get('value', '')
if not ephemeral_token:
print(f"No ephemeral token found in response: {data}")
return jsonify({"error": "No ephemeral token available"}), 500
# Return the ephemeral token as JSON
return jsonify({"token": ephemeral_token})
except requests.exceptions.RequestException as e:
print(f"Token generation error: {e}")
if hasattr(e, 'response') and e.response is not None:
print(f"Response status: {e.response.status_code}")
print(f"Response reason: {e.response.reason}")
print(f"Response content: {e.response.text}")
return jsonify({"error": "Failed to generate token"}), 500
except Exception as e:
print(f"Unexpected error: {e}")
return jsonify({"error": "Failed to generate token"}), 500
if __name__ == '__main__':
if not azure_resource:
print("Error: AZURE_RESOURCE environment variable is required")
exit(1)
print(f"Starting token service for Azure resource: {azure_resource}")
print("Using DefaultAzureCredential for authentication")
print("Production mode - use gunicorn to run this service:")
port = int(os.getenv('PORT', 5000))
print(f" gunicorn -w 4 -b 0.0.0.0:{port} --timeout 30 token-service:app")
Step 2: Set up the browser application
The browser application calls the token service to get a token, and then starts a WebRTC connection with the Realtime API. To start the WebRTC connection, use the following URL and authenticate with the ephemeral token.
https://<your azure resource>.openai.azure.com/openai/v1/realtime/calls
Once connected, the browser application sends text over the data channel and audio over the media channel. Here's an example HTML document to help you get started.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Azure OpenAI Realtime Session</title>
</head>
<body>
<h1>Azure OpenAI Realtime Session</h1>
<button onclick="StartSession()">Start Session</button>
<!-- Log container for API messages -->
<div id="logContainer"></div>
<script>
const AZURE_RESOURCE = "<your azure resource>"
const WEBRTC_URL= `https://${AZURE_RESOURCE}.openai.azure.com/openai/v1/realtime/calls?webrtcfilter=on`
async function StartSession() {
try {
// Call our token service to get the ephemeral key
const tokenResponse = await fetch("/token");
if (!tokenResponse.ok) {
throw new Error(`Token service request failed: ${tokenResponse.status}`);
}
const tokenData = await tokenResponse.json();
const ephemeralKey = tokenData.token;
console.log("Ephemeral key received from token service");
// Mask the ephemeral key in the log message.
logMessage("Ephemeral Key Received from Token Service: " + "***");
// Set up the WebRTC connection using the ephemeral key.
init(ephemeralKey);
} catch (error) {
console.error("Error fetching ephemeral key:", error);
logMessage("Error fetching ephemeral key: " + error.message);
}
}
async function init(ephemeralKey) {
logMessage("🚀 Starting WebRTC initialization...");
let peerConnection = new RTCPeerConnection();
logMessage("✅ RTCPeerConnection created");
// Set up to play remote audio from the model.
const audioElement = document.createElement('audio');
audioElement.autoplay = true;
document.body.appendChild(audioElement);
logMessage("🔊 Audio element created and added to page");
peerConnection.ontrack = (event) => {
logMessage("🎵 Remote track received! Type: " + event.track.kind);
logMessage("📊 Number of streams: " + event.streams.length);
if (event.streams.length > 0) {
audioElement.srcObject = event.streams[0];
logMessage("✅ Audio stream assigned to audio element");
// Add event listeners to audio element for debugging
audioElement.onloadstart = () => logMessage("🔄 Audio loading started");
audioElement.oncanplay = () => logMessage("▶️ Audio can start playing");
audioElement.onplay = () => logMessage("🎵 Audio playback started");
audioElement.onerror = (e) => logMessage("❌ Audio error: " + e.message);
} else {
logMessage("⚠️ No streams in track event");
}
};
// Set up data channel for sending and receiving events
logMessage("🎤 Requesting microphone access...");
try {
const clientMedia = await navigator.mediaDevices.getUserMedia({ audio: true });
logMessage("✅ Microphone access granted");
const audioTrack = clientMedia.getAudioTracks()[0];
logMessage("🎤 Audio track obtained: " + audioTrack.label);
peerConnection.addTrack(audioTrack);
logMessage("✅ Audio track added to peer connection");
} catch (error) {
logMessage("❌ Failed to get microphone access: " + error.message);
return;
}
const dataChannel = peerConnection.createDataChannel('realtime-channel');
logMessage("📡 Data channel created");
dataChannel.addEventListener('open', () => {
logMessage('✅ Data channel is open - ready to send messages');
// Send client events to start the conversation
logMessage("📝 Preparing to send text input message...");
const event = {
type: "conversation.item.create",
item: {
type: "message",
role: "user",
content: [
{
type: "input_text",
text: "hello there! Can you give me some vacation options?",
},
],
},
};
logMessage("📤 Sending conversation.item.create event...");
logMessage("💬 Text content: " + event.item.content[0].text);
try {
dataChannel.send(JSON.stringify(event));
logMessage("✅ Text input sent successfully!");
// Now send response.create to trigger the AI response
const responseEvent = {
type: "response.create"
};
logMessage("📤 Sending response.create event to trigger AI response...");
dataChannel.send(JSON.stringify(responseEvent));
logMessage("✅ Response.create sent successfully!");
} catch (error) {
logMessage("❌ Failed to send text input: " + error.message);
}
});
dataChannel.addEventListener('message', (event) => {
const realtimeEvent = JSON.parse(event.data);
console.log(realtimeEvent);
logMessage("Received server event: " + JSON.stringify(realtimeEvent, null, 2));
if (realtimeEvent.type === "session.update") {
const instructions = realtimeEvent.session.instructions;
logMessage("Instructions: " + instructions);
} else if (realtimeEvent.type === "session.error") {
logMessage("Error: " + realtimeEvent.error.message);
} else if (realtimeEvent.type === "session.end") {
logMessage("Session ended.");
}
});
dataChannel.addEventListener('close', () => {
logMessage('Data channel is closed');
});
// Start the session using the Session Description Protocol (SDP)
logMessage("🤝 Creating WebRTC offer...");
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
logMessage("✅ Local description set");
logMessage("📡 Sending SDP offer to: " + WEBRTC_URL);
const sdpResponse = await fetch(`${WEBRTC_URL}`, {
method: "POST",
body: offer.sdp,
headers: {
Authorization: `Bearer ${ephemeralKey}`,
"Content-Type": "application/sdp",
},
});
logMessage("📥 Received SDP response, status: " + sdpResponse.status);
if (!sdpResponse.ok) {
logMessage("❌ SDP exchange failed: " + sdpResponse.statusText);
return;
}
const answerSdp = await sdpResponse.text();
logMessage("✅ Got SDP answer, length: " + answerSdp.length + " chars");
const answer = { type: "answer", sdp: answerSdp };
await peerConnection.setRemoteDescription(answer);
logMessage("✅ Remote description set - WebRTC connection should be establishing...");
// Add connection state logging
peerConnection.onconnectionstatechange = () => {
logMessage("🔗 Connection state: " + peerConnection.connectionState);
};
peerConnection.oniceconnectionstatechange = () => {
logMessage("🧊 ICE connection state: " + peerConnection.iceConnectionState);
};
const button = document.createElement('button');
button.innerText = 'Close Session';
button.onclick = stopSession;
document.body.appendChild(button);
function stopSession() {
if (dataChannel) dataChannel.close();
if (peerConnection) peerConnection.close();
peerConnection = null;
logMessage("Session closed.");
}
}
function logMessage(message) {
const logContainer = document.getElementById("logContainer");
const p = document.createElement("p");
p.textContent = message;
logContainer.appendChild(p);
}
</script>
</body>
</html>
The example uses the query parameter webrtcfilter=on. This query parameter limits the data channel messages sent to the browser in order to keep the prompt instructions private. With the filter on, only the following messages are returned to the browser on the data channel:
- input_audio_buffer.speech_started
- input_audio_buffer.speech_stopped
- output_audio_buffer.started
- output_audio_buffer.stopped
- conversation.item.input_audio_transcription.completed
- conversation.item.added
- conversation.item.created
- response.output_text.delta
- response.output_text.done
- response.output_audio_transcript.delta
- response.output_audio_transcript.done
Step 3 (optional): Create a WebSocket observer/controller
If you proxy the session negotiation through a service application, you can parse the returned Location header and use it to create a WebSocket connection to the WebRTC call. This connection can record the WebRTC call, and it can even control the call by issuing session.update events and other commands directly.
Here's an updated version of the token_service shown earlier. It now has a /connect endpoint that gets the ephemeral token and negotiates the session start, and it includes a WebSocket connection that listens in on the WebRTC session.
from flask import Flask, jsonify, request
#from flask_cors import CORS
import os
import requests
import time
import threading
import asyncio
import json
import websockets
from azure.identity import DefaultAzureCredential
app = Flask(__name__)
# CORS(app) # Enable CORS for all routes when running locally for testing
# Session configuration
session_config = {
"session": {
"type": "realtime",
"model": "<YOUR MODEL DEPLOYMENT NAME>",
"instructions": "You are a helpful assistant.",
"audio": {
"output": {
"voice": "marin",
},
},
},
}
# Get configuration from environment variables
azure_resource = os.getenv('AZURE_RESOURCE') # e.g., 'your-azure-resource'
# Token caching variables
cached_token = None
token_expiry = 0
token_lock = threading.Lock()
def get_bearer_token(resource_scope: str) -> str:
"""Get a bearer token using DefaultAzureCredential with caching."""
global cached_token, token_expiry
current_time = time.time()
# Check if we have a valid cached token (with 5 minute buffer before expiry)
with token_lock:
if cached_token and current_time < (token_expiry - 300):
return cached_token
# Get a new token
try:
credential = DefaultAzureCredential()
token = credential.get_token(resource_scope)
with token_lock:
cached_token = token.token
token_expiry = token.expires_on
print(f"Acquired new bearer token, expires at: {time.ctime(token_expiry)}")
return cached_token
except Exception as e:
print(f"Failed to acquire bearer token: {e}")
raise
def get_ephemeral_token():
"""
Generate an ephemeral token from Azure OpenAI.
Returns:
str: The ephemeral token
Raises:
Exception: If token generation fails
"""
# Get bearer token using DefaultAzureCredential
bearer_token = get_bearer_token("https://cognitiveservices.azure.com/.default")
# Construct the Azure OpenAI endpoint URL
url = f"https://{azure_resource}.openai.azure.com/openai/v1/realtime/client_secrets"
headers = {
"Authorization": f"Bearer {bearer_token}",
"Content-Type": "application/json",
}
# Make the request to Azure OpenAI
response = requests.post(
url,
headers=headers,
json=session_config,
timeout=30
)
# Check if the request was successful
if response.status_code != 200:
print(f"Request failed with status {response.status_code}: {response.reason}")
print(f"Response headers: {dict(response.headers)}")
print(f"Response content: {response.text}")
response.raise_for_status()
# Parse the JSON response and extract the ephemeral token
data = response.json()
ephemeral_token = data.get('value', '')
if not ephemeral_token:
print(f"No ephemeral token found in response: {data}")
raise Exception("No ephemeral token available")
return ephemeral_token
def perform_sdp_negotiation(ephemeral_token, sdp_offer):
"""
Perform SDP negotiation with the Azure OpenAI Realtime API.
Args:
ephemeral_token (str): The ephemeral token for authentication
sdp_offer (str): The SDP offer to send
Returns:
tuple: (sdp_answer, location_header) - The SDP answer from the server and Location header for WebSocket
Raises:
Exception: If SDP negotiation fails
"""
# Construct the realtime calls endpoint URL
realtime_url = f"https://{azure_resource}.openai.azure.com/openai/v1/realtime/calls"
headers = {
'Authorization': f'Bearer {ephemeral_token}',
'Content-Type': 'application/sdp' # Azure OpenAI expects application/sdp, not form data
}
print(f"Sending SDP offer to: {realtime_url}")
# Send the SDP offer as raw body data (not form data)
response = requests.post(realtime_url, data=sdp_offer, headers=headers, timeout=30)
    if response.status_code == 201:  # The calls endpoint returns 201 Created on success
sdp_answer = response.text
location_header = response.headers.get('Location', '')
print(f"Received SDP answer: {sdp_answer[:100]}...")
if location_header:
print(f"Captured Location header: {location_header}")
else:
print("Warning: No Location header found in response")
return sdp_answer, location_header
else:
error_msg = f"SDP negotiation failed: {response.status_code} - {response.text}"
print(error_msg)
raise Exception(error_msg)
@app.route('/token', methods=['GET'])
def get_token():
"""
An endpoint which returns an ephemeral token for Azure OpenAI Realtime API.
Uses DefaultAzureCredential for authentication with token caching.
"""
try:
ephemeral_token = get_ephemeral_token()
return jsonify({
"token": ephemeral_token,
"endpoint": f"https://{azure_resource}.openai.azure.com",
"deployment": "gpt-4o-realtime-preview"
})
except requests.exceptions.RequestException as e:
print(f"Token generation error: {e}")
if hasattr(e, 'response') and e.response is not None:
print(f"Response status: {e.response.status_code}")
print(f"Response reason: {e.response.reason}")
print(f"Response content: {e.response.text}")
return jsonify({"error": "Failed to generate token"}), 500
except Exception as e:
print(f"Unexpected error: {e}")
return jsonify({"error": "Failed to generate token"}), 500
async def connect_websocket(location_header, bearer_token=None, api_key=None):
"""
Connect to the WebSocket endpoint using the Location header.
Args:
location_header (str): The Location header from the SDP negotiation response
bearer_token (str, optional): Bearer token for authentication
api_key (str, optional): API key for authentication (fallback)
Returns:
None: Just logs messages, doesn't store them
"""
# Extract call_id from location header
# Example: /v1/realtime/calls/rtc_abc123 -> rtc_abc123
call_id = location_header.split('/')[-1]
print(f"Extracted call_id: {call_id}")
# Construct WebSocket URL: wss://<resource>.openai.azure.com/openai/v1/realtime?call_id=<call_id>
ws_url = f"wss://{azure_resource}.openai.azure.com/openai/v1/realtime?call_id={call_id}"
print(f"Connecting to WebSocket: {ws_url}")
message_count = 0
try:
# WebSocket headers - use proper authentication
headers = {}
if bearer_token is not None:
print("Using Bearer token for WebSocket authentication")
headers["Authorization"] = f"Bearer {bearer_token}"
elif api_key is not None:
print("Using API key for WebSocket authentication")
headers["api-key"] = api_key
else:
print("Warning: No authentication provided for WebSocket")
async with websockets.connect(ws_url, additional_headers=headers) as websocket:
print("WebSocket connection established")
# Listen for messages
try:
async for message in websocket:
try:
# Parse JSON message
json_data = json.loads(message)
msg_type = json_data.get('type', 'unknown')
message_count += 1
print(f"WebSocket [{message_count}]: {msg_type}")
# Handle specific message types with additional details
if msg_type == 'response.done':
session_status = json_data['response'].get('status', 'unknown')
session_details = json_data['response'].get('details', 'No details provided')
print(f" -> Response status: {session_status}, Details: {session_details}")
# Continue listening instead of breaking
elif msg_type == 'session.created':
session_id = json_data.get('session', {}).get('id', 'unknown')
print(f" -> Session created: {session_id}")
elif msg_type == 'error':
error_message = json_data.get('error', {}).get('message', 'No error message')
print(f" -> Error: {error_message}")
except json.JSONDecodeError:
message_count += 1
print(f"WebSocket [{message_count}]: Non-JSON message: {message[:100]}...")
except Exception as e:
print(f"Error processing WebSocket message: {e}")
except websockets.exceptions.ConnectionClosed:
print(f"WebSocket connection closed by remote (processed {message_count} messages)")
except Exception as e:
print(f"WebSocket message loop error: {e}")
except Exception as e:
print(f"WebSocket connection error: {e}")
print(f"WebSocket monitoring completed. Total messages processed: {message_count}")
def start_websocket_background(location_header, bearer_token):
"""
Start WebSocket connection in background thread to monitor/record the call.
"""
def run_websocket():
try:
print(f"Starting background WebSocket monitoring for: {location_header}")
# Create new event loop for this thread
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
# Run the WebSocket connection (now just logs, doesn't return messages)
loop.run_until_complete(
connect_websocket(location_header, bearer_token)
)
print("Background WebSocket monitoring completed.")
except Exception as e:
print(f"Background WebSocket error: {e}")
finally:
loop.close()
except Exception as e:
print(f"Failed to start background WebSocket: {e}")
# Start the WebSocket in a background thread
websocket_thread = threading.Thread(target=run_websocket, daemon=True)
websocket_thread.start()
print("Background WebSocket thread started")
@app.route('/connect', methods=['POST'])
def connect_and_negotiate():
"""
Get token and perform SDP negotiation.
Expects multipart form data with 'sdp' field containing the SDP offer.
Returns the SDP answer as plain text.
Automatically starts WebSocket connection in background to monitor/record the call.
"""
try:
# Get the SDP offer from multipart form data
if 'sdp' not in request.form:
return jsonify({"error": "Missing 'sdp' field in multipart form data"}), 400
sdp_offer = request.form['sdp']
print(f"Received SDP offer: {sdp_offer[:100]}...")
# Get ephemeral token using shared function
ephemeral_token = get_ephemeral_token()
print(f"Got ephemeral token for SDP negotiation: {ephemeral_token[:20]}...")
# Perform SDP negotiation using shared function
sdp_answer, location_header = perform_sdp_negotiation(ephemeral_token, sdp_offer)
# Create response headers
response_headers = {'Content-Type': 'application/sdp'}
# If we have a location header, start WebSocket connection in background to monitor/record the call
if location_header:
try:
# Get a bearer token for WebSocket authentication
bearer_token = get_bearer_token("https://cognitiveservices.azure.com/.default")
start_websocket_background(location_header, bearer_token)
except Exception as e:
print(f"Failed to start background WebSocket monitoring: {e}")
# Don't fail the main request if WebSocket setup fails
# Return the SDP answer as plain text with a 201 status
return sdp_answer, 201, response_headers
except Exception as e:
error_msg = f"Error in SDP negotiation: {e}"
print(error_msg)
return jsonify({"error": error_msg}), 500
if __name__ == '__main__':
if not azure_resource:
print("Error: AZURE_RESOURCE environment variable is required")
exit(1)
print(f"Starting token service for Azure resource: {azure_resource}")
print("Using DefaultAzureCredential for authentication")
port = int(os.getenv('PORT', 5000))
print(f" gunicorn -w 4 -b 0.0.0.0:{port} --timeout 30 token-service:app")
The associated browser changes are shown here.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Azure OpenAI Realtime Session - Connect Endpoint</title>
</head>
<body>
<h1>Azure OpenAI Realtime Session - Using /connect Endpoint</h1>
<button onclick="StartSession()">Start Session</button>
<!-- Log container for API messages -->
<div id="logContainer"></div>
<script>
const AZURE_RESOURCE = "YOUR AZURE RESOURCE NAME"
async function StartSession() {
try {
logMessage("🚀 Starting session with /connect endpoint...");
// Set up the WebRTC connection first
const peerConnection = new RTCPeerConnection();
logMessage("✅ RTCPeerConnection created");
// Get microphone access and add audio track BEFORE creating offer
logMessage("🎤 Requesting microphone access...");
try {
const clientMedia = await navigator.mediaDevices.getUserMedia({ audio: true });
logMessage("✅ Microphone access granted");
const audioTrack = clientMedia.getAudioTracks()[0];
logMessage("🎤 Audio track obtained: " + audioTrack.label);
peerConnection.addTrack(audioTrack);
logMessage("✅ Audio track added to peer connection");
} catch (error) {
logMessage("❌ Failed to get microphone access: " + error.message);
return;
}
// Set up audio playback
const audioElement = document.createElement('audio');
audioElement.autoplay = true;
document.body.appendChild(audioElement);
logMessage("🔊 Audio element created and added to page");
peerConnection.ontrack = (event) => {
logMessage("🎵 Remote track received! Type: " + event.track.kind);
logMessage("📊 Number of streams: " + event.streams.length);
if (event.streams.length > 0) {
audioElement.srcObject = event.streams[0];
logMessage("✅ Audio stream assigned to audio element");
// Add event listeners to audio element for debugging
audioElement.onloadstart = () => logMessage("🔄 Audio loading started");
audioElement.oncanplay = () => logMessage("▶️ Audio can start playing");
audioElement.onplay = () => logMessage("🎵 Audio playback started");
audioElement.onerror = (e) => logMessage("❌ Audio error: " + e.message);
} else {
logMessage("⚠️ No streams in track event");
}
};
// Set up data channel BEFORE SDP exchange
const dataChannel = peerConnection.createDataChannel('realtime-channel');
logMessage("📡 Data channel created");
dataChannel.addEventListener('open', () => {
logMessage('✅ Data channel is open - ready to send messages');
// Send client events to start the conversation
logMessage("📝 Preparing to send text input message...");
const event = {
type: "conversation.item.create",
item: {
type: "message",
role: "user",
content: [
{
type: "input_text",
text: "hello there! Can you give me some vacation options?",
},
],
},
};
logMessage("📤 Sending conversation.item.create event...");
logMessage("💬 Text content: " + event.item.content[0].text);
try {
dataChannel.send(JSON.stringify(event));
logMessage("✅ Text input sent successfully!");
// Now send response.create to trigger the AI response
const responseEvent = {
type: "response.create"
};
logMessage("📤 Sending response.create event to trigger AI response...");
dataChannel.send(JSON.stringify(responseEvent));
logMessage("✅ Response.create sent successfully!");
} catch (error) {
logMessage("❌ Failed to send text input: " + error.message);
}
});
dataChannel.addEventListener('message', (event) => {
const realtimeEvent = JSON.parse(event.data);
console.log(realtimeEvent);
logMessage("📥 Received server event: " + realtimeEvent.type);
// Log more detail for important events
if (realtimeEvent.type === "error") {
logMessage("❌ Error: " + realtimeEvent.error.message);
} else if (realtimeEvent.type === "session.created") {
logMessage("🎉 Session created successfully");
} else if (realtimeEvent.type === "response.output_audio_transcript.done") {
logMessage("📝 AI transcript complete: " + (realtimeEvent.transcript || ""));
} else if (realtimeEvent.type === "response.done") {
logMessage("✅ Response completed");
}
});
dataChannel.addEventListener('close', () => {
logMessage('❌ Data channel is closed');
});
dataChannel.addEventListener('error', (error) => {
logMessage('❌ Data channel error: ' + error);
});
// Add connection state logging
peerConnection.onconnectionstatechange = () => {
logMessage("🔗 Connection state: " + peerConnection.connectionState);
};
peerConnection.oniceconnectionstatechange = () => {
logMessage("🧊 ICE connection state: " + peerConnection.iceConnectionState);
};
// Create offer AFTER setting up data channel
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
logMessage("🤝 WebRTC offer created with audio track");
// Prepare multipart form data for /connect endpoint
const formData = new FormData();
formData.append('sdp', offer.sdp);
logMessage("📤 Sending SDP via multipart form to /connect endpoint...");
// Call our /connect endpoint with multipart form data
const connectResponse = await fetch("/connect", {
method: "POST",
body: formData // FormData automatically sets correct Content-Type
});
if (!connectResponse.ok) {
throw new Error(`Connect service request failed: ${connectResponse.status}`);
}
// Get the SDP answer directly as text (not JSON)
const answerSdp = await connectResponse.text();
logMessage("✅ Got SDP answer from /connect endpoint, length: " + answerSdp.length + " chars");
// Set up the WebRTC connection using the SDP answer
const answer = { type: "answer", sdp: answerSdp };
await peerConnection.setRemoteDescription(answer);
logMessage("✅ Remote description set");
// Add close session button
const button = document.createElement('button');
button.innerText = 'Close Session';
button.onclick = () => stopSession(dataChannel, peerConnection);
document.body.appendChild(button);
logMessage("🔴 Close session button added");
function stopSession(dataChannel, peerConnection) {
if (dataChannel) dataChannel.close();
if (peerConnection) peerConnection.close();
logMessage("Session closed.");
}
} catch (error) {
console.error("Error in StartSession:", error);
logMessage("Error in StartSession: " + error.message);
}
}
function logMessage(message) {
const logContainer = document.getElementById("logContainer");
const p = document.createElement("p");
p.textContent = message;
logContainer.appendChild(p);
}
</script>
</body>
</html>