Problem with GPT-Realtime on the front end

Carvalho, Caio 0 Reputation points
2025-12-04T13:47:39.1666667+00:00

Real-time GPT/LLM Integration in Frontend (Next.js/TypeScript) - WebSocket/WebRTC and Ephemeral Keys

Hello Azure Community,

I am attempting to run GPT-Realtime (or a real-time stream from an Azure OpenAI Service LLM) directly from my web frontend, which is built with Next.js and TypeScript. I've explored direct connections using both WebSockets and WebRTC, but neither approach has succeeded in establishing a persistent, secure, and authenticated real-time stream.

My current suspicion is that the failure comes down to the lack of a mechanism for issuing ephemeral keys or short-lived access tokens that can be used safely in a client-side environment to authenticate against this kind of real-time service, without exposing sensitive, long-term API keys.

  1. Is the direct, authenticated, real-time streaming of Azure OpenAI/GPT to a web frontend a supported scenario?
  2. Does anyone have an example implementation (preferably in Next.js/TypeScript) of how to successfully achieve this using a recommended Azure service/pattern?

Any guidance or examples on securely managing the frontend authentication for real-time LLM services would be highly appreciated. Thank you!

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Q&A Assist
    2025-12-04T13:47:53.0766667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    Yes, direct, authenticated real-time streaming of Azure OpenAI/GPT to a web frontend is a supported scenario. You can utilize the Realtime API via WebRTC or WebSockets for this purpose. The Realtime API allows for low-latency, "speech in, speech out" conversational interactions, which is suitable for your use case.

    To securely manage authentication in a client-side environment, you can use ephemeral tokens. These tokens are generated through a REST API call to your Azure OpenAI resource, and that call can be authenticated with either an API key or a Microsoft Entra ID token. This approach avoids exposing sensitive, long-term API keys in your frontend application.

    Here’s a general outline of the steps you can follow:

    1. Set up a token service: Create a backend endpoint that retrieves ephemeral tokens from the Azure OpenAI REST API, authenticating with an API key or a Microsoft Entra ID token (a minimal sketch follows this list).
    2. Integrate with your frontend: Have your Next.js application call this token service to obtain the ephemeral token, then use it to establish a WebRTC or WebSocket connection to the Realtime API.
    3. Implement the connection: Authenticate the WebRTC or WebSocket connection with the ephemeral token, handling the streaming of audio and text appropriately (see the client-side sketch further below).
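
    As an illustration of step 1, here is a minimal sketch of a token service implemented as a Next.js route handler. The route path (`/api/realtime-token`), the environment variable names, the `api-version` value, the `/openai/realtimeapi/sessions` path, the request payload, and the `client_secret.value` response field are all assumptions to verify against the Realtime API documentation for your Azure OpenAI resource.

    ```typescript
    // app/api/realtime-token/route.ts
    // Sketch only: mints a short-lived (ephemeral) key for the Realtime API.
    // The long-lived Azure OpenAI key never leaves the server.
    import { NextResponse } from "next/server";

    const AZURE_OPENAI_ENDPOINT = process.env.AZURE_OPENAI_ENDPOINT!; // e.g. https://<resource>.openai.azure.com
    const AZURE_OPENAI_API_KEY = process.env.AZURE_OPENAI_API_KEY!;   // or swap in a Microsoft Entra ID token
    const DEPLOYMENT = process.env.AZURE_OPENAI_REALTIME_DEPLOYMENT!; // your realtime model deployment name
    const API_VERSION = "2025-04-01-preview";                         // assumption: use the version from your docs

    export async function GET() {
      // Ask the Azure OpenAI resource to create a realtime session and return
      // its short-lived client secret to the browser.
      const res = await fetch(
        `${AZURE_OPENAI_ENDPOINT}/openai/realtimeapi/sessions?api-version=${API_VERSION}`,
        {
          method: "POST",
          headers: {
            "api-key": AZURE_OPENAI_API_KEY,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({ model: DEPLOYMENT, voice: "verse" }),
        }
      );

      if (!res.ok) {
        return NextResponse.json({ error: "Failed to mint ephemeral key" }, { status: 500 });
      }

      const session = await res.json();
      // Field name assumed from the session-creation response; adjust to your API version.
      return NextResponse.json({ ephemeralKey: session.client_secret?.value });
    }
    ```

    Because this route runs on the server, the long-lived credential stays in server-side environment variables and the browser only ever receives the short-lived key.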

    The server-side sketch above and the client-side sketch below are illustrative only; following the Azure documentation for setting up the Realtime API and managing ephemeral tokens will give you the exact endpoint paths, API versions, and payload shapes to use.
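
    For steps 2 and 3, a client-side sketch using WebRTC might look like the following. The token route, the regional WebRTC endpoint URL, and the model query parameter are placeholders; substitute the values documented for your resource and deployment.

    ```typescript
    // Client-side sketch (call from a React component in your Next.js app).
    // Authenticates with the ephemeral key only; the placeholder URLs must be
    // replaced with the endpoints documented for your Azure OpenAI resource.
    export async function startRealtimeSession(audioEl: HTMLAudioElement): Promise<RTCPeerConnection> {
      // 1. Get a short-lived key from our own token route (never the long-lived key).
      const { ephemeralKey } = await fetch("/api/realtime-token").then((r) => r.json());

      // 2. Wire up audio: play the model's audio track and send the microphone.
      const pc = new RTCPeerConnection();
      pc.ontrack = (event) => { audioEl.srcObject = event.streams[0]; };
      const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
      mic.getTracks().forEach((track) => pc.addTrack(track, mic));

      // Optional data channel for JSON events (transcripts, tool calls, etc.);
      // channel label follows the Realtime WebRTC convention.
      const events = pc.createDataChannel("oai-events");
      events.onmessage = (e) => console.log("realtime event", JSON.parse(e.data));

      // 3. Exchange SDP with the Realtime WebRTC endpoint using the ephemeral key.
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);

      const WEBRTC_URL = "https://<region>.realtimeapi-preview.ai.azure.com/v1/realtimertc"; // placeholder
      const sdpResponse = await fetch(`${WEBRTC_URL}?model=<your-deployment>`, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${ephemeralKey}`,
          "Content-Type": "application/sdp",
        },
        body: offer.sdp ?? "",
      });
      await pc.setRemoteDescription({ type: "answer", sdp: await sdpResponse.text() });

      return pc;
    }
    ```

    The same ephemeral-key pattern applies if you prefer WebSockets, but for browser clients WebRTC is generally the better fit because it handles microphone capture and audio playback natively.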


