How to stream agent responses

What is a streamed response?

A streamed response delivers the message content in small, incremental chunks. This improves the user experience: users can see and interact with the message as it unfolds instead of waiting for the entire response to load. Because they can start processing the information immediately, the application feels more responsive and interactive, perceived delays shrink, and users stay more engaged throughout the exchange.
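To make the idea concrete, here is a minimal, library-free sketch. The `fake_model_stream` generator is a hypothetical stand-in for a streaming model (it is not a Semantic Kernel API): it yields one word-sized chunk at a time with a small artificial delay, and the consumer prints each chunk as soon as it arrives rather than waiting for the full text.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_model_stream(text: str, delay: float = 0.01) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming model: yields one word-sized chunk at a time."""
    words = text.split(" ")
    for i, word in enumerate(words):
        await asyncio.sleep(delay)  # simulate network / generation latency
        # Re-attach the separating space so the chunks concatenate back to the original text.
        yield word if i == len(words) - 1 else word + " "


async def render_stream(text: str) -> list[str]:
    """Print each chunk as soon as it arrives and return the chunks received."""
    chunks: list[str] = []
    async for chunk in fake_model_stream(text):
        print(chunk, end="", flush=True)  # the user sees partial output immediately
        chunks.append(chunk)
    print()
    return chunks


if __name__ == "__main__":
    asyncio.run(render_stream("Streaming shows the answer while it is still being generated."))
```

Joining the received chunks reproduces the complete text, which is the property a UI relies on when it appends chunks to what it has already rendered.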

Streaming references

Streaming in Semantic Kernel

AI services that support streaming in Semantic Kernel use different content types from those used for fully formed messages (for example, StreamingChatMessageContent instead of ChatMessageContent). These content types are designed specifically to handle the incremental nature of streaming data. The Agent Framework uses the same content types for the same purpose, which keeps the handling of streamed information consistent and efficient across both.
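The distinction between an incremental fragment and a complete message can be sketched without the library. The classes below are hypothetical and only mirror the idea in spirit; they are not Semantic Kernel's StreamingChatMessageContent or ChatMessageContent. Fragments of the same message can be added together, and the accumulated result is converted into a complete message once the stream ends.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class CompleteMessage:
    """Hypothetical fully formed message (analogous in spirit to ChatMessageContent)."""
    role: str
    content: str


@dataclass(frozen=True)
class StreamingChunk:
    """Hypothetical incremental fragment (analogous in spirit to StreamingChatMessageContent)."""
    role: str
    content: str
    choice_index: int = 0

    def __add__(self, other: "StreamingChunk") -> "StreamingChunk":
        # Only fragments of the same message (same role and choice) may be merged.
        if (self.role, self.choice_index) != (other.role, other.choice_index):
            raise ValueError("cannot merge chunks from different messages")
        return StreamingChunk(self.role, self.content + other.content, self.choice_index)

    def to_message(self) -> CompleteMessage:
        return CompleteMessage(self.role, self.content)


def assemble(chunks: list[StreamingChunk]) -> CompleteMessage:
    """Fold a non-empty list of fragments into one complete message."""
    return reduce(lambda acc, chunk: acc + chunk, chunks).to_message()


if __name__ == "__main__":
    print(assemble([StreamingChunk("assistant", "Hel"), StreamingChunk("assistant", "lo")]))
```

Keeping the two types separate makes it explicit in the code whether a value is still partial or already safe to store.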

Tip

API reference:

Tip

API reference:

Feature currently unavailable in Java.

Streamed response from `ChatCompletionAgent`

When you invoke a streamed response from a ChatCompletionAgent, the ChatHistory of the AgentThread is updated only after the complete response has been received. Although the response is streamed incrementally, the history records only the complete message. This ensures that the ChatHistory always contains fully formed responses.
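The "update the history only after the stream completes" behavior can be illustrated with a small sketch. `HistoryThread` and `stream_reply` are hypothetical names, not Semantic Kernel APIs: the generator hands each chunk to the caller immediately but writes the assembled assistant message to the thread only after the last chunk has been yielded.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass, field


@dataclass
class HistoryThread:
    """Hypothetical local thread that stores only complete messages."""
    messages: list[tuple[str, str]] = field(default_factory=list)


async def stream_reply(thread: HistoryThread, chunks: list[str]) -> AsyncIterator[str]:
    """Yield each chunk to the caller, then record the assembled reply once the stream ends."""
    buffer: list[str] = []
    for chunk in chunks:
        buffer.append(chunk)
        yield chunk  # the history is not touched while chunks are flowing
    # Only after the last chunk is the full assistant message written to the history.
    thread.messages.append(("assistant", "".join(buffer)))


async def demo() -> tuple[int, HistoryThread]:
    thread = HistoryThread()
    lengths_during_stream = []
    async for _ in stream_reply(thread, ["Hel", "lo", "!"]):
        lengths_during_stream.append(len(thread.messages))
    return max(lengths_during_stream), thread


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

One consequence of this design, in the sketch at least: if the consumer stops iterating early, the code after the loop never runs and no partial message is stored.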

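using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;
using Microsoft.SemanticKernel.ChatCompletion;

// Define agent
ChatCompletionAgent agent = ...;

// Create a thread that keeps the conversation history locally.
ChatHistoryAgentThread agentThread = new();

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");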

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's also possible to read the messages that were added to the ChatHistoryAgentThread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread

# Define agent
agent = ChatCompletionAgent(...)

# Create a thread object to maintain the conversation state.
# If no thread is provided, one will be created and returned with
# the initial response.
thread: ChatHistoryAgentThread | None = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
    # Process streamed response(s)...
    thread = response.thread

Feature currently unavailable in Java.

Streamed response from `OpenAIAssistantAgent`

When you invoke a streamed response from an OpenAIAssistantAgent, the assistant maintains the conversation state as a remote thread. You can read the messages from the remote thread if needed.

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient);

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

To create a thread from an existing thread Id, pass the Id to the OpenAIAssistantAgentThread constructor.

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient, "your-existing-thread-id");

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread for the agent conversation.
# If no thread is provided, one will be created and returned with
# the initial response.
thread: AssistantAgentThread | None = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
    # Process streamed response(s)...
    thread = response.thread

# Read the messages from the remote thread
async for message in thread.get_messages():
    # Process messages...
    print(message.content)

# Delete the thread when it is no longer needed
await thread.delete()

To create a thread from an existing thread_id, pass the thread_id to the AssistantAgentThread constructor.

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread object that refers to an existing remote thread.
# `client` is the OpenAI (or Azure OpenAI) client used to create the agent.
thread = AssistantAgentThread(client=client, thread_id="your-existing-thread-id")

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
    # Process streamed response(s)...
    thread = response.thread

# Delete the thread when it is no longer needed
await thread.delete()

Feature currently unavailable in Java.

Handling intermediate messages with a streaming response

Because streaming responses let LLMs return incremental fragments of text, a user interface or console can render output sooner, without waiting for the entire response to complete. A caller may also want to handle intermediate content, such as the results of function calls. You can do this by providing a callback function when you invoke the streaming response. The callback receives complete messages encapsulated in ChatMessageContent.

Documentation for the AzureAIAgent callback is coming soon.

Setting the on_intermediate_message callback in agent.invoke_stream(...) lets the caller receive the intermediate messages generated while the agent formulates its final response.

import asyncio
from typing import Annotated

from semantic_kernel.agents import AzureResponsesAgent
from semantic_kernel.contents import AuthorRole, ChatMessageContent, FunctionCallContent, FunctionResultContent
from semantic_kernel.functions import kernel_function


# Define a sample plugin for the sample
class MenuPlugin:
    """A sample Menu Plugin used for the concept sample."""

    @kernel_function(description="Provides a list of specials from the menu.")
    def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]:
        return """
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        """

    @kernel_function(description="Provides the price of the requested menu item.")
    def get_item_price(
        self, menu_item: Annotated[str, "The name of the menu item."]
    ) -> Annotated[str, "Returns the price of the menu item."]:
        return "$9.99"

# This callback function will be called for each intermediate message,
# which will allow one to handle FunctionCallContent and FunctionResultContent.
# If the callback is not provided, the agent will return the final response
# with no intermediate tool call steps.
async def handle_streaming_intermediate_steps(message: ChatMessageContent) -> None:
    for item in message.items or []:
        if isinstance(item, FunctionResultContent):
            print(f"Function Result:> {item.result} for function: {item.name}")
        elif isinstance(item, FunctionCallContent):
            print(f"Function Call:> {item.name} with arguments: {item.arguments}")
        else:
            print(f"{item}")

# Simulate a conversation with the agent
USER_INPUTS = [
    "Hello",
    "What is the special soup?",
    "What is the special drink?",
    "How much is that?",
    "Thank you",
]


async def main():
    # 1. Create the client using Azure OpenAI resources and configuration
    client, model = AzureResponsesAgent.setup_resources()

    # 2. Create a Semantic Kernel agent for the Azure OpenAI Responses API
    agent = AzureResponsesAgent(
        ai_model_id=model,
        client=client,
        instructions="Answer questions about the menu.",
        name="Host",
        plugins=[MenuPlugin()],
    )

    # 3. Create a thread for the agent
    # If no thread is provided, a new thread will be
    # created and returned with the initial response
    thread = None

    try:
        for user_input in USER_INPUTS:
            print(f"# {AuthorRole.USER}: '{user_input}'")

            first_chunk = True
            async for response in agent.invoke_stream(
                messages=user_input,
                thread=thread,
                on_intermediate_message=handle_streaming_intermediate_steps,
            ):
                thread = response.thread
                if first_chunk:
                    print(f"# {response.name}: ", end="", flush=True)
                    first_chunk = False
                print(response.content, end="", flush=True)
            print()
    finally:
        if thread:
            await thread.delete()

if __name__ == "__main__":
    asyncio.run(main())

The following is sample output from the agent invocation:

Sample Output:

# AuthorRole.USER: 'Hello'
# Host: Hello! How can I assist you with the menu today?
# AuthorRole.USER: 'What is the special soup?'
Function Call:> MenuPlugin-get_specials with arguments: {}
Function Result:>
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        for function: MenuPlugin-get_specials
# Host: The special soup today is Clam Chowder. Would you like to know more about it or hear about other specials?
# AuthorRole.USER: 'What is the special drink?'
# Host: The special drink today is Chai Tea. Would you like more details or are you interested in ordering it?
# AuthorRole.USER: 'How much is that?'
Function Call:> MenuPlugin-get_item_price with arguments: {"menu_item":"Chai Tea"}
Function Result:> $9.99 for function: MenuPlugin-get_item_price
# Host: The special drink, Chai Tea, is $9.99. Would you like to order one or need information on something else?
# AuthorRole.USER: 'Thank you'
# Host: You're welcome! If you have any more questions or need help with the menu, just let me know. Enjoy your day!

Feature currently unavailable in Java.

Next steps

Using templates with agents

Agent orchestration

Last updated on 2025-05-23
