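How to stream agent responses

What is a streamed response?

A streamed response delivers the message content in small, incremental chunks. This improves the user experience because users can view and interact with the message as it unfolds, rather than waiting for the entire response to load. They can begin processing information immediately, which makes the interaction feel more responsive. As a result, streaming minimizes perceived delay and keeps users engaged throughout the exchange.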
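As a minimal sketch of the idea above, the following plain Python example consumes a reply chunk by chunk and renders each piece as soon as it arrives. It does not use Semantic Kernel: `stream_reply` and `render` are hypothetical stand-ins introduced only for illustration.

```python
import asyncio
from collections.abc import AsyncIterator


async def stream_reply(text: str, chunk_size: int = 8) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming service: yields the text in small chunks."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop, as a network read would
        yield text[start : start + chunk_size]


async def render(text: str) -> list[str]:
    """Print each chunk as soon as it arrives and return the chunks received."""
    received: list[str] = []
    async for chunk in stream_reply(text):
        print(chunk, end="", flush=True)
        received.append(chunk)
    print()
    return received


if __name__ == "__main__":
    asyncio.run(render("Streaming lets the reader start before the reply is finished."))
```

The consumer never needs the whole reply in memory to start displaying it; joining the received chunks in order reproduces the full text.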

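Streaming references

Streaming in Semantic Kernel

AI services that support streaming in Semantic Kernel use different content types from those used for fully formed messages. These content types are designed specifically to handle the incremental nature of streaming data. The Agent Framework uses the same content types for the same purpose, which keeps streamed information consistent across both systems.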
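To illustrate why a separate content type for streaming is useful, here is a hedged sketch in plain Python. `StreamingChunk` and `FullMessage` are hypothetical types invented for this example, not the Semantic Kernel classes; the point is only that a streaming fragment knows how to merge with its neighbors, while a fully formed message does not need to.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class FullMessage:
    """Hypothetical fully formed message: a role and the complete text."""
    role: str
    content: str


@dataclass(frozen=True)
class StreamingChunk:
    """Hypothetical streaming fragment: carries one piece of a message."""
    role: str
    content: str
    choice_index: int = 0

    def __add__(self, other: "StreamingChunk") -> "StreamingChunk":
        # Only fragments that belong to the same message may be merged.
        if (self.role, self.choice_index) != (other.role, other.choice_index):
            raise ValueError("cannot merge chunks from different messages")
        return StreamingChunk(self.role, self.content + other.content, self.choice_index)


def assemble(chunks: list[StreamingChunk]) -> FullMessage:
    """Fold the fragments, in arrival order, into one fully formed message."""
    if not chunks:
        raise ValueError("no chunks to assemble")
    merged = reduce(lambda left, right: left + right, chunks)
    return FullMessage(role=merged.role, content=merged.content)
```

For example, `assemble([StreamingChunk("assistant", "Clam "), StreamingChunk("assistant", "Chowder")])` produces a single `FullMessage` whose content is the concatenation of both fragments.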

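This feature is currently unavailable in Java.

Streamed response from `ChatCompletionAgent`

When a streamed response is invoked from a ChatCompletionAgent, the ChatHistory in the AgentThread is updated only after the full response has been received. Although the response is streamed incrementally, the history records only the complete message. This ensures that the ChatHistory always reflects fully formed responses.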

// Define agent
ChatCompletionAgent agent = ...;

ChatHistoryAgentThread agentThread = new();

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's also possible to read the messages that were added to the ChatHistoryAgentThread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread

# Define agent
agent = ChatCompletionAgent(...)

# Create a thread object to maintain the conversation state.
# If no thread is provided one will be created and returned with
# the initial response.
thread: ChatHistoryAgentThread = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
  # Process streamed response(s)...
  thread = response.thread
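The history behavior described in this section can be sketched without any service dependency. The example below is an illustration only: `SketchThread` and `invoke_streaming` are hypothetical stand-ins, not the Semantic Kernel types, and recording the user message before streaming starts is an assumption made for the sketch. It shows the assistant message reaching the history only once the stream has been fully consumed.

```python
import asyncio
from collections.abc import AsyncIterator


class SketchThread:
    """Hypothetical in-memory thread; not the Semantic Kernel ChatHistoryAgentThread."""

    def __init__(self) -> None:
        self.history: list[tuple[str, str]] = []


async def invoke_streaming(user_input: str, thread: SketchThread) -> AsyncIterator[str]:
    """Yield reply chunks; record the assistant reply only once it is complete."""
    thread.history.append(("user", user_input))  # assumption: user turn is stored up front
    parts: list[str] = []
    for chunk in ("The special ", "soup is ", "Clam Chowder."):
        await asyncio.sleep(0)
        parts.append(chunk)
        yield chunk
    # Reached only after the consumer has pulled every chunk.
    thread.history.append(("assistant", "".join(parts)))


async def demo() -> tuple[list[int], list[tuple[str, str]]]:
    """Record the history size while streaming, then return the final history."""
    thread = SketchThread()
    sizes_during_stream: list[int] = []
    async for _ in invoke_streaming("What is the special soup?", thread):
        sizes_during_stream.append(len(thread.history))
    return sizes_during_stream, thread.history


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

While chunks are still arriving, the history holds only the user message; the single, complete assistant message appears after the loop ends.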

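This feature is currently unavailable in Java.

Streamed response from `OpenAIAssistantAgent`

When a streamed response is invoked from an OpenAIAssistantAgent, the assistant maintains the conversation state as a remote thread. If needed, the messages can be read back from the remote thread.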

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient);

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

To create a thread from an existing Id, pass the Id to the constructor of OpenAIAssistantAgentThread.

// Define agent
OpenAIAssistantAgent agent = ...;

// Create a thread for the agent conversation.
OpenAIAssistantAgentThread agentThread = new(assistantClient, "your-existing-thread-id");

// Create a user message
var message = new ChatMessageContent(AuthorRole.User, "<user input>");

// Generate the streamed agent response(s)
await foreach (StreamingChatMessageContent response in agent.InvokeStreamingAsync(message, agentThread))
{
  // Process streamed response(s)...
}

// It's possible to read the messages from the remote thread.
await foreach (ChatMessageContent response in agentThread.GetMessagesAsync())
{
  // Process messages...
}

// Delete the thread when it is no longer needed
await agentThread.DeleteAsync();

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread for the agent conversation.
# If no thread is provided one will be created and returned with
# the initial response.
thread: AssistantAgentThread = None

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
  # Process streamed response(s)...
  thread = response.thread

# Read the messages from the remote thread
async for response in thread.get_messages():
  # Process messages...
  pass

# Delete the thread
await thread.delete()

To create a thread from an existing thread_id, pass the thread_id to the constructor of AssistantAgentThread.

from semantic_kernel.agents import AssistantAgentThread, AzureAssistantAgent, OpenAIAssistantAgent

# Define agent
agent = OpenAIAssistantAgent(...)  # or = AzureAssistantAgent(...)

# Create a thread for the agent conversation from an existing thread ID.
# `client` is assumed to be the client instance used with the agent (creation not shown).
thread = AssistantAgentThread(client=client, thread_id="your-existing-thread-id")

# Generate the streamed agent response(s)
async for response in agent.invoke_stream(messages="user input", thread=thread):
  # Process streamed response(s)...
  thread = response.thread

# Delete the thread
await thread.delete()

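This feature is currently unavailable in Java.

Handling intermediate messages with a streaming response

Streaming lets an LLM return incremental chunks of text, so a UI or console can render output sooner instead of waiting for the entire response to complete. A caller may also want to handle intermediate content, such as the results of function calls. This can be done by supplying a callback function when invoking the streaming response. The callback function receives complete messages encapsulated in ChatMessageContent.

Callback documentation for AzureAIAgent is coming soon.

Configuring the on_intermediate_message callback in agent.invoke_stream(...) lets the caller receive the intermediate messages generated while the agent formulates its final response.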

import asyncio
from typing import Annotated

from semantic_kernel.agents import AzureResponsesAgent
from semantic_kernel.contents import AuthorRole, ChatMessageContent, FunctionCallContent, FunctionResultContent
from semantic_kernel.functions import kernel_function


# Define a sample plugin for the sample
class MenuPlugin:
    """A sample Menu Plugin used for the concept sample."""

    @kernel_function(description="Provides a list of specials from the menu.")
    def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]:
        return """
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        """

    @kernel_function(description="Provides the price of the requested menu item.")
    def get_item_price(
        self, menu_item: Annotated[str, "The name of the menu item."]
    ) -> Annotated[str, "Returns the price of the menu item."]:
        return "$9.99"

# This callback function will be called for each intermediate message,
# which will allow one to handle FunctionCallContent and FunctionResultContent.
# If the callback is not provided, the agent will return the final response
# with no intermediate tool call steps.
async def handle_streaming_intermediate_steps(message: ChatMessageContent) -> None:
    for item in message.items or []:
        if isinstance(item, FunctionResultContent):
            print(f"Function Result:> {item.result} for function: {item.name}")
        elif isinstance(item, FunctionCallContent):
            print(f"Function Call:> {item.name} with arguments: {item.arguments}")
        else:
            print(f"{item}")

# Simulate a conversation with the agent
USER_INPUTS = [
    "Hello",
    "What is the special soup?",
    "What is the special drink?",
    "How much is that?",
    "Thank you",
]


async def main():
    # 1. Create the client using Azure OpenAI resources and configuration
    client, model = AzureResponsesAgent.setup_resources()

    # 2. Create a Semantic Kernel agent for the OpenAI Responses API
    agent = AzureResponsesAgent(
        ai_model_id=model,
        client=client,
        instructions="Answer questions about the menu.",
        name="Host",
        plugins=[MenuPlugin()],
    )

    # 3. Create a thread for the agent
    # If no thread is provided, a new thread will be
    # created and returned with the initial response
    thread = None

    try:
        for user_input in USER_INPUTS:
            print(f"# {AuthorRole.USER}: '{user_input}'")

            first_chunk = True
            async for response in agent.invoke_stream(
                messages=user_input,
                thread=thread,
                on_intermediate_message=handle_streaming_intermediate_steps,
            ):
                thread = response.thread
                if first_chunk:
                    print(f"# {response.name}: ", end="", flush=True)
                    first_chunk = False
                print(response.content, end="", flush=True)
            print()
    finally:
        # Clean up the thread if one was created
        if thread:
            await thread.delete()

if __name__ == "__main__":
    asyncio.run(main())

The following shows sample output from the agent invocation process.

Sample Output:

# AuthorRole.USER: 'Hello'
# Host: Hello! How can I assist you with the menu today?
# AuthorRole.USER: 'What is the special soup?'
Function Call:> MenuPlugin-get_specials with arguments: {}
Function Result:>
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        for function: MenuPlugin-get_specials
# Host: The special soup today is Clam Chowder. Would you like to know more about it or hear about other specials?
# AuthorRole.USER: 'What is the special drink?'
# Host: The special drink today is Chai Tea. Would you like more details or are you interested in ordering it?
# AuthorRole.USER: 'How much is that?'
Function Call:> MenuPlugin-get_item_price with arguments: {"menu_item":"Chai Tea"}
Function Result:> $9.99 for function: MenuPlugin-get_item_price
# Host: The special drink, Chai Tea, is $9.99. Would you like to order one or need information on something else?
# AuthorRole.USER: 'Thank you'
# Host: You're welcome! If you have any more questions or need help with the menu, just let me know. Enjoy your day!
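The callback flow in this section can also be sketched without any service dependency. The example below is a simplified illustration, not the Semantic Kernel implementation: `IntermediateMessage` and `invoke_stream_sketch` are hypothetical names, and the single hard-coded tool call is an assumption. It shows complete intermediate messages going to the callback while only the final answer is streamed to the caller.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntermediateMessage:
    """Hypothetical complete message describing a tool call or its result."""
    kind: str  # "call" or "result"
    name: str
    payload: str


Callback = Callable[[IntermediateMessage], Awaitable[None]]


async def invoke_stream_sketch(
    user_input: str, on_intermediate_message: Optional[Callback] = None
) -> AsyncIterator[str]:
    """Report one hypothetical tool call via the callback, then stream the answer."""
    price = "$9.99"  # stand-in for a real function result
    if on_intermediate_message is not None:
        await on_intermediate_message(IntermediateMessage("call", "get_item_price", user_input))
        await on_intermediate_message(IntermediateMessage("result", "get_item_price", price))
    for chunk in ("The price ", "is ", price, "."):
        yield chunk


async def demo() -> tuple[list[str], str]:
    """Collect what the callback saw and what the stream delivered."""
    seen: list[str] = []

    async def record(message: IntermediateMessage) -> None:
        seen.append(f"{message.kind}:{message.name}:{message.payload}")

    text = ""
    async for chunk in invoke_stream_sketch("Chai Tea", on_intermediate_message=record):
        text += chunk
    return seen, text


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping the tool-call traffic on a separate callback means the streamed chunks contain only user-facing text, so the rendering loop does not have to filter anything.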

This feature is currently unavailable in Java.

Next steps

Using templates with agents

Agent orchestration

Last updated on 2025-05-23
