VoiceLiveModelFactory Class

Definition

A factory class for creating model instances for mocking.

C#: public static class VoiceLiveModelFactory
F#: type VoiceLiveModelFactory = class
VB: Public Class VoiceLiveModelFactory
Inheritance: Object → VoiceLiveModelFactory
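
Like other Azure SDK model factories, this type is typically used in unit tests to fabricate the read-only output models that a client would normally return from the service. The C# sketch below is a minimal illustration, assuming the Azure.AI.VoiceLive namespace and the signatures listed under Methods; the values and parameter meanings (resolution in pixels, ICE username/credential) are illustrative assumptions.

```csharp
// Minimal sketch: fabricate read-only output models without calling the
// service, e.g. to feed a mocked VoiceLive client in a test.
using System;
using Azure.AI.VoiceLive;

public static class FactoryUsageSketch
{
    public static void Main()
    {
        VideoResolution resolution = VoiceLiveModelFactory.VideoResolution(1920, 1080);

        IceServer iceServer = VoiceLiveModelFactory.IceServer(
            new[] { new Uri("turn:turn.example.com:3478") },
            "user",         // assumed: ICE username
            "credential");  // assumed: ICE credential

        Console.WriteLine($"Created {resolution.GetType().Name} and {iceServer.GetType().Name} for mocking.");
    }
}
```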

Methods

AnimationOptions(String, IEnumerable<AnimationOutputType>)

Configuration for animation outputs including blendshapes and visemes metadata.

AssistantMessageItem(String, IEnumerable<MessageContentPart>, Nullable<ItemParamStatus>)

An assistant message item within a conversation.

AudioEchoCancellation()

Echo cancellation configuration for server-side audio processing.

AudioInputTranscriptionOptions(AudioInputTranscriptionOptionsModel, String, IDictionary<String,String>, IEnumerable<String>)

Configuration for input audio transcription.

AudioNoiseReduction(AudioNoiseReductionType)

Configuration for input audio noise reduction.

AvatarConfiguration(IEnumerable<IceServer>, String, String, Boolean, VideoParams)

Configuration for avatar streaming and behavior during the session.

AzureCustomVoice(String, String, Nullable<Single>, String, IEnumerable<String>, String, String, String, String, String)

Azure custom voice configuration.

AzurePersonalVoice(String, Nullable<Single>, PersonalVoiceModels)

Azure personal voice configuration.

AzureSemanticEouDetection(Nullable<EouThresholdLevel>, Nullable<Single>)

Azure semantic end-of-utterance detection (default).

AzureSemanticEouDetectionEn(Nullable<EouThresholdLevel>, Nullable<Single>)

Azure semantic end-of-utterance detection (English-optimized).

AzureSemanticEouDetectionMultilingual(Nullable<EouThresholdLevel>, Nullable<Single>)

Azure semantic end-of-utterance detection (multilingual).

AzureSemanticVadTurnDetection(Nullable<Single>, Nullable<Int32>, Nullable<Int32>, EouDetection, Nullable<Int32>, Nullable<Boolean>, IEnumerable<String>, Nullable<Boolean>, Nullable<Boolean>, Nullable<Boolean>)

Server Speech Detection (Azure semantic VAD, default variant).

AzureSemanticVadTurnDetectionEn(Nullable<Single>, Nullable<Int32>, Nullable<Int32>, EouDetection, Nullable<Int32>, Nullable<Boolean>, Nullable<Boolean>, Nullable<Boolean>, Nullable<Boolean>)

Server Speech Detection (Azure semantic VAD, English-only).

AzureSemanticVadTurnDetectionMultilingual(Nullable<Single>, Nullable<Int32>, Nullable<Int32>, EouDetection, Nullable<Int32>, Nullable<Boolean>, IEnumerable<String>, Nullable<Boolean>, Nullable<Boolean>, Nullable<Boolean>)

Server Speech Detection (Azure semantic VAD).

AzureStandardVoice(String, Nullable<Single>, String, IEnumerable<String>, String, String, String, String, String)

Azure standard voice configuration.

AzureVoice(String)

Base for Azure voice configurations. Please note this is the abstract base class. The derived classes available for instantiation are: AzureCustomVoice, AzureStandardVoice, and AzurePersonalVoice.

CachedTokenDetails(Int32, Int32)

Details of cached token usage.

ConversationRequestItem(String, String)

Base for any conversation request item; discriminated by type. Please note this is the abstract base class. The derived classes available for instantiation are: MessageItem, FunctionCallItem, and FunctionCallOutputItem.

EouDetection(String)

Top-level union for end-of-utterance (EOU) semantic detection configuration. Please note this is the abstract base class. The derived classes available for instantiation are: AzureSemanticEouDetection, AzureSemanticEouDetectionEn, and AzureSemanticEouDetectionMultilingual.

FunctionCallItem(String, String, String, String, Nullable<ItemParamStatus>)

A function call item within a conversation.

FunctionCallOutputItem(String, String, String, Nullable<ItemParamStatus>)

A function call output item within a conversation.

IceServer(IEnumerable<Uri>, String, String)

ICE server configuration for WebRTC connection negotiation.

InputAudioContentPart(String, String)

Input audio content part.

InputTextContentPart(String)

Input text content part.

InputTokenDetails(Int32, Int32, Int32, CachedTokenDetails)

Details of input token usage.

LogProbProperties(String, Single, BinaryData)

A single log probability entry for a token.

MessageContentPart(String)

Base for any message content part; discriminated by type. Please note this is the abstract base class. The derived classes available for instantiation are: InputTextContentPart, InputAudioContentPart, and OutputTextContentPart.

MessageItem(String, IEnumerable<MessageContentPart>, Nullable<ItemParamStatus>)

A message item within a conversation.

OpenAIVoice(OAIVoice)

OpenAI voice configuration with explicit type field.

This provides a unified interface for OpenAI voices, complementing the existing string-based OAIVoice for backward compatibility.

OutputTextContentPart(String)

Output text content part.

OutputTokenDetails(Int32, Int32)

Details of output token usage.

RequestAudioContentPart(String)

An audio content part for a request.

RequestTextContentPart(String)

A text content part for a request.

ResponseAudioContentPart(String)

An audio content part for a response.

ResponseCancelledDetails(ResponseCancelledDetailsReason)

Details for a cancelled response.

ResponseFailedDetails(BinaryData)

Details for a failed response.

ResponseFunctionCallItem(String, String, String, String, String, SessionResponseItemStatus)

A function call item within a conversation.

ResponseFunctionCallOutputItem(String, String, String, String)

A function call output item within a conversation.

ResponseIncompleteDetails(ResponseIncompleteDetailsReason)

Details for an incomplete response.

ResponseStatusDetails(String)

Base for all non-success response details. Please note this is the abstract base class. The derived classes available for instantiation are: ResponseCancelledDetails, ResponseIncompleteDetails, and ResponseFailedDetails.

ResponseTextContentPart(String)

A text content part for a response.

ResponseTokenStatistics(Int32, Int32, Int32, InputTokenDetails, OutputTokenDetails)

Overall usage statistics for a response.

ServerVadTurnDetection(Nullable<Single>, Nullable<Int32>, Nullable<Int32>, EouDetection, Nullable<Boolean>, Nullable<Boolean>, Nullable<Boolean>)

Base model for VAD-based turn detection.

SessionResponse(String, String, Nullable<SessionResponseStatus>, ResponseStatusDetails, IEnumerable<SessionResponseItem>, ResponseTokenStatistics, String, VoiceProvider, IEnumerable<InteractionModality>, Nullable<OutputAudioFormat>, Nullable<Single>, MaxResponseOutputTokensOption)

The response resource.

SessionResponseItem(String, String, String)

Base for any response item; discriminated by type. Please note this is the abstract base class. The derived classes available for instantiation are: SessionResponseMessageItem, ResponseFunctionCallItem, and ResponseFunctionCallOutputItem.

SessionResponseMessageItem(String, String, ResponseMessageRole, IEnumerable<VoiceLiveContentPart>, SessionResponseItemStatus)

Base type for a message item within a conversation.

SessionUpdate(String, String)

A voicelive server event. Please note this is the abstract base class. The derived classes available for instantiation are: SessionUpdateError, SessionUpdateSessionCreated, SessionUpdateSessionUpdated, SessionUpdateAvatarConnecting, SessionUpdateInputAudioBufferCommitted, SessionUpdateInputAudioBufferCleared, SessionUpdateInputAudioBufferSpeechStarted, SessionUpdateInputAudioBufferSpeechStopped, SessionUpdateConversationItemCreated, SessionUpdateConversationItemInputAudioTranscriptionCompleted, SessionUpdateConversationItemInputAudioTranscriptionFailed, SessionUpdateConversationItemTruncated, SessionUpdateConversationItemDeleted, SessionUpdateResponseCreated, SessionUpdateResponseDone, SessionUpdateResponseOutputItemAdded, SessionUpdateResponseOutputItemDone, SessionUpdateResponseContentPartAdded, SessionUpdateResponseContentPartDone, SessionUpdateResponseTextDelta, SessionUpdateResponseTextDone, SessionUpdateResponseAudioTranscriptDelta, SessionUpdateResponseAudioTranscriptDone, SessionUpdateResponseAudioDelta, SessionUpdateResponseAudioDone, SessionUpdateResponseAnimationBlendshapeDelta, SessionUpdateResponseAnimationBlendshapeDone, SessionUpdateResponseAudioTimestampDelta, SessionUpdateResponseAudioTimestampDone, SessionUpdateResponseAnimationVisemeDelta, SessionUpdateResponseAnimationVisemeDone, SessionUpdateConversationItemInputAudioTranscriptionDelta, SessionUpdateConversationItemRetrieved, SessionUpdateResponseFunctionCallArgumentsDelta, and SessionUpdateResponseFunctionCallArgumentsDone.
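
Because SessionUpdate is the root of this discriminated union, handlers under test typically branch on the concrete derived type. A hedged C# sketch, assuming only the derived type names listed above:

```csharp
// Hedged sketch: dispatching on SessionUpdate subtypes, e.g. when replaying
// factory-created events through a handler in a unit test.
using System;
using Azure.AI.VoiceLive;

public static class SessionUpdateDispatchSketch
{
    public static void Handle(SessionUpdate update)
    {
        switch (update)
        {
            case SessionUpdateResponseTextDelta _:
                // Streamed text chunk: append it to the UI or transcript store.
                Console.WriteLine("text delta received");
                break;
            case SessionUpdateError _:
                // Most errors are recoverable; log and keep the session open.
                Console.WriteLine("error event received");
                break;
            default:
                Console.WriteLine($"unhandled event type: {update.GetType().Name}");
                break;
        }
    }
}
```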

SessionUpdateAvatarConnecting(String, String)

Sent when the server is in the process of establishing an avatar media connection and provides its SDP answer.

SessionUpdateConversationItemCreated(String, String, SessionResponseItem)

Returned when a conversation item is created. There are several scenarios that produce this event:

  • The server is generating a Response, which if successful will produce either one or two Items, which will be of type message (role assistant) or type function_call.
  • The input audio buffer has been committed, either by the client or the server (in server_vad mode). The server will take the content of the input audio buffer and add it to a new user message Item.
  • The client has sent a conversation.item.create event to add a new Item to the Conversation.
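
A hedged C# sketch of fabricating this event with the factory for a handler test; the parameter order (event id, previous item id, item) and the enum member names ResponseMessageRole.Assistant and SessionResponseItemStatus.Completed are assumptions, not documented values.

```csharp
using Azure.AI.VoiceLive;

public static class ConversationItemCreatedSketch
{
    public static SessionUpdateConversationItemCreated CreateFakeEvent()
    {
        // Assumed parameter order for SessionResponseMessageItem: the two
        // String parameters are treated here as an item id and an object label.
        SessionResponseMessageItem item = VoiceLiveModelFactory.SessionResponseMessageItem(
            "item_123",
            "realtime.item",
            ResponseMessageRole.Assistant,                 // assumed member name
            new VoiceLiveContentPart[]
            {
                VoiceLiveModelFactory.ResponseTextContentPart("Hello from the mock.")
            },
            SessionResponseItemStatus.Completed);          // assumed member name

        return VoiceLiveModelFactory.SessionUpdateConversationItemCreated(
            "event_001",   // assumed: event id
            "item_122",    // assumed: previous item id
            item);
    }
}
```
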
SessionUpdateConversationItemDeleted(String, String)

Returned when an item in the conversation is deleted by the client with a conversation.item.delete event. This event is used to synchronize the server's understanding of the conversation history with the client's view.

SessionUpdateConversationItemInputAudioTranscriptionCompleted(String, String, Int32, String)

This event is the output of audio transcription for user audio written to the user audio buffer. Transcription begins when the input audio buffer is committed by the client or server (in server_vad mode). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events. VoiceLive API models accept audio natively, and thus input transcription is a separate process run on a separate ASR (Automatic Speech Recognition) model. The transcript may diverge somewhat from the model's interpretation, and should be treated as a rough guide.
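
A hedged C# sketch: fabricating this event so that code reconciling the ASR transcript with the model's output can be tested offline; the positional mapping (event id, item id, content index, transcript) is an assumption.

```csharp
using Azure.AI.VoiceLive;

public static class TranscriptionCompletedSketch
{
    public static SessionUpdateConversationItemInputAudioTranscriptionCompleted CreateFakeEvent() =>
        VoiceLiveModelFactory.SessionUpdateConversationItemInputAudioTranscriptionCompleted(
            "event_010",                          // assumed: event id
            "item_123",                           // assumed: user message item id
            0,                                    // assumed: content index within the item
            "what's the weather like today");     // assumed: the ASR transcript
}
```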

SessionUpdateConversationItemInputAudioTranscriptionDelta(String, String, Nullable<Int32>, String, IEnumerable<LogProbProperties>)

Returned when the text value of an input audio transcription content part is updated.

SessionUpdateConversationItemInputAudioTranscriptionFailed(String, String, Int32, VoiceLiveErrorDetails)

Returned when input audio transcription is configured, and a transcription request for a user message failed. These events are separate from other error events so that the client can identify the related Item.

SessionUpdateConversationItemRetrieved(SessionResponseItem, String)

Returned when a conversation item is retrieved with conversation.item.retrieve.

SessionUpdateConversationItemTruncated(String, Int32, Int32, String)

Returned when an earlier assistant audio message item is truncated by the client with a conversation.item.truncate event. This event is used to synchronize the server's understanding of the audio with the client's playback. This action will truncate the audio and remove the server-side text transcript to ensure there is no text in the context that hasn't been heard by the user.

SessionUpdateError(String, SessionUpdateErrorDetails)

Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open; we recommend that implementors monitor and log error messages by default.
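
A hedged C# sketch: constructing an error event to verify that a handler logs it and keeps the session open; the mapping of the five String parameters of SessionUpdateErrorDetails to (type, code, message, param, event id) is an assumption.

```csharp
using Azure.AI.VoiceLive;

public static class ErrorEventSketch
{
    public static SessionUpdateError CreateFakeError() =>
        VoiceLiveModelFactory.SessionUpdateError(
            "event_020",                                   // assumed: event id
            VoiceLiveModelFactory.SessionUpdateErrorDetails(
                "invalid_request_error",                   // assumed: error type
                "missing_field",                           // assumed: error code
                "The 'voice' field is required.",          // assumed: human-readable message
                "voice",                                   // assumed: offending parameter
                "event_019"));                             // assumed: related client event id
}
```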

SessionUpdateErrorDetails(String, String, String, String, String)

Details of the error.

SessionUpdateInputAudioBufferCleared(String)

Returned when the input audio buffer is cleared by the client with an input_audio_buffer.clear event.

SessionUpdateInputAudioBufferCommitted(String, String, String)

Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The item_id property is the ID of the user message item that will be created; thus, a conversation.item.created event will also be sent to the client.

SessionUpdateInputAudioBufferSpeechStarted(String, Int32, String)

Sent by the server when in server_vad mode to indicate that speech has been detected in the audio buffer. This can happen any time audio is added to the buffer (unless speech is already detected). The client may want to use this event to interrupt audio playback or provide visual feedback to the user. The client should expect to receive an input_audio_buffer.speech_stopped event when speech stops. The item_id property is the ID of the user message item that will be created when speech stops and will also be included in the input_audio_buffer.speech_stopped event (unless the client manually commits the audio buffer during VAD activation).
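
A hedged C# sketch: fabricating a speech-started/speech-stopped pair to exercise barge-in (interrupt-playback) logic without real audio; the positional mapping (event id, millisecond offset, item id) is an assumption.

```csharp
using Azure.AI.VoiceLive;

public static class VadEventsSketch
{
    public static (SessionUpdateInputAudioBufferSpeechStarted Started,
                   SessionUpdateInputAudioBufferSpeechStopped Stopped) CreateFakePair() =>
        (VoiceLiveModelFactory.SessionUpdateInputAudioBufferSpeechStarted(
             "event_030", 1200, "item_124"),   // assumed: speech detected at 1200 ms
         VoiceLiveModelFactory.SessionUpdateInputAudioBufferSpeechStopped(
             "event_031", 4800, "item_124"));  // assumed: speech ended at 4800 ms
}
```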

SessionUpdateInputAudioBufferSpeechStopped(String, Int32, String)

Returned in server_vad mode when the server detects the end of speech in the audio buffer. The server will also send a conversation.item.created event with the user message item that is created from the audio buffer.

SessionUpdateResponseAnimationBlendshapeDelta(String, String, String, Int32, Int32, BinaryData, Int32)

Represents a delta update of blendshape animation frames for a specific output of a response.

SessionUpdateResponseAnimationBlendshapeDone(String, String, String, Int32)

Indicates the completion of blendshape animation processing for a specific output of a response.

SessionUpdateResponseAnimationVisemeDelta(String, String, String, Int32, Int32, Int32, Int32)

Represents a viseme ID delta update for animation based on audio.

SessionUpdateResponseAnimationVisemeDone(String, String, String, Int32, Int32)

Indicates completion of viseme animation delivery for a response.

SessionUpdateResponseAudioDelta(String, String, String, Int32, Int32, BinaryData)

Returned when the model-generated audio is updated.

SessionUpdateResponseAudioDone(String, String, String, Int32, Int32)

Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseAudioTimestampDelta(String, String, String, Int32, Int32, Int32, Int32, String)

Represents a word-level audio timestamp delta for a response.

SessionUpdateResponseAudioTimestampDone(String, String, String, Int32, Int32)

Indicates completion of audio timestamp delivery for a response.

SessionUpdateResponseAudioTranscriptDelta(String, String, String, Int32, Int32, String)

Returned when the model-generated transcription of audio output is updated.

SessionUpdateResponseAudioTranscriptDone(String, String, String, Int32, Int32, String)

Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseContentPartAdded(String, String, String, Int32, Int32, VoiceLiveContentPart)

Returned when a new content part is added to an assistant message item during response generation.

SessionUpdateResponseContentPartDone(String, String, String, Int32, Int32, VoiceLiveContentPart)

Returned when a content part is done streaming in an assistant message item. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseCreated(String, SessionResponse)

Returned when a new Response is created. The first event of response creation, where the response is in an initial state of in_progress.

SessionUpdateResponseDone(String, SessionResponse)

Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the response.done event will include all output Items in the Response but will omit the raw audio data.
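
The token usage carried by that Response can likewise be fabricated for tests (for example, of metering code). A hedged C# sketch; the positional meanings of the integer parameters (total/input/output counts and their text, audio, and cached breakdowns) are assumptions based on the type names.

```csharp
using Azure.AI.VoiceLive;

public static class ResponseUsageSketch
{
    public static ResponseTokenStatistics CreateFakeUsage()
    {
        // Assumed breakdowns: 16 cached text tokens, 128 input tokens
        // (96 text + 32 audio), 128 output tokens (48 text + 80 audio).
        CachedTokenDetails cached = VoiceLiveModelFactory.CachedTokenDetails(16, 0);
        InputTokenDetails input = VoiceLiveModelFactory.InputTokenDetails(96, 32, 16, cached);
        OutputTokenDetails output = VoiceLiveModelFactory.OutputTokenDetails(48, 80);

        return VoiceLiveModelFactory.ResponseTokenStatistics(
            256,      // assumed: total tokens
            128,      // assumed: input tokens
            128,      // assumed: output tokens
            input,
            output);
    }
}
```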

SessionUpdateResponseFunctionCallArgumentsDelta(String, String, String, Int32, String, String)

Returned when the model-generated function call arguments are updated.

SessionUpdateResponseFunctionCallArgumentsDone(String, String, String, Int32, String, String, String)

Returned when the model-generated function call arguments are done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseOutputItemAdded(String, String, Int32, SessionResponseItem)

Returned when a new Item is created during Response generation.

SessionUpdateResponseOutputItemDone(String, String, Int32, SessionResponseItem)

Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseTextDelta(String, String, String, Int32, Int32, String)

Returned when the text value of a "text" content part is updated.

SessionUpdateResponseTextDone(String, String, String, Int32, Int32, String)

Returned when the text value of a "text" content part is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateSessionCreated(String, VoiceLiveSessionResponse)

Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.

SessionUpdateSessionUpdated(String, VoiceLiveSessionResponse)

Returned when a session is updated with a session.update event, unless there is an error.

SystemMessageItem(String, IEnumerable<MessageContentPart>, Nullable<ItemParamStatus>)

A system message item within a conversation.

TurnDetection(String)

Top-level union for turn detection configuration. Please note this is the abstract base class. The derived classes available for instantiation are: ServerVadTurnDetection, AzureSemanticVadTurnDetection, AzureSemanticVadTurnDetectionEn, and AzureSemanticVadTurnDetectionMultilingual.

UserMessageItem(String, IEnumerable<MessageContentPart>, Nullable<ItemParamStatus>)

A user message item within a conversation.

VideoBackground(String, String)

Defines a video background, either a solid color or an image URL (mutually exclusive).

VideoCrop(IEnumerable<Int32>, IEnumerable<Int32>)

Defines a video crop rectangle using top-left and bottom-right coordinates.

VideoParams(Nullable<Int32>, String, VideoCrop, VideoResolution, VideoBackground, Nullable<Int32>)

Video streaming parameters for avatar.

VideoResolution(Int32, Int32)

Resolution of the video feed in pixels.

VoiceLiveContentPart(String)

Base for any content part; discriminated by type. Please note this is the abstract base class. The derived classes available for instantiation are: RequestTextContentPart, RequestAudioContentPart, ResponseTextContentPart, and ResponseAudioContentPart.

VoiceLiveErrorDetails(String, String, String, String, String)

Error object returned in case of API failure.

VoiceLiveFunctionDefinition(String, String, BinaryData)

The definition of a function tool as used by the voicelive endpoint.

VoiceLiveSessionOptions(String, IEnumerable<InteractionModality>, AnimationOptions, VoiceProvider, String, Nullable<Int32>, Nullable<InputAudioFormat>, Nullable<OutputAudioFormat>, AudioNoiseReduction, AudioEchoCancellation, AvatarConfiguration, AudioInputTranscriptionOptions, IEnumerable<AudioTimestampType>, IEnumerable<VoiceLiveToolDefinition>, ToolChoiceOption, Nullable<Single>, MaxResponseOutputTokensOption, BinaryData)

Base for session configuration shared between request and response.

VoiceLiveToolDefinition(String)

The base representation of a voicelive tool definition. Please note this is the abstract base class. The derived classes available for instantiation are: VoiceLiveFunctionDefinition.
