The Azure VoiceLive client library for Java enables real-time, bidirectional voice conversations with AI assistants. Built on WebSocket technology, it provides low-latency audio streaming with support for voice activity detection, interruption handling, and flexible authentication.
Use the Azure VoiceLive client library for Java to:
- Create real-time voice conversations with AI assistants
- Stream audio input from microphone with automatic voice activity detection
- Receive and play audio responses with interruption support
- Handle conversational flow with turn detection and session management
- Authenticate using API keys or Azure AD (token credentials)
[Source code][source_code] | API reference documentation | Product documentation | [Samples][samples_folder]
Getting started
Prerequisites
- Java Development Kit (JDK) with version 8 or above
- Azure Subscription
- Azure VoiceLive resource with endpoint and API key
Adding the package to your product
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-ai-voicelive</artifactId>
<version>1.0.0-beta.2</version>
</dependency>
Authentication
To interact with the Azure VoiceLive service, you'll need to create an instance of the [VoiceLiveAsyncClient][voicelive_client_async] using [VoiceLiveClientBuilder][voicelive_client_builder]. The client supports two authentication methods:
Authenticate with API Key
Get your Azure VoiceLive API key from the Azure Portal:
VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
.endpoint("https://your-resource.openai.azure.com/")
.credential(new AzureKeyCredential("your-api-key"))
.buildAsyncClient();
Authenticate with Azure AD (Token Credential)
The Azure SDK for Java supports Azure Identity, making it easy to use the Microsoft identity platform for authentication.
First, add the Azure Identity package:
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-identity</artifactId>
<version>1.18.1</version>
</dependency>
Then create a client with DefaultAzureCredential:
TokenCredential credential = new DefaultAzureCredentialBuilder().build();
VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
.endpoint("https://your-resource.openai.azure.com/")
.credential(credential)
.buildAsyncClient();
For development and testing, you can use Azure CLI credentials:
TokenCredential credential = new AzureCliCredentialBuilder().build();
VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
.endpoint("https://your-resource.openai.azure.com/")
.credential(credential)
.buildAsyncClient();
Key concepts
VoiceLiveAsyncClient
The main entry point for interacting with the Azure VoiceLive service. Use the VoiceLiveClientBuilder to construct a client instance. The client provides methods to start sessions and manage real-time voice conversations.
VoiceLiveSessionAsyncClient
Represents an active WebSocket connection for bidirectional streaming communication. This async client supports:
- Sending audio input streams via sendInputAudio()
- Sending command events via sendEvent()
- Receiving server events as a reactive stream (Flux) via receiveEvents()
- Graceful shutdown with close() or closeAsync()
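The sketch below ties these calls together. It is a minimal illustration, assuming session is an already-started VoiceLiveSessionAsyncClient and that options (a VoiceLiveSessionOptions) and pcmChunk (a PCM16 byte array) have been prepared elsewhere; the fuller examples later in this README show how each piece fits into a real conversation.
// Minimal usage sketch for an active session (assumes "session", "options", and "pcmChunk" exist).
session.receiveEvents()
    .subscribe(event -> System.out.println("Event: " + event.getType())); // server events as a Flux
session.sendEvent(new ClientEventSessionUpdate(options)).subscribe();     // send a command event
session.sendInputAudio(BinaryData.fromBytes(pcmChunk)).subscribe();       // stream audio input
session.close(); // or closeAsync() for a non-blocking shutdown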
VoiceLiveSessionOptions
Configuration options for customizing session behavior:
- Model selection: Specify the AI model (e.g., "gpt-4o-realtime-preview")
- Voice settings: Choose from OpenAI voices (Alloy, Ash, Ballad, Coral, Echo, Sage, Shimmer, Verse) or Azure voices
- Modalities: Configure text and/or audio interaction modes
- Turn detection: Server-side voice activity detection with configurable thresholds
- Audio formats: PCM16 input/output with configurable sample rates
- Audio enhancements: Noise reduction and echo cancellation
- Transcription: Optional input audio transcription using Whisper models
Audio Requirements
The VoiceLive service uses specific audio formats:
- Sample Rate: 24kHz (24000 Hz)
- Bit Depth: 16-bit PCM
- Channels: Mono (1 channel)
- Format: Signed PCM, little-endian
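If you wire up capture or playback with javax.sound.sampled (as the audio samples in this repository do), the matching AudioFormat is a direct translation of these requirements:
// javax.sound.sampled format matching the VoiceLive audio requirements:
// 24 kHz, 16-bit signed PCM, mono, little-endian.
AudioFormat voiceLiveFormat = new AudioFormat(
    24000.0f, // sample rate (Hz)
    16,       // bits per sample
    1,        // channels (mono)
    true,     // signed PCM
    false);   // little-endian (bigEndian = false)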
Examples
The following sections provide code snippets for common scenarios:
- Simple voice assistant
- Configure session options
- Send audio input
- Handle event types
- Voice configuration
- Complete voice assistant with microphone
Focused Sample Files
For easier learning, explore these focused samples in order:
BasicVoiceConversationSample.java - Start here to learn the basics
- Minimal setup and session management
- Client creation and configuration
- Basic event handling
AuthenticationMethodsSample.java - Learn authentication options
- API Key authentication (default)
- Token Credential authentication with DefaultAzureCredential
MicrophoneInputSample.java - Add audio input capability
- Real-time microphone audio capture
- Audio format configuration (24kHz, 16-bit PCM, mono)
- Streaming audio to the service
- Speech detection events
AudioPlaybackSample.java - Add audio output capability
- Receiving audio responses from service
- Audio playback to speakers
- Response completion tracking
VoiceAssistantSample.java - Complete production-ready implementation
- Full bidirectional audio streaming
- Voice Activity Detection (VAD) with interruption handling
- Audio transcription with Whisper
- Noise reduction and echo cancellation
- Multi-threaded audio processing
Note: To run audio samples (AudioPlaybackSample, MicrophoneInputSample, VoiceAssistantSample):
mvn exec:java -Dexec.mainClass=com.azure.ai.voicelive.AudioPlaybackSample -Dexec.classpathScope=test
These samples use javax.sound.sampled for audio I/O.
Simple voice assistant
Create a basic voice assistant session:
// Start session with default options
client.startSession("gpt-4o-realtime-preview")
.flatMap(session -> {
System.out.println("Session started");
// Subscribe to receive events
session.receiveEvents()
.subscribe(
event -> System.out.println("Event: " + event.getType()),
error -> System.err.println("Error: " + error.getMessage())
);
return Mono.just(session);
})
.block();
Configure session options
Customize the session with specific options:
// Configure server-side voice activity detection
ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
.setThreshold(0.5) // Sensitivity threshold (0.0-1.0)
.setPrefixPaddingMs(300) // Audio before speech detection
.setSilenceDurationMs(500) // Silence to end turn
.setInterruptResponse(true) // Allow user interruptions
.setAutoTruncate(true) // Auto-truncate on interruption
.setCreateResponse(true); // Auto-create response after turn
// Configure input audio transcription
AudioInputTranscriptionOptions transcription = new AudioInputTranscriptionOptions(
AudioInputTranscriptionOptionsModel.WHISPER_1);
// Create session options
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
.setInstructions("You are a helpful AI voice assistant. Respond naturally and conversationally.")
.setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)))
.setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
.setInputAudioFormat(InputAudioFormat.PCM16)
.setOutputAudioFormat(OutputAudioFormat.PCM16)
.setInputAudioSamplingRate(24000)
.setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
.setInputAudioEchoCancellation(new AudioEchoCancellation())
.setInputAudioTranscription(transcription)
.setTurnDetection(turnDetection);
// Start session with options
client.startSession("gpt-4o-realtime-preview")
.flatMap(session -> {
// Send session configuration
ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(options);
return session.sendEvent(updateEvent).then(Mono.just(session));
})
.subscribe(
session -> System.out.println("Session configured"),
error -> System.err.println("Error: " + error.getMessage())
);
Send audio input
Stream audio data to the service:
// Send audio chunk
byte[] audioData = readAudioChunk(); // Your audio data in PCM16 format
session.sendInputAudio(BinaryData.fromBytes(audioData))
.subscribe();
// Send audio from file
try {
Path audioFile = Paths.get("audio.wav");
byte[] fileData = Files.readAllBytes(audioFile);
session.sendInputAudio(BinaryData.fromBytes(fileData))
.subscribe();
} catch (IOException e) {
System.err.println("Error reading audio file: " + e.getMessage());
}
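For longer clips you may prefer to send the data in small chunks rather than one large buffer. The following sketch is illustrative rather than a service requirement: it splits the fileData array from the previous example into roughly 100 ms chunks (4800 bytes at 24 kHz, 16-bit mono) and sends them in order. It uses reactor.core.publisher.Flux and java.util.Arrays.
// Illustrative chunked upload: 4800 bytes ≈ 100 ms of 24 kHz, 16-bit mono PCM.
int chunkSize = 4800;
Flux.range(0, (fileData.length + chunkSize - 1) / chunkSize)
    .concatMap(i -> {
        int start = i * chunkSize;
        byte[] chunk = Arrays.copyOfRange(fileData, start,
            Math.min(start + chunkSize, fileData.length));
        return session.sendInputAudio(BinaryData.fromBytes(chunk));
    })
    .subscribe(); // or blockLast() to wait until all chunks are sent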
Handle event types
Process different event types for complete conversation flow:
session.receiveEvents()
.subscribe(event -> {
ServerEventType eventType = event.getType();
if (ServerEventType.SESSION_CREATED.equals(eventType)) {
System.out.println("✓ Session created - ready to start");
} else if (ServerEventType.SESSION_UPDATED.equals(eventType)) {
System.out.println("✓ Session configured - starting conversation");
if (event instanceof SessionUpdateSessionUpdated) {
SessionUpdateSessionUpdated updated = (SessionUpdateSessionUpdated) event;
// Access session configuration details
String json = BinaryData.fromObject(updated).toString();
System.out.println("Config: " + json);
}
} else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED.equals(eventType)) {
System.out.println("🎤 User started speaking");
} else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED.equals(eventType)) {
System.out.println("🤔 User stopped speaking - processing...");
} else if (ServerEventType.RESPONSE_AUDIO_DELTA.equals(eventType)) {
// Play audio response
if (event instanceof SessionUpdateResponseAudioDelta) {
SessionUpdateResponseAudioDelta audioEvent =
(SessionUpdateResponseAudioDelta) event;
playAudioChunk(audioEvent.getDelta());
}
} else if (ServerEventType.RESPONSE_AUDIO_DONE.equals(eventType)) {
System.out.println("🔊 Assistant finished speaking");
} else if (ServerEventType.RESPONSE_DONE.equals(eventType)) {
System.out.println("✅ Response complete - ready for next input");
} else if (ServerEventType.ERROR.equals(eventType)) {
if (event instanceof SessionUpdateError) {
SessionUpdateError errorEvent = (SessionUpdateError) event;
System.err.println("❌ Error: "
+ errorEvent.getError().getMessage());
}
}
});
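playAudioChunk in the snippet above is not an SDK method; it stands in for whatever playback mechanism you use. A minimal sketch with javax.sound.sampled might look like the following, assuming the delta arrives as raw PCM16 bytes (convert it first if your event exposes it as BinaryData or a Base64 string). In practice you would open the SourceDataLine once and reuse it across chunks rather than reopening it per delta.
// Hypothetical playback helper, not part of the SDK. Expects raw 24 kHz, 16-bit mono PCM.
// Uses javax.sound.sampled (AudioFormat, AudioSystem, SourceDataLine, LineUnavailableException).
private static void playAudioChunk(byte[] audioData) {
    AudioFormat format = new AudioFormat(24000.0f, 16, 1, true, false);
    try {
        SourceDataLine speaker = AudioSystem.getSourceDataLine(format);
        speaker.open(format);
        speaker.start();
        speaker.write(audioData, 0, audioData.length);
        speaker.drain(); // block until the chunk has finished playing
        speaker.close();
    } catch (LineUnavailableException e) {
        System.err.println("Audio playback unavailable: " + e.getMessage());
    }
}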
Voice configuration
The SDK supports multiple voice providers:
OpenAI Voices
// Use OpenAIVoiceName enum for available voices (ALLOY, ASH, BALLAD, CORAL, ECHO, SAGE, SHIMMER, VERSE)
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
.setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)));
Azure Voices
Azure voices include AzureStandardVoice, AzureCustomVoice, and AzurePersonalVoice (all extend AzureVoice):
// Azure Standard Voice - use any Azure TTS voice name
// See: https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
.setVoice(BinaryData.fromObject(new AzureStandardVoice("en-US-JennyNeural")));
// Azure Custom Voice - requires custom voice name and endpoint ID
VoiceLiveSessionOptions options2 = new VoiceLiveSessionOptions()
.setVoice(BinaryData.fromObject(new AzureCustomVoice("myCustomVoice", "myEndpointId")));
// Azure Personal Voice - requires speaker profile ID and model
// Models: DRAGON_LATEST_NEURAL, PHOENIX_LATEST_NEURAL, PHOENIX_V2NEURAL
VoiceLiveSessionOptions options3 = new VoiceLiveSessionOptions()
.setVoice(BinaryData.fromObject(
new AzurePersonalVoice("speakerProfileId", PersonalVoiceModels.PHOENIX_LATEST_NEURAL)));
Complete voice assistant with microphone
A full example demonstrating real-time microphone input and audio playback:
String endpoint = System.getenv("AZURE_VOICELIVE_ENDPOINT");
String apiKey = System.getenv("AZURE_VOICELIVE_API_KEY");
// Create the VoiceLive client
VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
.endpoint(endpoint)
.credential(new AzureKeyCredential(apiKey))
.buildAsyncClient();
// Configure session options for voice conversation
ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
.setThreshold(0.5)
.setPrefixPaddingMs(300)
.setSilenceDurationMs(500)
.setInterruptResponse(true)
.setAutoTruncate(true)
.setCreateResponse(true);
AudioInputTranscriptionOptions transcriptionOptions = new AudioInputTranscriptionOptions(
AudioInputTranscriptionOptionsModel.WHISPER_1);
VoiceLiveSessionOptions sessionOptions = new VoiceLiveSessionOptions()
.setInstructions("You are a helpful AI voice assistant.")
.setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)))
.setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
.setInputAudioFormat(InputAudioFormat.PCM16)
.setOutputAudioFormat(OutputAudioFormat.PCM16)
.setInputAudioSamplingRate(24000)
.setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
.setInputAudioEchoCancellation(new AudioEchoCancellation())
.setInputAudioTranscription(transcriptionOptions)
.setTurnDetection(turnDetection);
// Start session and handle events
client.startSession("gpt-4o-realtime-preview")
.flatMap(session -> {
// Subscribe to receive server events
session.receiveEvents()
.subscribe(
event -> handleEvent(event, session),
error -> System.err.println("Error: " + error.getMessage())
);
// Send session configuration
ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(sessionOptions);
return session.sendEvent(updateEvent).then(Mono.just(session));
})
.block();
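The snippet above configures the session but does not show the capture loop itself. A minimal microphone loop with javax.sound.sampled might look like the sketch below, assuming you hold a reference to the active session (for example, the one obtained in the flatMap above). The 4800-byte buffer is an illustrative choice of about 100 ms of audio; run the loop on a dedicated thread and stop it by interrupting the thread.
// Illustrative microphone capture loop, not part of the SDK.
AudioFormat captureFormat = new AudioFormat(24000.0f, 16, 1, true, false);
try {
    TargetDataLine microphone = AudioSystem.getTargetDataLine(captureFormat);
    microphone.open(captureFormat);
    microphone.start();
    byte[] buffer = new byte[4800]; // ~100 ms of 24 kHz, 16-bit mono audio
    while (!Thread.currentThread().isInterrupted()) {
        int bytesRead = microphone.read(buffer, 0, buffer.length);
        if (bytesRead > 0) {
            session.sendInputAudio(BinaryData.fromBytes(Arrays.copyOf(buffer, bytesRead)))
                .subscribe();
        }
    }
    microphone.stop();
    microphone.close();
} catch (LineUnavailableException e) {
    System.err.println("Microphone not available: " + e.getMessage());
}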
For complete, runnable implementations, see the Focused Sample Files section above.
Troubleshooting
Enable client logging
You can set the AZURE_LOG_LEVEL environment variable to view logging statements made in the client library. For example, setting AZURE_LOG_LEVEL=2 would show all informational, warning, and error log messages. The log levels can be found here: [log levels][log_levels].
Common issues
Audio system not available
Ensure your system has a working microphone and speakers/headphones. The VoiceLive service requires:
- Microphone: For capturing audio input (24kHz, 16-bit PCM, mono)
- Speakers: For playing audio responses (24kHz, 16-bit PCM, mono)
WebSocket connection failures
If you encounter connection issues:
- Verify your endpoint URL is correct
- Check that your API key or token credential is valid
- Ensure your network allows WebSocket connections
- Confirm your Azure VoiceLive resource is properly provisioned
Authentication errors
For API key authentication:
- Verify the AZURE_VOICELIVE_API_KEY environment variable is set correctly
- Ensure the API key matches your Azure VoiceLive resource
For token credential authentication:
- Run az login before using Azure CLI credentials
- Verify the credential has appropriate permissions for the VoiceLive resource
- Check that the Azure Identity library is properly configured
Default HTTP Client
All client libraries by default use the Netty HTTP client. Adding the azure-ai-voicelive dependency shown above will automatically configure the client library to use the Netty HTTP client. For more information on HTTP client configuration, see the [HTTP clients wiki][http_clients_wiki].
Default SSL library
All client libraries, by default, use the Tomcat-native Boring SSL library to enable native-level performance for SSL operations. The Boring SSL library is an uber jar containing native libraries for Linux / macOS / Windows, and provides better performance compared to the default SSL implementation within the JDK. For more information, including how to reduce the dependency size, refer to the [performance tuning][performance_tuning] section of the wiki.
Next steps
Sample files
All sample files are located in the src/samples/java/com/azure/ai/voicelive/ directory. See the Focused Sample Files section for detailed descriptions and running instructions.
Additional documentation
- Azure VoiceLive product documentation
- [Azure SDK for Java][azure_sdk_java]
Contributing
For details on contributing to this repository, see the [contributing guide][contributing].
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request