Deploy a multimodal model
To handle prompts that include audio, you need to deploy a multimodal generative AI model - in other words, a model that supports not only text-based input but also audio-based input. Multimodal models available in Microsoft Foundry include (among others):
- Microsoft Phi-4-multimodal-instruct
- OpenAI gpt-4o
- OpenAI gpt-4o-mini
Tip
To learn more about available models in Microsoft Foundry, see the Model catalog and collections in Microsoft Foundry portal article in the Microsoft Foundry documentation.
Testing multimodal models with audio-based prompts
After deploying a multimodal model, you can test it in the chat playground in Microsoft Foundry portal. Some models let you include audio attachments in the playground, either by uploading a file or by recording a message.

In the chat playground, you can upload a local audio file and add text to the message to elicit a response from a multimodal model.
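You can send the same kind of combined audio-and-text prompt to your deployed model programmatically. The following is a minimal sketch, assuming the OpenAI Python SDK against an Azure OpenAI-compatible Foundry endpoint; the endpoint URL, API key, API version, deployment name, and audio file name are placeholders you would replace with your own values, and the deployment must be a model that accepts audio input.

```python
import base64
from openai import AzureOpenAI  # pip install openai

# Placeholder connection details - replace with your Foundry project's
# Azure OpenAI endpoint, key, and a supported API version.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2025-01-01-preview",
)

# Read a local audio file and base64-encode it for the request payload.
with open("speech.mp3", "rb") as audio_file:
    audio_b64 = base64.b64encode(audio_file.read()).decode("utf-8")

# Send a single user message that combines a text prompt with the audio attachment,
# mirroring the "attach a file and type a question" interaction in the playground.
response = client.chat.completions.create(
    model="<your-deployment-name>",  # placeholder: an audio-capable multimodal deployment
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what the speaker says in this recording."},
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "mp3"},
                },
            ],
        },
    ],
)

print(response.choices[0].message.content)
```

The audio is passed as a base64-encoded content item alongside the text in the same user message, so the model receives both modalities as a single prompt, just as it does when you attach a recording in the playground.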