Hi, we have a deployment of GPT-4o in Azure that's acting strangely in comparison to GPT-4 Turbo.
We access this deployment via either Semantic Kernel or the Azure OpenAI SDK for .NET, depending on the use case, but both frameworks ultimately make the same chat completion API calls.
Regardless, when prompting these models via the chat completion API there is a default response timeout of 2 minutes.
This timeout can be overridden via the OpenAI client options, and doing so has always worked with GPT-3.5, GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo.
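For reference, this is roughly how we override the timeout. This is a minimal sketch assuming the Azure.AI.OpenAI 1.x SDK, where the network timeout is configured through the `Azure.Core` retry options; the endpoint, key, and deployment name below are placeholders, not our actual values.

```csharp
using Azure;
using Azure.AI.OpenAI;

// Extend the default 100-second/2-minute network timeout.
// NetworkTimeout lives on the Azure.Core RetryOptions exposed by the client options.
var options = new OpenAIClientOptions
{
    Retry = { NetworkTimeout = TimeSpan.FromMinutes(10) }
};

var client = new OpenAIClient(
    new Uri("https://my-resource.openai.azure.com/"), // placeholder endpoint
    new AzureKeyCredential("<api-key>"),              // placeholder key
    options);

// "gpt-4o-deployment" is a hypothetical deployment name.
var response = await client.GetChatCompletionsAsync(
    new ChatCompletionsOptions(
        "gpt-4o-deployment",
        new[] { new ChatRequestUserMessage("Long-running prompt here") }));
```

With the GPT-4 Turbo deployments this configuration behaves as expected; with GPT-4o the connection is still cut at 2 minutes.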
Since moving to GPT-4o, overriding this timeout no longer appears to have any effect: if completion generation takes longer than 2 minutes, the connection from the host is cut at exactly the 2-minute mark.
This behaviour occurs regardless of the framework used and can be reproduced by calling the chat completion API directly (every API version we tried shows the issue) or via the Chat Playground in Azure OpenAI Studio.
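To reproduce it without any SDK involved, a direct call like the following is enough; the resource name, deployment name, and API version are placeholders, and `--max-time` is set well above 2 minutes so that curl itself is not the thing timing out.

```shell
# Placeholders: substitute your resource, deployment, api-version, and key.
curl --max-time 600 \
  "https://my-resource.openai.azure.com/openai/deployments/gpt-4o-deployment/chat/completions?api-version=2024-06-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <api-key>" \
  -d '{"messages": [{"role": "user", "content": "Prompt that takes >2 minutes to complete"}]}'
```

Against the GPT-4o deployment, the server closes the connection at the 2-minute mark even though the client allows 10 minutes.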
Is this a bug in Azure's implementation of GPT-4o, or an issue on OpenAI's side?
Kindly assist.