Azure OpenAI token-based calls vs. provisioned (PTU) selection for an OpenAI model

Sihle Ndlovu 40 Reputation points
2025-11-17T07:51:27.3066667+00:00

Hi, I am trying to analyze about 500,000 low-resolution images/frames a day through Azure OpenAI. When I do the costing, this could be very expensive and may deplete my monthly token quota. I saw that a dedicated PTU deployment of the GPT-4.1 model can be cheaper if it is reserved for a month or a year. I just want to understand PTUs better, since I have the option to select more than one: how much processing power does a single PTU have, and could it handle 500,000 frames in a day?

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

Answer accepted by question author
  1. Anshika Varshney 3,870 Reputation points Microsoft External Staff Moderator
    2025-11-17T07:58:44.52+00:00

    Hi Sihle Ndlovu,

    Thanks for your question!

    Azure OpenAI provides two ways to use models: token-based usage and Provisioned Throughput Units (PTUs). The choice depends mainly on how large a workload you expect and whether you need guaranteed performance.

    Token-based calls: This option charges you only for the tokens you use. It runs on shared capacity, so it’s great for testing, small workloads, or cases where usage may change. Just keep in mind that performance can vary depending on overall platform demand.

    Provisioned Throughput Units (PTUs): PTUs give you dedicated, reserved capacity. This provides more stable performance, lower latency, and predictable throughput. It’s typically used for steady or high-volume production workloads because the capacity is always available to you.

    In short:

    • Use token-based if you want flexibility and only pay for what you use.
    • Use PTU if you need consistency, performance guarantees, and cost predictability for a stable or high-throughput application (a rough break-even sketch follows below).
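
    For a workload of this size, a quick break-even check is useful. Below is a minimal Python sketch of that comparison; the tokens-per-frame figure, the pay-as-you-go price, the per-PTU reservation price, and the PTU count are all placeholder assumptions, to be replaced with your measured usage, current figures from the Azure OpenAI pricing page, and the capacity calculator's output.

    ```python
    # Rough break-even sketch: token-based (pay-as-you-go) vs. reserved PTUs.
    # All prices and counts below are PLACEHOLDERS, not real Azure pricing --
    # substitute current numbers from the Azure OpenAI pricing page and the
    # capacity calculator before drawing any conclusion.

    FRAMES_PER_DAY = 500_000
    TOKENS_PER_FRAME = 1_200          # assumed: image input + prompt + output tokens
    DAYS_PER_MONTH = 30

    # --- token-based (hypothetical placeholder price, USD per 1M tokens) ---
    PAYG_PRICE_PER_1M_TOKENS = 2.00   # placeholder; check the pricing page

    monthly_tokens = FRAMES_PER_DAY * TOKENS_PER_FRAME * DAYS_PER_MONTH
    payg_monthly_cost = monthly_tokens / 1_000_000 * PAYG_PRICE_PER_1M_TOKENS

    # --- provisioned (hypothetical placeholder figures) ---
    PTU_MONTHLY_RESERVATION_PRICE = 260.0  # placeholder per-PTU monthly price
    PTUS_NEEDED = 50                       # size this with the capacity calculator

    ptu_monthly_cost = PTUS_NEEDED * PTU_MONTHLY_RESERVATION_PRICE

    print(f"Monthly tokens:     {monthly_tokens:,}")
    print(f"Pay-as-you-go cost: ${payg_monthly_cost:,.0f}/month (placeholder prices)")
    print(f"Reserved PTU cost:  ${ptu_monthly_cost:,.0f}/month (placeholder prices)")
    ```

    With placeholders like these, a sustained high-volume workload often favors the reservation, but the real answer depends entirely on the actual prices and your measured per-frame token usage.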

    If you have more details about your workload later, feel free to share; I'm happy to clarify further!

    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Q&A Assist
    2025-11-17T07:51:41.49+00:00
    AI-generated content. This answer was created with AI from Q&A Assist.

    Provisioned Throughput Units (PTUs) are units of model processing capacity that can be used to size deployments in Azure OpenAI to achieve the required throughput for processing tasks like analyzing images or frames. The specific processing power of a single PTU can vary based on the model type and version you are using. For example, different models may have different capacities in terms of how many tokens they can process per minute (TPM).

    When considering the analysis of 500,000 low-resolution images/frames a day, you need to estimate the token usage per image/frame and the expected requests per minute. Since vision calls such as image analysis consume more tokens than short text prompts, it's important to calculate the total token usage and compare it against the capacity provided by the PTUs you select, as in the sketch below.
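
    As a back-of-the-envelope illustration, here is a small Python sketch of that estimate. The tokens-per-frame value is purely an assumption for low-resolution images; measure your real per-frame usage (input image tokens plus prompt and output tokens) before sizing anything.

    ```python
    # Back-of-the-envelope throughput for 500,000 frames/day.
    # TOKENS_PER_FRAME is an assumption: low-resolution images typically cost
    # a few hundred to a couple thousand input tokens depending on model and
    # detail setting; replace it with your measured per-frame usage.

    FRAMES_PER_DAY = 500_000
    TOKENS_PER_FRAME = 1_200      # assumed: image tokens + instructions + output

    MINUTES_PER_DAY = 24 * 60
    SECONDS_PER_DAY = 24 * 3600

    tokens_per_day = FRAMES_PER_DAY * TOKENS_PER_FRAME
    avg_tpm = tokens_per_day / MINUTES_PER_DAY      # average tokens per minute
    avg_rps = FRAMES_PER_DAY / SECONDS_PER_DAY      # average requests per second

    print(f"Tokens/day:         {tokens_per_day:,}")
    print(f"Average TPM:        {avg_tpm:,.0f}")
    print(f"Average requests/s: {avg_rps:.1f}")
    # With these assumptions: ~600M tokens/day, ~417,000 TPM, ~5.8 requests/s.
    # Size for your peak-hour rate, not the flat daily average.
    ```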

    To determine how many PTUs you might need, you can use historical token usage data or estimations based on your specific workload. Additionally, using the Azure AI Foundry PTU quota calculator can help you size your workload shapes more accurately.
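
    To make that concrete, the sketch below converts a required tokens-per-minute figure into an indicative PTU count. The per-PTU throughput used here is a placeholder assumption; the real figure varies by model and version and is what the capacity calculator reports, so use its number instead.

    ```python
    import math

    # Convert a required TPM into an indicative PTU count.
    # TPM_PER_PTU is an ASSUMPTION for illustration only -- actual per-PTU
    # throughput differs by model/version; take the real figure from the
    # Azure AI Foundry capacity calculator.

    required_tpm = 417_000    # from the estimate above (add peak-hour headroom)
    TPM_PER_PTU = 3_000       # placeholder throughput per PTU

    ptus_needed = math.ceil(required_tpm / TPM_PER_PTU)
    print(f"Indicative PTUs needed: {ptus_needed}")  # 139 with these placeholders

    # Deployments also have a model-specific minimum PTU count and increment,
    # so round the result up to a valid deployment size.
    ```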

    If you have predictable throughput and latency requirements, switching to provisioned throughput can be beneficial as it allows for better cost management and performance optimization. You can reserve PTUs for a month or a year to potentially reduce costs compared to pay-as-you-go token usage.

    Ultimately, whether a single PTU can handle 500,000 frames in a day depends on the specific token consumption per frame and the overall throughput requirements of your workload. It may be necessary to select multiple PTUs to meet your needs efficiently.

