Hi Sihle Ndlovu,
Thanks for your question!
Azure OpenAI provides two ways to consume models: token-based (pay-as-you-go) usage and Provisioned Throughput Units (PTUs). The choice depends mainly on how much workload you expect and whether you need guaranteed performance.
**Token-based calls**: This option charges you only for the tokens you consume. It runs on shared capacity, so it's great for testing, small workloads, or cases where usage may fluctuate. Just keep in mind that latency and throughput can vary with overall platform demand.
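For illustration, here is a minimal sketch of a token-based call using the `openai` Python SDK's Azure client. The endpoint, API version, and deployment name are placeholders for your own values; the `usage` block in the response is what a pay-as-you-go deployment bills against.

```python
import os
from openai import AzureOpenAI

# Token-based (standard) deployments are billed per token consumed.
# Endpoint, API version, and deployment name are placeholders.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="my-gpt-4o-deployment",  # your deployment name, not the base model name
    messages=[{"role": "user", "content": "Summarize our Q3 results in one sentence."}],
)

# These counts are what you pay for on a token-based deployment.
print(response.usage.prompt_tokens,
      response.usage.completion_tokens,
      response.usage.total_tokens)
```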
**Provisioned Throughput Units (PTU)**: PTUs give you dedicated, reserved capacity. This provides more stable performance, lower latency, and predictable throughput. It's typically used for steady or high-volume production workloads because the capacity is always available to you.
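If it helps, here is a rough sketch (not a definitive recipe) of how the two options differ at deployment time, using the `azure-mgmt-cognitiveservices` package. The subscription, resource group, account, model version, capacity values, and SKU names shown are placeholders or assumptions based on how Azure currently distinguishes standard and provisioned deployments; check the Azure OpenAI docs for the exact values available to your subscription.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment, DeploymentProperties, DeploymentModel, Sku,
)

# Subscription, resource group, and account names are placeholders.
client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

model = DeploymentModel(format="OpenAI", name="gpt-4o", version="<model-version>")

# Token-based: "Standard" SKU, capacity expressed in units of tokens per minute.
standard = Deployment(
    properties=DeploymentProperties(model=model),
    sku=Sku(name="Standard", capacity=30),
)

# Provisioned: "ProvisionedManaged" SKU, capacity expressed in reserved PTUs.
provisioned = Deployment(
    properties=DeploymentProperties(model=model),
    sku=Sku(name="ProvisionedManaged", capacity=100),
)

client.deployments.begin_create_or_update(
    "<resource-group>", "<aoai-account>", "chat-ptu", provisioned
).result()
```

The key difference is only the SKU: the same model can be deployed either way, so you can start token-based and move to a provisioned deployment once traffic is steady enough to justify reserved capacity.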
In short:
- Use token-based if you want flexibility and want to pay only for what you use.
- Use PTU if you need consistency, performance guarantees, and cost predictability for a stable or high-throughput application.
If you have more details about your workload later, feel free to share them; I'm happy to clarify further!