Clarification on Metrics to Use for Scaling Microsoft Fabric Capacity (Real-Time Intelligence - Eventstreams & Event Sources)

Question

Anirudh Singh Bharti 20

We are using Microsoft Fabric Real-Time Intelligence, specifically Eventstream and Event Source workloads. We would like official guidance on which metrics and thresholds should be monitored to determine when to scale our Fabric capacity up or down.

Could you please clarify:

What Fabric metrics should be monitored for scaling decisions specifically for:

  Eventstream ingestion
  Event processing
  Delivery to destinations (KQL DB, Lakehouse, etc.)
  Eventhouse workloads

What are the **recommended thresholds** (or patterns) that indicate:

  Capacity is becoming constrained
  Scaling up is required
  Scaling down is safe

For Real-Time Intelligence workloads (Eventstream/Event Sources), are there **specific CU usage, throughput, latency, or burst metrics** that Microsoft recommends tracking?

What are the **limits or boundaries of CU bursting**, and at what point can throttling or request rejection occur for Eventstream workloads?

Is there **any Microsoft architectural guidance** or best practices document specifically for scaling Real-Time workloads in Fabric?

2 answers

Answer 1

To effectively manage scaling decisions for Microsoft Fabric Real-Time Intelligence workloads, particularly for Eventstream and Event Source, you should monitor the following metrics:

Key Metrics for Scaling Decisions:

Input Events: Count the number of event data items pulled from sources. A sudden increase may indicate the need for scaling.
Output Events: Monitor the number of events sent to destinations. A decrease in output relative to input may signal a bottleneck.
Backlogged Input Events: This metric indicates how many events are waiting to be processed. A high number suggests that the system is becoming constrained.
Runtime Errors: Track the total number of errors related to event processing. An increase in errors can indicate performance issues that may require scaling.
Watermark Delay: Monitor the maximum watermark delay across all partitions. A significant delay can indicate that the system is struggling to keep up with incoming events.
Incoming and Outgoing Bytes: Measure the amount of data being processed. High throughput may necessitate scaling up.
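
As a rough illustration of how the metrics above can be read together, the sketch below aggregates per-interval samples into a few ratios. The `MetricSample` type, its field names, and the `summarize` helper are hypothetical and only mirror the metric names in this answer; they are not a Fabric API, and the samples would need to come from however you export your Eventstream metrics.

```python
from dataclasses import dataclass


@dataclass
class MetricSample:
    """One monitoring interval. Field names are hypothetical, not a Fabric API."""
    input_events: int
    output_events: int
    backlogged_input_events: int
    runtime_errors: int
    watermark_delay_seconds: float


def summarize(samples: list[MetricSample]) -> dict[str, float]:
    """Aggregate a non-empty window of samples into simple health indicators."""
    if not samples:
        raise ValueError("at least one sample is required")
    total_in = sum(s.input_events for s in samples)
    total_out = sum(s.output_events for s in samples)
    latest_backlog = samples[-1].backlogged_input_events
    return {
        # Share of input events that reached a destination (1.0 means keeping up).
        "output_to_input_ratio": total_out / total_in if total_in else 1.0,
        # Backlog at the end of the window relative to events received in it.
        "backlog_ratio": latest_backlog / total_in if total_in else 0.0,
        "runtime_errors": float(sum(s.runtime_errors for s in samples)),
        "max_watermark_delay_seconds": max(s.watermark_delay_seconds for s in samples),
    }
```

A falling `output_to_input_ratio` together with a rising `backlog_ratio` or watermark delay is the pattern this answer describes as a bottleneck.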

Recommended Thresholds:

Capacity Constraints: If backlogged input events consistently exceed a threshold you set (e.g., 10% of total input events), capacity is becoming constrained; consider scaling up.
Scaling Up: If input events stay high while output events stay low, or if runtime errors keep increasing, it may be time to scale up.
Scaling Down: If backlogged input events are consistently low (e.g., below 5% of total input events) and the other metrics are stable, scaling down may be safe.
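
The threshold rules above can be sketched as a small classifier. This is a minimal sketch, assuming you already compute a backlog ratio (backlogged input events divided by total input events) per monitoring interval; the function name, the `Signal` enum, and the reading of "consistently" as "in every interval of the window" are assumptions, and the 10% and 5% defaults are the example values from this answer rather than official limits.

```python
from enum import Enum


class Signal(Enum):
    SCALE_UP = "scale up"
    HOLD = "hold"
    SCALE_DOWN = "consider scaling down"


def scaling_signal(backlog_ratios, up_threshold=0.10, down_threshold=0.05):
    """Classify a window of per-interval backlog ratios.

    The thresholds are illustrative example values, not official Fabric limits.
    """
    if not backlog_ratios:
        return Signal.HOLD
    # "Consistently" is interpreted as: true for every interval in the window.
    if all(r > up_threshold for r in backlog_ratios):
        return Signal.SCALE_UP
    if all(r < down_threshold for r in backlog_ratios):
        return Signal.SCALE_DOWN
    return Signal.HOLD
```

Requiring every interval to agree keeps a single spike or a single quiet interval from triggering a capacity change; a shorter or longer window changes how sensitive the signal is.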

CU Usage and Throttling:

Capacity Units (CUs): Monitor CU usage closely. If usage consistently approaches the limit (e.g., 80-90% of allocated CUs), consider scaling up.
Throttling: Throttling may occur when sustained CU usage exceeds the allocated capacity. Watch system alerts for throttling events.
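
The CU check above can be sketched as follows. This assumes you have a series of CU usage samples and know the CUs allocated to the capacity; the function name and the `min_share` parameter (the fraction of samples that must be hot before alerting) are hypothetical, and the 80% default is the lower bound of the example range in this answer.

```python
def cu_utilization_alert(cu_used, cu_allocated, warn_fraction=0.80, min_share=0.5):
    """Return True when at least `min_share` of the samples are at or above
    `warn_fraction` of the allocated CUs. All thresholds are illustrative."""
    if cu_allocated <= 0:
        raise ValueError("cu_allocated must be positive")
    if not cu_used:
        return False
    # Count samples whose utilization reaches the warning level.
    hot = sum(1 for used in cu_used if used / cu_allocated >= warn_fraction)
    return hot / len(cu_used) >= min_share
```

For example, with an allocation of 64 CUs, `cu_utilization_alert([50, 55, 60, 40], 64)` flags the window because two of the four samples are above 80% of the allocation.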

Microsoft Architectural Guidance:

For best practices and architectural guidance, refer to the Microsoft documentation on optimizing capacity and monitoring workloads. This includes strategies for effective scaling and resource management.

Answer 2

Smaran Thoomu 32,520 Microsoft External Staff Moderator

Hi Anirudh Singh Bharti

This question is specific to Microsoft Fabric Real-Time Intelligence, and scaling guidance is different from Azure Stream Analytics. For the most accurate engineering-backed answer, please post this in the official Fabric community:

https://community.fabric.microsoft.com/

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".
