Hello Tarik Rashada, Thanks for posting your question on Microsoft Q&A!
You're seeing intermittent event drops between your Stream Analytics job and the output Event Hub, even though job-level metrics show full input and output event counts. That suggests the issue may lie in how events are batched, serialized, or acknowledged on the output.
A few key details would help narrow this down:
- Are you using capture or custom serialization on the output Event Hub?
- What partition key (if any) are you setting in your Stream Analytics output configuration?
- Are you checking sequence numbers or enqueued time on the output Event Hub directly (e.g., via a consumer like EventProcessorHost or `az eventhubs eventhub receiver`), as in the sketch below, to confirm the count mismatch isn't due to Function App consumption issues downstream?
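For that last point, here is a minimal consumer sketch using the azure-eventhub Python SDK (v5 assumed); the connection string and hub name are placeholders you'd replace with your own. Since sequence numbers are gapless within a partition, any jump indicates events that were never persisted:

```python
# Minimal sketch: inspect sequence numbers and enqueued times directly
# on the output Event Hub (azure-eventhub v5; names are placeholders).
from azure.eventhub import EventHubConsumerClient

CONN_STR = "<output-event-hub-connection-string>"   # placeholder
EVENTHUB_NAME = "<output-hub-name>"                 # placeholder

def on_event(partition_context, event):
    # Sequence numbers are gapless per partition, so a jump here means
    # events were never persisted to that partition.
    print(f"partition={partition_context.partition_id} "
          f"seq={event.sequence_number} "
          f"enqueued={event.enqueued_time}")

client = EventHubConsumerClient.from_connection_string(
    CONN_STR,
    consumer_group="$Default",
    eventhub_name=EVENTHUB_NAME,
)

with client:
    # starting_position="-1" reads each partition from the beginning.
    client.receive(on_event=on_event, starting_position="-1")
```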
One known behavior to consider: Stream Analytics batches output writes to Event Hubs for performance. If the job restarts or there’s a transient error during batch flush (even if not logged as a job failure), some events in the last batch might not be persisted—especially under high burst rates. This aligns with your observation that slower send rates reduce drops.
Also verify that your output Event Hub has sufficient partitions. If all events go to a single partition (e.g., due to a static partition key), you're limited to roughly 1 MB/s per partition, which could cause throttling or silent drops even if throughput units are high. You can check the per-partition distribution directly, as in the sketch below.
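To check that distribution without reading every event, you can query per-partition properties. A rough sketch, again with placeholder names and the azure-eventhub SDK assumed:

```python
# Minimal sketch: compare per-partition backlogs to see whether output
# is skewed toward a single partition (names are placeholders).
from azure.eventhub import EventHubConsumerClient

CONN_STR = "<output-event-hub-connection-string>"   # placeholder
EVENTHUB_NAME = "<output-hub-name>"                 # placeholder

client = EventHubConsumerClient.from_connection_string(
    CONN_STR, consumer_group="$Default", eventhub_name=EVENTHUB_NAME,
)

with client:
    for pid in client.get_partition_ids():
        props = client.get_partition_properties(pid)
        # Rough per-partition event count: last minus first sequence
        # number. Approximate only, since older events may have aged
        # out of retention.
        approx = (props["last_enqueued_sequence_number"]
                  - props["beginning_sequence_number"])
        print(f"partition {pid}: ~{approx} events, "
              f"last enqueued at {props['last_enqueued_time_utc']}")
```

If one partition's approximate count dwarfs the others, a static partition key is the likely culprit.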
For validation, try capturing output directly from the Event Hub (bypassing the Function App) using a simple receiver and compare counts. Microsoft’s guidance on output consistency and delivery guarantees notes that Event Hubs output is "at-least-once," but batching and retries can occasionally lead to gaps under extreme burst loads if not tuned properly.
In short, to confirm where the loss occurs:
- Bypass your Function App and read directly from the output Event Hub using a simple receiver (like `az eventhubs eventhub receive` or a basic EventProcessor) to rule out downstream consumption issues; see the counting sketch after this list.
- Ensure your test runs long enough (e.g., 15–20 seconds after the last event is sent) to allow Stream Analytics to flush any pending output batches.
- Verify whether you're using a partition key in your Stream Analytics output configuration. If not, consider adding one (even a random or round-robin key) to distribute load across partitions.
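For the first step, here is a rough counting receiver you could adapt (azure-eventhub v5 assumed; the connection string, hub name, and expected count are placeholders):

```python
# Minimal sketch: count every event in the output hub, bypassing the
# Function App (azure-eventhub v5; names below are placeholders).
import threading
import time
from collections import Counter

from azure.eventhub import EventHubConsumerClient

CONN_STR = "<output-event-hub-connection-string>"   # placeholder
EVENTHUB_NAME = "<output-hub-name>"                 # placeholder
EXPECTED = 10_000                                   # events your test sent

counts = Counter()

def on_event(partition_context, event):
    counts[partition_context.partition_id] += 1

client = EventHubConsumerClient.from_connection_string(
    CONN_STR, consumer_group="$Default", eventhub_name=EVENTHUB_NAME,
)

# receive() blocks until the client is closed, so run it on a worker
# thread and stop it from here once the drain window has passed.
worker = threading.Thread(
    target=client.receive,
    kwargs={"on_event": on_event, "starting_position": "-1"},
    daemon=True,
)
worker.start()
time.sleep(60)  # long enough to drain the hub plus a ~20 s flush window
client.close()  # unblocks receive()

total = sum(counts.values())
print(f"received {total} of {EXPECTED} expected")
print("per partition:", dict(counts))
```

Comparing `total` against the number of events your test sent tells you whether the loss happens before or after the output Event Hub.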
Let us know the above details so we can pinpoint whether this is a batching, partitioning, or downstream consumption issue.
If this answers your query, do click `Upvote` and `Yes` for "Was this answer helpful". And, if you have any further query, do let us know.
Thanks
Pratyush