Hi @01725609
Since we keep our partitioning policy disabled when we are not backfilling, how can we be 100% sure that our backfilled data has been completely sharded?
You cannot rely on database-level statistics alone, because they always reflect the ongoing ingestion tail.
To be 100% confident your backfill is completely sharded:
- Apply the partitioning policy with an EffectiveDateTime covering just the backfill window (Partitioning Policy).
- Or tag the backfill extents at ingest using Extent tags.
- Keep those extents hot using a temporary Caching policy.
- Use .show extents, filtered on your tags or time range.
- Once no unpartitioned extents remain for that slice, the backfill is fully partitioned, even if .show database extents partitioning statistics still shows a small live tail.
How do we know that our backfilled data has already been partitioned completely, knowing that we also have a constant stream of data coming in, which will also be subject to that partitioning policy?
Even with continuous ingestion, you can decisively confirm backfill completion by isolating and tracking only the backfill slice:
- Isolate the backfill
  - Scope partitioning to the backfill window with EffectiveDateTime (Partitioning Policy), or tag backfill extents (Extent tags).
- Keep extents hot
  - Partitioning runs only on hot extents. Temporarily adjust the Caching policy so the backfill time range stays hot until processing is done.
- Query backfill extents only
  - .show table MyTable extents | where Tags has "backfill:2025-08" | project ExtentId, CreatedOn, IsHomogeneous, Tags
  - Done = no nonhomogeneous extents remain for that backfill slice.
Bottom line: despite the constant stream, you know your backfill is complete once its tagged/time-scoped extents are all homogeneous. The live ingestion tail can be ignored.
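If you export the extent listing and want to script the check, here is a minimal Python sketch of the rule described above. It assumes each row is a dict carrying the ExtentId, Tags and IsHomogeneous fields named in the query; the sample rows and function names are made up for illustration, and the substring match is only an approximation of the Kusto `has` operator.

```python
from typing import Any, Iterable, List, Mapping


def unpartitioned_backfill_extents(
    rows: Iterable[Mapping[str, Any]], tag: str
) -> List[str]:
    """Return ExtentIds in the tagged backfill slice that are not yet homogeneous."""
    pending = []
    for row in rows:
        # Skip anything outside the backfill slice (e.g. the live ingestion tail).
        if tag not in str(row.get("Tags", "")):
            continue
        # Assumption: a missing IsHomogeneous value is treated as "not partitioned".
        if not row.get("IsHomogeneous", False):
            pending.append(row["ExtentId"])
    return pending


def backfill_is_partitioned(rows: Iterable[Mapping[str, Any]], tag: str) -> bool:
    """True only if the slice is non-empty and has no pending extents."""
    rows = list(rows)
    has_slice = any(tag in str(r.get("Tags", "")) for r in rows)
    return has_slice and not unpartitioned_backfill_extents(rows, tag)


# Hypothetical rows: two backfill extents and one untagged live-tail extent.
sample_rows = [
    {"ExtentId": "e1", "Tags": "backfill:2025-08", "IsHomogeneous": True},
    {"ExtentId": "e2", "Tags": "backfill:2025-08", "IsHomogeneous": False},
    {"ExtentId": "e3", "Tags": "", "IsHomogeneous": False},
]
print(unpartitioned_backfill_extents(sample_rows, "backfill:2025-08"))
```

An empty slice is reported as not complete on purpose, so a mistyped tag does not look like a finished backfill.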
We can indeed perfectly follow the progress on a database level using .show database extents partitioning statistics, but it seems that the continuous scraping will also be included all of the time?
Correct — the command:
.show database extents partitioning statistics
- Always includes the continuous ingestion tail.
- The PartitionedRowPercentage will get close to 100% but never exactly reach it while live data keeps arriving.
How to use it effectively:
- Treat this as a roll-up health indicator for the table/cluster.
- For backfill completion, rely instead on:
  .show table MyTable extents | where Tags has "backfill:2025-08"
When no unpartitioned extents remain in the backfill slice, you can consider the backfill fully partitioned — even though .show database extents partitioning statistics continues to show the live scraping tail.
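To make the arithmetic behind this concrete, here is a small Python sketch. The row counts are invented for illustration, and the function only mirrors the idea of a partitioned-rows percentage; it does not call the service.

```python
def partitioned_row_percentage(partitioned_rows: int, total_rows: int) -> float:
    """Share of rows that sit in partitioned extents, as a percentage."""
    if total_rows == 0:
        return 0.0
    return 100.0 * partitioned_rows / total_rows


# Hypothetical numbers: the backfill is fully partitioned, while a small
# batch of freshly ingested rows has not been processed yet.
backfill_rows = 1_000_000
live_tail_rows = 5_000

# Database-level view: the live tail is part of the denominator.
overall = partitioned_row_percentage(backfill_rows, backfill_rows + live_tail_rows)

# Backfill-only view: the live tail is excluded.
backfill_only = partitioned_row_percentage(backfill_rows, backfill_rows)

print(f"overall: {overall:.2f}%, backfill only: {backfill_only:.2f}%")
```

The overall figure stays below 100% for as long as any unprocessed live rows exist, which is why the backfill slice is the better completion signal.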
Final Takeaway
- Database-level stats always include the live tail, so they can’t be used as the sole indicator.
- To be 100% sure about backfill completion:
  - Scope partitioning with EffectiveDateTime or use extent tags.
  - Keep backfill extents hot until partitioning finishes.
  - Verify via .show extents that all backfill extents are homogeneous.
Once those checks are satisfied, you can safely declare the backfill completely sharded, regardless of the ongoing stream.
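If you would rather not re-run the check by hand, a polling loop along these lines could work. This is only a sketch: `fetch_rows` is a hypothetical callable you would supply to return the current extent rows (with the Tags and IsHomogeneous fields used above), and the interval and attempt limit are arbitrary defaults.

```python
import time
from typing import Any, Callable, Iterable, Mapping


def wait_for_backfill(
    fetch_rows: Callable[[], Iterable[Mapping[str, Any]]],
    tag: str,
    poll_seconds: float = 60.0,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every extent in the tagged slice is homogeneous, or give up."""
    for attempt in range(max_polls):
        # Only the backfill slice matters; the live tail is filtered out here.
        slice_rows = [r for r in fetch_rows() if tag in str(r.get("Tags", ""))]
        if slice_rows and all(r.get("IsHomogeneous", False) for r in slice_rows):
            return True
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.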
If you have any further questions, feel free to respond in this same thread.