Most efficient way to take _n_ rows per group

Pål Kristian Halle 155 Reputation points
2025-11-03T12:11:48.5+00:00

Hello ADX experts.

I have an ADX database with IoT sensor data. My use case is to retrieve a specified number of readings (rows) per MetricId after a certain date. The most natural way to do this seems to be using the partition by operator like this:

Metrics
| where MetricId in (1001, 1011, 1021)
| where Timestamp >= datetime(2024-06-01)
| partition hint.strategy=native by MetricId 
(
    top 2 by Timestamp asc
)

This gives me the result I want, but it's extremely slow:

  • 950 ms when hint.strategy = native
  • 700 ms when hint.strategy = shuffle

If I use the union operator, the response is very fast - only 30 ms:

union
(
	Metrics
    | where MetricId == 1001
    | where Timestamp >= datetime(2024-06-01)
    | top 2 by Timestamp asc
),
(
	Metrics
    | where MetricId == 1011
    | where Timestamp >= datetime(2024-06-01)
    | top 2 by Timestamp asc
),
(
	Metrics
    | where MetricId == 1021
    | where Timestamp >= datetime(2024-06-01)
    | top 2 by Timestamp asc
)

But this becomes cumbersome fast if I want to filter on 100 different metrics.

Are there any alternative, more performant ways to do this that I have overlooked?

This is my table schema:

.create table Metrics (
    MetricId: string,
    Timestamp: datetime,
    Value: real
)

.alter table Metrics policy partitioning ```
{
  "PartitionKeys": [
    {
      "ColumnName": "MetricId",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 128,
        "PartitionAssignmentMode": "Uniform"
      }
    },
    {
      "ColumnName": "Timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "2000-01-01T00:00:00",
        "RangeSize": "1.00:00:00",
        "OverrideCreationTime": true
      }
    }
  ],
  "EffectiveDateTime":"2000-01-01T00:00:00"
}```
Azure Data Explorer
Azure Data Explorer
An Azure data analytics service for real-time analysis on large volumes of data streaming from sources including applications, websites, and internet of things devices.
{count} votes

Answer accepted by question author
  1. VRISHABHANATH PATIL 1,820 Reputation points Microsoft External Staff Moderator
    2025-11-05T01:40:29.1533333+00:00

    Hi Pål Kristian Halle,

    Thank you for sharing your feedback. We have looked into the possible alternatives, and here are some steps that may help address the issue:

    You are right about how make_list() works—it doesn’t keep the order unless you explicitly sort the data first. The catch is that adding a sort step before summarizing can slow things down, which explains the performance hit you noticed.

    Here’s why that happens and what you can try instead:

    Why does order by slow things down?

    • By default, summarize in Kusto works on unordered data.
    • When you add order by Timestamp asc, it forces a full sort across all rows before aggregation, which is costly for large datasets (think hundreds of thousands of rows).
    • The union method is faster because it skips the global sort and handles each metric separately.

    Ways to improve performance:

    • Use arg_min / arg_max for first row only
      • Great for single-row retrieval per metric, but not for N rows.
      • Chunked approach with take after filtering
        • If you can tolerate approximate results:

    KQL

    Metrics

    | where MetricId in (1001, 1011, 1021)

    | where Timestamp >= datetime(2024-06-01)

    | partition by MetricId

    | take N

    • Limitation: Doesn’t guarantee strict ordering by timestamp.
    • mv-expand + top inside summarize
      • Instead of global sort, sort within each group:

    KQL

    Metrics

    | where MetricId in (1001, 1011, 1021)

    | where Timestamp >= datetime(2024-06-01)

    | summarize readings = make_list(pack('t', Timestamp, 'v', Value)) by MetricId

    | mv-expand readings

    | order by MetricId, readings.t asc

    • This reduces the cost of sorting across all rows.
    • Union-based approach for strict ordering
      • If metrics list is small and static:

    KQL

    union (

    Metrics | where MetricId == 1001 | where Timestamp >= datetime(2024-06-01) | top N by Timestamp asc,

    Metrics | where MetricId == 1011 | where Timestamp >= datetime(2024-06-01) | top N by Timestamp asc,

    Metrics | where MetricId == 1021 | where Timestamp >= datetime(2024-06-01) | top N by Timestamp asc

    )

    • This avoids global sorts and is often fastest for small metric sets.

    There’s no built-in way for make_list() to keep the order unless you sort the data first. If maintaining order is important, using a union or a partitioned approach is usually the most practical solution.

    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Vinodh247 40,031 Reputation points MVP Volunteer Moderator
    2025-11-04T01:13:56.3133333+00:00

    Hi ,

    Thanks for reaching out to Microsoft Q&A.

    Use row_number() with summarize and take instead of partition. It is typically faster and scales better for many MetricIds.

    Example(sourced from web):

    Metrics
    | where MetricId in (1001, 1011, 1021)
    | where Timestamp >= datetime(2024-06-01)
    | extend rn = row_number(Timestamp asc, MetricId)
    | where rn <= 2
    
    
    

    If you have many MetricIds, prefilter data using ingestion_time() or limit by a recent time window. Also ensure the table is well partitioned on MetricId (as in your setup) and update statistics regularly to keep query planner efficient.

    Please 'Upvote'(Thumbs-up) and 'Accept' as answer if the reply was helpful. This will be benefitting other community members who face the same issue.


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.