Azure Data Explorer query with joins + sort extremely slow (40s runtime, 2GB peak memory)

Question

Azure Data Explorer query with joins + sort extremely slow (40s runtime, 2GB peak memory)

User1279836 20

Hello,

I am currently evaluating Azure Data Explorer (ADX) for my application and facing severe performance issues. My workload requires retrieving data quickly (ideally within 1–2 seconds) with queries that involve joins and sorting over large datasets. (potentially over 20-30 mln rows)

Here is a example of the query pattern that causes problems:

let pageSize = 10;
let pageNumber = 2;
let startIndex = (pageNumber - 1) * pageSize + 1;
let endIndex = pageNumber * pageSize;

let dataset1 = TableA
| project Key1, ColX, ColY;

TableB
| join kind = leftouter (dataset1) on Key1
| join kind = leftouter (TableC) on Key2
| sort by Key2 desc
| serialize RowNum = row_number()
| where RowNum between (startIndex .. endIndex)

Problem:

The query takes ~40 seconds to complete. (even with using materialized views)
During execution, memory peak per node exceeds 2 GB.
Our goal is to make this run in ~1–2 seconds, as this query will back user-facing APIs. (Although when we remove sort from this query it improves response time greatly)

Questions:

Is ADX suitable for this type of workload (multiple joins + global sort + pagination on large datasets)?
Is there a recommended way to implement efficient pagination without relying on serialize + row_number()?
Can ADX be used in application scenario where we need quick responses for user facing actions, or should it be used more as a storage for analytics data?

I am under time pressure to decide whether to scale my current relational database or switch fully to ADX. I would greatly appreciate guidance from the product team or experienced ADX engineers.

Thank you in advance!

Answer accepted by question author

0 additional answers

Your answer

Answer 1

Hi ,

Thanks for reaching out to Microsoft Q&A.

your query might be slow because it forces azure Data Explorer to perform a global sort across millions of rows, followed by serialization and row numbering for offset based pagination. This combination is very expensive in a distributed, columnar engine and explains both the high latency and memory usage. ADX can deliver sub second responses for user facing APIs, but only if the query patterns and data model are optimized for how the engine works. Instead of using offset pagination, you should switch to keyset (cursor-based) pagination, which avoids full scans and resorts on every request. Denormalizing freq joined attributes into your main fact table, either during ingestion or with a materialized view, also removes the overhead of joining at query time. Filtering early and projecting only the necessary columns will help reduce data shuffling and memory pressure. If joins with smaller tables are unavoidable hinting for broadcast joins/using lookups instead of full shuffles can improve performance somewhat.

Pls note that ADX is suitable for real time, low latency scenarios when queries are selective, aggregations are pushed down early, and pagination is designed to return only the top N rows efficiently. However if your use case demands deep offset paging with a global sort over tens of millions of rows, ADX or any columnar analytics store will struggle to meet a 1 - 2 second SLA. In such cases, you need to redesign the access pattern or use a different engine optimized for random access and offset pagination. For application facing APIs, rethink the experience to rely on incremental cursors and pre enriched data rather than complex joins and expensive sorts at query time.

Please 'Upvote'(Thumbs-up) and 'Accept' as answer if the reply was helpful. This will be benefitting other community members who face the same issue.

Share via

Azure Data Explorer query with joins + sort extremely slow (40s runtime, 2GB peak memory)

0 additional answers

Your answer