[DON'T MERGE] Proof of Concept: ES|QL approximate query execution #131828

jan-elastic · 2025-07-24T13:09:37Z

Proof of concept for approximate query execution

This is for gathering early feedback; not for merging!

This targets queries of the form:

FROM data
  | commands_swappable_with_sample
  | STATS aggs
  | more_commands

Approximating rewrites it to

FROM data
  | SAMPLE probability
  | commands_swappable_with_sample
  | STATS sample_corrected_aggs
  | more_commands

The sample probability is chosen so that the approximated results are based on ~1000 docs. It's derived from the total result count, which is obtained by running:

FROM data
  | commands_swappable_with_sample
  | STATS COUNT(*)
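As a rough illustration of the step above, here is a minimal sketch of how a probability could be derived from the row count so that about 1000 rows survive sampling. The class and method names are hypothetical and not taken from the PR; the exact formula in the PR may differ.

```java
/** Hypothetical helper (not from the PR): derives a SAMPLE probability from a row count. */
final class SampleProbability {
    /** Target number of sampled rows; the PR description mentions ~1000 docs. */
    static final long TARGET_ROWS = 1_000;

    /**
     * Returns a probability in (0, 1] such that rowCount * probability is about TARGET_ROWS.
     * Small inputs are not sampled at all (probability 1.0).
     */
    static double forRowCount(long rowCount) {
        if (rowCount < 0) {
            throw new IllegalArgumentException("rowCount must be non-negative: " + rowCount);
        }
        if (rowCount <= TARGET_ROWS) {
            return 1.0;
        }
        return (double) TARGET_ROWS / rowCount;
    }
}
```

Under this assumption, an index with 4675 matching rows would get a probability of roughly 0.214, and anything with 1000 rows or fewer would be read in full.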

You can use this as follows:

POST _query
{  
  "query": """
    FROM kibana_sample_data_ecommerce
     | STATS count=COUNT() BY CATEGORIZE(category)
     | SORT count DESC
  """,
  "approximate": true
}

With "approximate": false, the (correct) results are:

     count     |CATEGORIZE(category)
---------------+--------------------
3927           |.*?Clothing.*?      
2080           |.*?Shoes.*?         
1402           |.*?Accessories.*?

(based on "documents_found": 4675)

With "approximate": true, the (approximate) results look like:

     count     |CATEGORIZE(category)
---------------+--------------------
3791           |.*?Clothing.*?      
2001           |.*?Shoes.*?         
1533           |.*?Accessories.*?

(based on "documents_found": 990)
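The counts in the approximate result are not raw sample counts; they are scaled back up to the full data set. A minimal sketch of such a correction for COUNT, assuming the simplest possible estimator (divide the sampled count by the sample probability), is shown below. The names are hypothetical, and the PR's sample_corrected_aggs may be implemented differently.

```java
/** Hypothetical helper (not from the PR): scales a sampled COUNT back to the full data set. */
final class SampleCorrection {
    /**
     * Estimates the full count from a count computed on a sample.
     * Each row was kept with the given probability, so each kept row
     * stands in for 1 / probability rows of the original data.
     */
    static long correctedCount(long sampledCount, double probability) {
        if (probability <= 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in (0, 1]: " + probability);
        }
        return Math.round(sampledCount / probability);
    }
}
```

Because the sample is random, the corrected value only approximates the exact count, which is consistent with the differences between the two result tables above.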

ivancea

Just a shallow check. To me, it makes sense. I would wait for somebody else to weigh in though, in case this extra query could lead to something bad somewhere.

ivancea · 2025-07-24T14:14:46Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/approximate/Approximate.java

+     *
+     * To be so, the plan must contain at least one STATS function, and all
+     * functions between the source and the leftmost STATS function must be
+     * swappable with STATS.


You mean with SAMPLE here, right?

Suggested change

-     * swappable with STATS.
+     * swappable with SAMPLE.

yes, this should be SAMPLE

ivancea · 2025-07-24T14:25:53Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/approximate/Approximate.java

+     * off at the leftmost STATS function, followed by "| STATS COUNT(*)".
+     * This value can be used to pick a good sample probability.
+     */
+    public LogicalPlan countPlan() {


This extra query is probably my major "concern". It looks ok, but it's still going to execute evals and wheres, which could end up executing a full query anyway (?). It looks a bit "dangerous".

As an idea, I wonder if we could use some kind of Lucene statistics for this. I don't know if we have them though, or if what we have is enough. Even if they were just approximations, they might let us avoid this extra query. This would be another block of work though.

I get your concern. That's exactly why I wanted some early feedback.

The extra query is pretty similar to the extra query of the inline join subplan though.

In the case of

FROM data | STATS COUNT()

I guess we can get the count directly from Lucene.

But for a more complicated

FROM data | WHERE my_function(x) < 1 | STATS COUNT()

that's obviously not possible.

We can use sampling again though to get an approximate count, which is good enough for setting the probability.
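One way this idea could look is sketched below: run the count query on a small sample first, and only raise the probability if too few rows come back to trust the estimate. Everything here is an assumption for illustration (the class, the callback that stands in for executing the sampled count query, the factor-of-10 escalation, and the minimum-rows threshold); none of it is taken from the PR.

```java
import java.util.function.DoubleToLongFunction;

/** Hypothetical sketch (not from the PR): estimates a total row count via sampling. */
final class ApproximateCount {
    /**
     * @param sampledCount      stand-in for running "... | SAMPLE p | STATS COUNT(*)";
     *                          takes the probability p and returns the sampled row count
     * @param startProbability  initial probability, in (0, 1]
     * @param minSampledRows    minimum sampled rows needed to accept the estimate
     * @return estimated number of rows in the unsampled result
     */
    static double estimate(DoubleToLongFunction sampledCount, double startProbability, long minSampledRows) {
        if (startProbability <= 0.0 || startProbability > 1.0) {
            throw new IllegalArgumentException("startProbability must be in (0, 1]: " + startProbability);
        }
        double p = startProbability;
        while (true) {
            long seen = sampledCount.applyAsLong(p);
            // Accept once enough rows were seen, or once nothing is being sampled away.
            if (seen >= minSampledRows || p >= 1.0) {
                return seen / p;
            }
            // Too few rows for a stable estimate: retry with a 10x larger sample.
            p = Math.min(1.0, p * 10);
        }
    }
}
```

The trade-off in this sketch is that selective filters may trigger several rounds, with the last one at probability 1.0 being as expensive as the exact count query.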

ivancea · 2025-07-24T15:23:02Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/approximate/Approximate.java

+    public LogicalPlan countPlan() {
+        Holder<Boolean> encounteredStats = new Holder<>(false);
+        LogicalPlan countPlan = logicalPlan.transformUp(plan -> {
+            if (plan instanceof LeafPlan) {


A Join may have 2 leaf plans if I'm not wrong, which could lead to this detecting the wrong Stats (?)

I haven't given much thought yet to JOIN, FORK, INLINESTATS, etc.

I first wanted to get something working end to end.

jan-elastic marked this pull request as draft July 24, 2025 13:09

elasticsearchmachine added the v9.2.0 label Jul 24, 2025

jan-elastic force-pushed the esql-approximate branch from 4345891 to ee5caf5 Compare July 24, 2025 13:22

jan-elastic changed the title ~~[Proof of Concept] ES|QL approximate query execution~~ [DON'T MERGE] Proof of Concept: ES|QL approximate query execution Jul 24, 2025

ivancea reviewed Jul 24, 2025


jan-elastic added 2 commits July 25, 2025 12:04

Approximate ESQL stats execution using 1000 documents

9879405

refactor a bit

ceaa209

jan-elastic force-pushed the esql-approximate branch from ee5caf5 to ceaa209 Compare July 25, 2025 10:07
