Content-Length: 279590 | pFad | http://github.com/vitessio/vitess/issues/17867

B7 Bug Report: Query planner breaks join with derived table containing a subquery with `IN`/`NOT IN` · Issue #17867 · vitessio/vitess · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: Query planner breaks join with derived table containing a subquery with IN/NOT IN #17867

Open
arthurschreiber opened this issue Feb 25, 2025 · 4 comments · May be fixed by #17955 or #17963
Open

Comments

@arthurschreiber
Copy link
Contributor

arthurschreiber commented Feb 25, 2025

Overview of the Issue

We have run into a query that gets incorrectly broken up by the Query Planner instead of fully being pushed down to MySQL.

Here's a simplified version of that query (using the vschema located at go/vt/vtgate/planbuilder/testdata/vschemas/schema.json):

SELECT m1.id
FROM music as m1
JOIN (
  SELECT max(id) as id
  FROM music
  WHERE music.user_id = 1234 AND music.foobar NOT IN (
    SELECT foobar FROM user_extra WHERE user_extra.user_id = 1234
  )
) as m2 ON m1.id = m2.id

The derived table m2 has enough information to be routed to a specific shard (user_extra and music have conditions on user_id, which is a hash vindex). The join between m1 and m2 happens on a shared lookup vindex.

The above mentioned conditions should be enough to have the query fully merged by the query planner and pushed down completely to MySQL. Instead, the query is broken up. Here's the query plan (taken on main):

{
  "QueryType": "SELECT",
  "Original": "SELECT m1.id FROM music as m1 JOIN (SELECT max(id) as id FROM music WHERE music.user_id = 1234 AND music.foobar NOT IN (SELECT foobar FROM user_extra WHERE user_extra.user_id = 1234)) as m2 ON m1.id = m2.id",
  "Instructions": {
    "OperatorType": "Join",
    "Variant": "Join",
    "JoinColumnIndexes": "R:0",
    "JoinVars": {
      "m2_id": 0
    },
    "TableName": "music_music",
    "Inputs": [
      {
        "OperatorType": "Route",
        "Variant": "EqualUnique",
        "Keyspace": {
          "Name": "user",
          "Sharded": true
        },
        "FieldQuery": "select m2.id from (select max(id) as id from music where 1 != 1) as m2 where 1 != 1",
        "Query": "select m2.id from (select max(id) as id from music where music.user_id = 1234 and music.foobar not in (select foobar from user_extra where user_extra.user_id = 1234)) as m2",
        "Table": "music",
        "Values": [
          "1234"
        ],
        "Vindex": "user_index"
      },
      {
        "OperatorType": "VindexLookup",
        "Variant": "EqualUnique",
        "Keyspace": {
          "Name": "user",
          "Sharded": true
        },
        "Values": [
          ":m2_id"
        ],
        "Vindex": "music_user_map",
        "Inputs": [
          {
            "OperatorType": "Route",
            "Variant": "IN",
            "Keyspace": {
              "Name": "user",
              "Sharded": true
            },
            "FieldQuery": "select `name`, keyspace_id from name_user_vdx where 1 != 1",
            "Query": "select `name`, keyspace_id from name_user_vdx where `name` in ::__vals",
            "Table": "name_user_vdx",
            "Values": [
              "::name"
            ],
            "Vindex": "user_index"
          },
          {
            "OperatorType": "Route",
            "Variant": "ByDestination",
            "Keyspace": {
              "Name": "user",
              "Sharded": true
            },
            "FieldQuery": "select m1.id from music as m1 where 1 != 1",
            "Query": "select m1.id from music as m1 where m1.id = :m2_id",
            "Table": "music"
          }
        ]
      }
    ]
  },
  "TablesUsed": [
    "user.music",
    "user.user_extra"
  ]
}

Variations of this query get merged correctly. For example, a derived table with a nested join:

SELECT m1.id
FROM music as m1
JOIN (
  SELECT max(id) as id
  FROM music
  INNER JOIN user_extra ue ON music.user_id = ue.user_id AND music.foobar = ue.foobar WHERE music.user_id = 1234
) as m2 ON m1.id = m2.id
{
  "QueryType": "SELECT",
  "Original": "SELECT m1.id FROM music as m1 JOIN (SELECT max(id) as id FROM music INNER JOIN user_extra ue ON music.user_id = ue.user_id AND music.foobar = ue.foobar WHERE music.user_id = 1234) as m2 ON m1.id = m2.id",
  "Instructions": {
    "OperatorType": "Route",
    "Variant": "EqualUnique",
    "Keyspace": {
      "Name": "user",
      "Sharded": true
    },
    "FieldQuery": "select m1.id from (select max(id) as id from music, user_extra as ue where 1 != 1) as m2, music as m1 where 1 != 1",
    "Query": "select m1.id from (select max(id) as id from music, user_extra as ue where music.user_id = 1234 and music.user_id = ue.user_id and music.foobar = ue.foobar) as m2, music as m1 where m1.id = m2.id",
    "Table": "music, user_extra",
    "Values": [
      "1234"
    ],
    "Vindex": "user_index"
  },
  "TablesUsed": [
    "user.music",
    "user.user_extra"
  ]
}

The query for the derived table gets merged correctly:

SELECT max(id) as id
FROM music
WHERE music.user_id = 1234 AND music.foobar NOT IN (
  SELECT foobar FROM user_extra WHERE user_extra.user_id = 1234
)
{
  "QueryType": "SELECT",
  "Original": "SELECT max(id) as id FROM music WHERE music.user_id = 1234 AND music.foobar NOT IN (SELECT foobar FROM user_extra WHERE user_extra.user_id = 1234)",
  "Instructions": {
    "OperatorType": "Route",
    "Variant": "EqualUnique",
    "Keyspace": {
      "Name": "user",
      "Sharded": true
    },
    "FieldQuery": "select max(id) as id from music where 1 != 1",
    "Query": "select max(id) as id from music where music.user_id = 1234 and music.foobar not in (select foobar from user_extra where user_extra.user_id = 1234)",
    "Table": "music",
    "Values": [
      "1234"
    ],
    "Vindex": "user_index"
  },
  "TablesUsed": [
    "user.music",
    "user.user_extra"
  ]
}

And a join with a derived table without a subquery works fine too:

SELECT m1.id
FROM music as m1
JOIN (
  SELECT max(id) as id FROM music WHERE music.user_id = 1234
) as m2 ON m1.id = m2.id
{
  "QueryType": "SELECT",
  "Original": "SELECT m1.id FROM music as m1 JOIN (SELECT max(id) as id FROM music WHERE music.user_id = 1234) as m2 ON m1.id = m2.id",
  "Instructions": {
    "OperatorType": "Route",
    "Variant": "EqualUnique",
    "Keyspace": {
      "Name": "user",
      "Sharded": true
    },
    "FieldQuery": "select m1.id from (select max(id) as id from music where 1 != 1) as m2, music as m1 where 1 != 1",
    "Query": "select m1.id from (select max(id) as id from music where music.user_id = 1234) as m2, music as m1 where m1.id = m2.id",
    "Table": "music",
    "Values": [
      "1234"
    ],
    "Vindex": "user_index"
  },
  "TablesUsed": [
    "user.music"
  ]
}

Reproduction Steps

N/A

Binary Version

v19 - main

Operating System and Environment details

N/A

Log Fragments

No response

@arthurschreiber arthurschreiber added Type: Bug Needs Triage This issue needs to be correctly labelled and triaged Component: Query Serving and removed Needs Triage This issue needs to be correctly labelled and triaged labels Feb 25, 2025
@kairveeehh
Copy link

hii , would like to work on this issue , kinfly assign

@GuptaManan100
Copy link
Member

Hello @kairveeehh, thanks for your interest in working on this issue! After internal discussions, we believe this might be a bit complex to tackle without more experience with Vitess. However, if you’re confident that you understand the fix and would like to take it on, you’re more than welcome to give it a try. We typically don’t assign issues to individuals, but feel free to open a PR once you have a solution. Just ensure it’s well-tested and includes the necessary unit and end-to-end tests!

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Mar 5, 2025

Joining @GuptaManan100's sentiment that this particular task is not a good fit for a first contribution. Also, we want to see whether @arthurschreiber has intentions of working on that himself. @arthurschreiber ?

@kairveeehh
Copy link

Hello @kairveeehh, thanks for your interest in working on this issue! After internal discussions, we believe this might be a bit complex to tackle without more experience with Vitess. However, if you’re confident that you understand the fix and would like to take it on, you’re more than welcome to give it a try. We typically don’t assign issues to individuals, but feel free to open a PR once you have a solution. Just ensure it’s well-tested and includes the necessary unit and end-to-end tests!

Sure , would understand it further and inform , also I look forward to contribute to the idea of vitess under GSoC'25 the chatbot based over RAG , had texted over slack but received no response , should I directly go to proposal?

@gopoto gopoto linked a pull request Mar 14, 2025 that will close this issue
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: http://github.com/vitessio/vitess/issues/17867

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy