Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: Inefficient query plan when joining derived table containing a UNION #17163

Open
arthurschreiber opened this issue Nov 6, 2024 · 3 comments

Comments

@arthurschreiber
Copy link
Contributor

Overview of the Issue

On all currently supported Vitess versions (v18 to v21), queries that join a derived table (via a vindex) containing a UNION lead to an inefficient query plan performing a JOIN on the vtgate instead of pushing down the JOIN operation to MySQL.

Reproduction Steps

Here's an example query:

SELECT
  music_id
FROM
  music_extra
WHERE
  music_id IN (
    select
      *
    from
      (
        select music.id from music where music.user_id = 1234 AND music.foo = 'bar'
        union
        select music.id from music where music.user_id = 1234 AND music.foo = 'baz'
      ) as subquery
  )

music_extra.music_id and music.id use the same lookup vindex owned by music.
music.user_id is a hash vindex.

Generates the following plan on v18 - v21:

{
  "QueryType": "SELECT",
  "Original": "SELECT music_id FROM music_extra WHERE music_id IN (select * from (select music.id from music where music.user_id = 1234 AND music.foo = 'bar' union select music.id from music where music.user_id = 1234 AND music.foo = 'baz') as subquery)",
  "Instructions": {
    "OperatorType": "UncorrelatedSubquery",
    "Variant": "PulloutIn",
    "PulloutVars": [
      "__sq_has_values",
      "__sq1"
    ],
    "Inputs": [
      {
        "InputName": "SubQuery",
        "OperatorType": "Route",
        "Variant": "EqualUnique",
        "Keyspace": {
          "Name": "user",
          "Sharded": true
        },
        "FieldQuery": "select id from (select music.id from music where 1 != 1 union select music.id from music where 1 != 1) as subquery where 1 != 1",
        "Query": "select id from (select music.id from music where music.user_id = 1234 and music.foo = 'bar' union select music.id from music where music.user_id = 1234 and music.foo = 'baz') as subquery",
        "Table": "music",
        "Values": [
          "1234"
        ],
        "Vindex": "user_index"
      },
      {
        "InputName": "Outer",
        "OperatorType": "Route",
        "Variant": "IN",
        "Keyspace": {
          "Name": "user",
          "Sharded": true
        },
        "FieldQuery": "select music_id from music_extra where 1 != 1",
        "Query": "select music_id from music_extra where :__sq_has_values and music_id in ::__vals",
        "Table": "music_extra",
        "Values": [
          "::__sq1"
        ],
        "Vindex": "music_user_map"
      }
    ]
  },
  "TablesUsed": [
    "user.music",
    "user.music_extra"
  ]
}

The same issue occurs when joining on the hash vindex directly:

SELECT
  music_id
FROM
  music_extra
WHERE
  user_id IN (
    select
      *
    from
      (
        select music.user_id from music where music.user_id = 1234 AND music.foo = 'bar'
        union
        select music.user_id from music where music.user_id = 1234 AND music.foo = 'baz'
      ) as subquery
  )
{
  "QueryType": "SELECT",
  "Original": "SELECT music_id FROM music_extra WHERE user_id IN (select * from (select music.user_id from music where music.user_id = 1234 AND music.foo = 'bar' union select music.user_id from music where music.user_id = 1234 AND music.foo = 'baz') as subquery)",
  "Instructions": {
    "OperatorType": "UncorrelatedSubquery",
    "Variant": "PulloutIn",
    "PulloutVars": [
      "__sq_has_values",
      "__sq1"
    ],
    "Inputs": [
      {
        "InputName": "SubQuery",
        "OperatorType": "Route",
        "Variant": "EqualUnique",
        "Keyspace": {
          "Name": "user",
          "Sharded": true
        },
        "FieldQuery": "select user_id from (select music.user_id from music where 1 != 1 union select music.user_id from music where 1 != 1) as subquery where 1 != 1",
        "Query": "select user_id from (select music.user_id from music where music.user_id = 1234 and music.foo = 'bar' union select music.user_id from music where music.user_id = 1234 and music.foo = 'baz') as subquery",
        "Table": "music",
        "Values": [
          "1234"
        ],
        "Vindex": "user_index"
      },
      {
        "InputName": "Outer",
        "OperatorType": "Route",
        "Variant": "IN",
        "Keyspace": {
          "Name": "user",
          "Sharded": true
        },
        "FieldQuery": "select music_id from music_extra where 1 != 1",
        "Query": "select music_id from music_extra where :__sq_has_values and user_id in ::__vals",
        "Table": "music_extra",
        "Values": [
          "::__sq1"
        ],
        "Vindex": "user_index"
      }
    ]
  },
  "TablesUsed": [
    "user.music",
    "user.music_extra"
  ]
}

Binary Version

N/A

Operating System and Environment details

N/A

Log Fragments

N/A
@arthurschreiber arthurschreiber added Type: Bug Needs Triage This issue needs to be correctly labelled and triaged Component: Query Serving and removed Needs Triage This issue needs to be correctly labelled and triaged labels Nov 6, 2024
@systay
Copy link
Collaborator

systay commented Nov 7, 2024

@arthurschreiber this was merged into a single route in pre-18 Vitess releases?

@arthurschreiber
Copy link
Contributor Author

No, but it was handled "correctly" by the v3 planner in earlier versions (probably more by luck than on purpose).

@evaccaro
Copy link
Contributor

evaccaro commented Nov 7, 2024

We found an additional similar example of this behavior, see below:

select music.id 
from music where music.id in (
    select * from (
        select music.id from music where music.user_id in (1, 2, 3) order by music.updated_at desc, music.id desc limit 0, 100
    ) as subquery_for_limit
) order by music.updated_at desc, music.id desc limit 100

which generates the below plan:

{
  "QueryType": "SELECT",
  "Original": "select music.id from music where music.id in (select * from (select music.id from music where music.user_id in (1, 2, 3) order by music.updated_at desc, music.id desc limit 0, 100) as subquery_for_limit) order by music.updated_at desc, music.id desc limit 100",
  "Instructions": {
    "OperatorType": "Limit",
    "Count": "INT64(100)",
    "Inputs": [
      {
        "OperatorType": "UncorrelatedSubquery",
        "Variant": "PulloutIn",
        "PulloutVars": [
          "__sq_has_values",
          "__sq1"
        ],
        "Inputs": [
          {
            "InputName": "SubQuery",
            "OperatorType": "Limit",
            "Count": "INT64(100)",
            "Offset": "INT64(0)",
            "Inputs": [
              {
                "OperatorType": "Route",
                "Variant": "IN",
                "Keyspace": {
                  "Name": "user",
                  "Sharded": true
                },
                "FieldQuery": "select id, subquery_for_limit.updated_at, weight_string(subquery_for_limit.updated_at), weight_string(id) from (select music.id, music.updated_at from music where 1 != 1) as subquery_for_limit where 1 != 1",
                "OrderBy": "(1|2) DESC, (0|3) DESC",
                "Query": "select id, subquery_for_limit.updated_at, weight_string(subquery_for_limit.updated_at), weight_string(id) from (select music.id, music.updated_at from music where music.user_id in ::__vals) as subquery_for_limit order by subquery_for_limit.updated_at desc, subquery_for_limit.id desc limit :__upper_limit",
                "Table": "music",
                "Values": [
                  "(INT64(1), INT64(2), INT64(3))"
                ],
                "Vindex": "user_index"
              }
            ]
          },
          {
            "InputName": "Outer",
            "OperatorType": "Route",
            "Variant": "IN",
            "Keyspace": {
              "Name": "user",
              "Sharded": true
            },
            "FieldQuery": "select music.id, music.updated_at, weight_string(music.updated_at), weight_string(music.id) from music where 1 != 1",
            "OrderBy": "(1|2) DESC, (0|3) DESC",
            "Query": "select music.id, music.updated_at, weight_string(music.updated_at), weight_string(music.id) from music where :__sq_has_values and music.id in ::__vals order by music.updated_at desc, music.id desc",
            "ResultColumns": 1,
            "Table": "music",
            "Values": [
              "::__sq1"
            ],
            "Vindex": "music_user_map"
          }
        ]
      }
    ]
  },
  "TablesUsed": [
    "user.music"
  ]
}

Even though we're joining on a lookup vindex column, this query is broken up into two pieces and performs a lookup vindex operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants