Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6338] RelMdCollation#project can return an incomplete list of collations in the presence of aliasing #3739

Merged
merged 1 commit into from
Mar 25, 2024

Conversation

rubenada
Copy link
Contributor

@rubenada rubenada commented Mar 21, 2024

More details in: CALCITE-6338

Note: the quidem test that required an adjustment actually shows a good example of a side effect of this bug:
Notice how the original plan contained an unnecessary EnumerableSort:

old plan:
  EnumerableMergeJoin(condition=[=($0, $5)], joinType=[left])
    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0])
      EnumerableTableScan(table=[[scott, EMP]])
    EnumerableSort(sort0=[$4], dir0=[ASC]) <-- *** HERE! ***
      EnumerableCalc(expr#0..7=[{inputs}], expr#8=[1:BIGINT], expr#9=[true], c=[$t8], d=[$t8], m=[$t0], trueLiteral=[$t9], EMPNO1=[$t0])
        EnumerableTableScan(table=[[scott, EMP]])
=>
new plan:
  EnumerableMergeJoin(condition=[=($0, $5)], joinType=[left])
    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0])
      EnumerableTableScan(table=[[scott, EMP]])
    EnumerableCalc(expr#0..7=[{inputs}], expr#8=[1:BIGINT], expr#9=[true], c=[$t8], d=[$t8], m=[$t0], trueLiteral=[$t9], EMPNO1=[$t0])
      EnumerableTableScan(table=[[scott, EMP]])

scott.EMP table scan is "by default" sorted by EMPNO ($0), for this reason there is no Sort on the first input of the MergeJoin. However, since the second input contains a Calc (this fix is for both Project and Calc) that was projecting EMPNO as $2 and $4, the planner was incorrectly assuming that we needed an EnumerableSort on $4 (i.e. the "second" EMPNO), when in fact this was not the case (the table scab on scott.EMP was already sorted by EMPNO). This was because the Project (or Calc in this case) collation calculation was [2], instead of [2][4].
Thus, the new plan seems both correct and more efficient.

Copy link
Contributor

@snuyanzin snuyanzin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into it

One tiny not really related to the changes comment however since this method was touch org.apache.calcite.rel.metadata.RelMdCollation#project

It seems it never returns null, In that case should we remove @Nullable or replace it with @NonNull?

@rubenada rubenada changed the title [CALCITE-6338] RelMdCollation#project can return an incomplete list of collations [CALCITE-6338] RelMdCollation#project can return an incomplete list of collations in the presence of aliasing Mar 21, 2024
Copy link

sonarcloud bot commented Mar 21, 2024

@rubenada
Copy link
Contributor Author

rubenada commented Mar 22, 2024

Thanks for the review @snuyanzin .
I have made some minor changes, based on some feedback in Jira.

It seems it never returns null, In that case should we remove @Nullable or replace it with @NonNull?

I tried both removing @Nullable and replacing it with @NonNull, but in both cases CheckFramework failed (due to the annotations used by the methods called by the modified one). So I think CheckFramework is not clever enough to grasp the situation (or I'm not clever enough to make it understand it). Since this annotation is not really a part of the fix, and I am not very dextrous with this tool, I propose to keep the original state in this regard.

@rubenada rubenada added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Mar 22, 2024
Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rubenada
Copy link
Contributor Author

@snuyanzin could you please take another look at the PR? Do you have any other comment? I plan to squash & merge in the next 24-48h if no further remarks arrive.

Copy link
Contributor

@snuyanzin snuyanzin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snuyanzin could you please take another look at the PR? Do you have any other comment? I plan to squash & merge in the next 24-48h if no further remarks arrive.

yes, feel free to go ahead it looks ok from my side
and sorry for the delay

I tried both removing @nullable and replacing it with @nonnull, but in both cases CheckFramework failed (due to the annotations used by the methods called by the modified one).

yep, I also tried that locally and failed, sorry for confusing....

@rubenada rubenada merged commit d5b6b5c into apache:main Mar 25, 2024
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants