[SPARK-44562][SQL] Add OptimizeOneRowRelationSubquery in batch of Subquery

### What changes were proposed in this pull request?

This PR adds the `OptimizeOneRowRelationSubquery` rule to the `Subquery` batch of the optimizer, so that it runs after `OptimizeSubqueries`.

### Why are the changes needed?

This enables further optimization of the query. Currently, `OptimizeOneRowRelationSubquery` cannot rewrite a subquery whose plan still contains a filter that the optimizer could eliminate. For example:
```sql
CREATE temporary VIEW v1
AS
SELECT id, 'foo' AS kind FROM (SELECT 1 AS id) t;

CREATE temporary VIEW v2
AS
SELECT * FROM v1 WHERE kind = (SELECT kind FROM v1 WHERE kind = 'foo');

EXPLAIN EXTENDED SELECT * FROM v1 JOIN v2 ON v1.id = v2.id;
```

Before this PR:
```
== Optimized Logical Plan ==
Join Inner
:- Project [1 AS id#18, foo AS kind#19]
:  +- OneRowRelation
+- Project [1 AS id#21, foo AS kind#22]
   +- Filter (foo = scalar-subquery#20 [])
      :  +- Project [foo AS kind#30]
      :     +- OneRowRelation
      +- OneRowRelation
```

After this PR:
```
== Optimized Logical Plan ==
Join Inner
:- Project [1 AS id#253, foo AS kind#254]
:  +- OneRowRelation
+- Project [1 AS id#256, foo AS kind#257]
   +- OneRowRelation
```
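The effect of rule ordering can be illustrated with a toy model. The sketch below is not Catalyst code: `Expr`, `Plan`, `simplifyPlan`, and `inlinePlan` are hypothetical stand-ins, and it assumes that, before this PR, the one-row rewrite only ran before subquery bodies were simplified. `inlinePlan` replaces a scalar subquery only when its plan is a `Project` directly over `OneRow`, and `simplifyPlan` drops filters whose condition compares two equal literals.

```scala
object ToyOptimizer {
  // Hypothetical stand-ins for Catalyst expressions and logical plans.
  sealed trait Expr
  case class Lit(value: String) extends Expr
  case class Eq(left: Expr, right: Expr) extends Expr
  case class ScalarSubquery(plan: Plan) extends Expr

  sealed trait Plan
  case object OneRow extends Plan
  case class Project(output: Expr, child: Plan) extends Plan
  case class Filter(cond: Expr, child: Plan) extends Plan

  // Stand-in for general optimization, including inside subquery plans:
  // a filter comparing two equal literals is always true and is dropped.
  def simplifyPlan(p: Plan): Plan = p match {
    case OneRow => OneRow
    case Project(out, child) => Project(simplifyExpr(out), simplifyPlan(child))
    case Filter(cond, child) =>
      simplifyExpr(cond) match {
        case Eq(Lit(a), Lit(b)) if a == b => simplifyPlan(child)
        case other => Filter(other, simplifyPlan(child))
      }
  }

  def simplifyExpr(e: Expr): Expr = e match {
    case lit: Lit => lit
    case Eq(l, r) => Eq(simplifyExpr(l), simplifyExpr(r))
    case ScalarSubquery(plan) => ScalarSubquery(simplifyPlan(plan))
  }

  // Stand-in for the one-row rewrite: a scalar subquery is inlined only
  // when its plan is a Project sitting directly on the one-row relation.
  def inlinePlan(p: Plan): Plan = p match {
    case OneRow => OneRow
    case Project(out, child) => Project(inlineExpr(out), inlinePlan(child))
    case Filter(cond, child) => Filter(inlineExpr(cond), inlinePlan(child))
  }

  def inlineExpr(e: Expr): Expr = e match {
    case lit: Lit => lit
    case Eq(l, r) => Eq(inlineExpr(l), inlineExpr(r))
    case ScalarSubquery(Project(out, OneRow)) => inlineExpr(out)
    case ScalarSubquery(plan) => ScalarSubquery(inlinePlan(plan))
  }

  // True if the outer plan (not the subquery bodies) still has a Filter.
  def hasFilter(p: Plan): Boolean = p match {
    case OneRow => false
    case Project(_, child) => hasFilter(child)
    case Filter(_, _) => true
  }

  // Toy version of v2, with the column `kind` already replaced by 'foo'.
  val subquery: Plan =
    Project(Lit("foo"), Filter(Eq(Lit("foo"), Lit("foo")), OneRow))
  val v2: Plan =
    Project(Lit("foo"), Filter(Eq(Lit("foo"), ScalarSubquery(subquery)), OneRow))

  // Inline first, simplify second: the subquery still has its filter when
  // the inline rule runs, so it is not matched and the outer Filter stays.
  val before: Plan = simplifyPlan(inlinePlan(v2))

  // Running the inline rule again after simplification matches the now
  // filter-free subquery, and the outer Filter becomes removable.
  val after: Plan = simplifyPlan(inlinePlan(before))
}
```

In this toy model, `before` keeps the outer `Filter` over a scalar subquery, while `after` reduces to a `Project` over `OneRow`, mirroring the shape of the two plans above.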

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #42180 from wangyum/SPARK-44562.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
wangyum committed Aug 2, 2023
1 parent 35d4765 commit 886f7c8
Showing 2 changed files with 27 additions and 2 deletions.
@@ -178,7 +178,8 @@ abstract class Optimizer(catalogManager: CatalogManager)
     // Subquery batch applies the optimizer rules recursively. Therefore, it makes no sense
     // to enforce idempotence on it and we change this batch from Once to FixedPoint(1).
     Batch("Subquery", FixedPoint(1),
-      OptimizeSubqueries) ::
+      OptimizeSubqueries,
+      OptimizeOneRowRelationSubquery) ::
     Batch("Replace Operators", fixedPoint,
       RewriteExceptAll,
       RewriteIntersectAll,
26 changes: 25 additions & 1 deletion sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
@@ -21,7 +21,7 @@ import scala.collection.mutable.ArrayBuffer

 import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
 import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, LogicalPlan, Project, Sort, Union}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, Join, LogicalPlan, Project, Sort, Union}
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanHelper, DisableAdaptiveExecution}
 import org.apache.spark.sql.execution.datasources.FileScanRDD
@@ -2733,4 +2733,28 @@ class SubquerySuite extends QueryTest
       }
     }
   }
+
+  test("SPARK-44562: Add OptimizeOneRowRelationSubquery in batch of Subquery") {
+    withTempView("v1", "v2") {
+      sql(
+        """
+          |CREATE temporary VIEW v1
+          |AS
+          |SELECT id, 'foo' AS kind FROM (SELECT 1 AS id) t
+          |""".stripMargin)
+      sql(
+        """
+          |CREATE temporary VIEW v2
+          |AS
+          |SELECT * FROM v1 WHERE kind = (SELECT kind FROM v1 WHERE kind = 'foo')
+          |""".stripMargin)
+      val df = sql("SELECT * FROM v1 JOIN v2 ON v1.id = v2.id")
+      val filter = df.queryExecution.optimizedPlan.collect {
+        case f: Filter => f
+      }
+      assert(filter.isEmpty,
+        "Filter should be removed after OptimizeSubqueries and OptimizeOneRowRelationSubquery")
+      checkAnswer(df, Row(1, "foo", 1, "foo"))
+    }
+  }
 }
