[SPARK-46683] Write a subquery generator that generates subqueries permutations to increase testing coverage

### What changes were proposed in this pull request?

This PR creates a suite, `GeneratedSubquerySuite`, that generates SQL with variations of subqueries. These variations include:
1. The location of the subquery in the main query (SELECT, FROM, WHERE)
2. Whether the subquery is correlated, if it is in SELECT or WHERE.
3. The type of subquery predicate, if it is in WHERE.
4. Whether the subquery has a DISTINCT projection.
5. The operators in the subquery: currently JOIN, set operations, LIMIT, and AGGREGATE (SUM and COUNT, each with and without GROUP BY).
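
As an illustration only, the dimensions listed above can be combined with a Cartesian product that skips invalid combinations. The sketch below is not the code in `GeneratedSubquerySuite`: the function name, the dictionary keys, and in particular the concrete predicate types (`IN`, `NOT IN`, `EXISTS`, `NOT EXISTS`) are assumptions made for the example.

```python
from itertools import product

# Hypothetical dimension values; the actual suite may model these differently.
LOCATIONS = ["SELECT", "FROM", "WHERE"]
PREDICATES = ["IN", "NOT IN", "EXISTS", "NOT EXISTS"]  # assumed predicate types
OPERATORS = ["JOIN", "SET_OP", "LIMIT", "AGGREGATE"]


def subquery_variations():
    """Yield one dict per valid combination of the dimensions above."""
    for location, correlated, distinct, operator in product(
        LOCATIONS, [False, True], [False, True], OPERATORS
    ):
        # Correlation is only varied for subqueries in SELECT or WHERE.
        if correlated and location == "FROM":
            continue
        # A predicate type only exists for subqueries in WHERE.
        predicates = PREDICATES if location == "WHERE" else [None]
        for predicate in predicates:
            yield {
                "location": location,
                "correlated": correlated,
                "distinct": distinct,
                "operator": operator,
                "predicate": predicate,
            }


variations = list(subquery_variations())
print(len(variations))
```

With these assumed lists the generator yields 88 variations: 16 for SELECT, 8 for FROM, and 64 for WHERE.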

The suite generates SQL queries, which are then run against Postgres using the Docker integration tests.

### Why are the changes needed?

There are many subquery correctness issues, ranging from very old bugs to new ones introduced by ongoing work on subquery correlation optimization. They are especially common in the areas of COUNT bugs and null behaviors.
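
For readers unfamiliar with the term, the classic COUNT bug can be reproduced in a few lines. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in engine (not Spark or Postgres), and the "naive" rewrite is a hand-written illustration of an incorrect decorrelation, not the output of Spark's optimizer. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (a INTEGER);
    CREATE TABLE s (b INTEGER);
    INSERT INTO t VALUES (1), (2);
    INSERT INTO s VALUES (1);
    """
)

# Correlated scalar subquery: COUNT over an empty set is 0, so a = 2 yields 0.
correct = conn.execute(
    "SELECT a, (SELECT COUNT(*) FROM s WHERE s.b = t.a) FROM t ORDER BY a"
).fetchall()

# Naive decorrelation: aggregate first, then outer join. Outer rows with no
# match get NULL instead of 0, which is the COUNT bug.
naive = conn.execute(
    """
    SELECT t.a, c.cnt
    FROM t LEFT JOIN (SELECT b, COUNT(*) AS cnt FROM s GROUP BY b) c
      ON c.b = t.a
    ORDER BY t.a
    """
).fetchall()

print(correct)  # [(1, 1), (2, 0)]
print(naive)    # [(1, 1), (2, None)]
conn.close()
```

The two queries disagree only for the outer row without a match, which is the kind of difference that is easy to miss without tests covering empty groups and nulls.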

To increase test coverage and robustness in this area, we want to write a subquery generator that writes variations of subqueries, producing SQL tests.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR only adds tests. N/A.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44599 from andylam-db/generated_subqueries.

Authored-by: Andy Lam <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
andylam-db authored and cloud-fan committed Jan 23, 2024
1 parent 2aed25b commit bc889c8
Showing 2 changed files with 686 additions and 0 deletions.