[SPARK-46683] Write a subquery generator that generates subqueries permutations to increase testing coverage

### What changes were proposed in this pull request?

This PR creates a suite, `GeneratedSubquerySuite`, that generates SQL with variations of subqueries. These variations include:

1. The location of the subquery in the main query (SELECT, FROM, WHERE).
2. Whether the subquery is correlated, if it is in SELECT or WHERE.
3. The type of subquery predicate, if it is in WHERE.
4. Whether the subquery has a DISTINCT projection.
5. The operators in the subquery: currently JOINs, set operations, LIMIT, and AGGREGATE (sum, count, group-by, no group-by).

The suite generates SQL queries, which are then run against Postgres using the Docker integration tests.

### Why are the changes needed?

There are many subquery correctness issues, ranging from very old bugs to new ones introduced by ongoing work on subquery correlation optimization, especially around COUNT bugs and null behavior. To increase test coverage and robustness in this area, we want a subquery generator that writes variations of subqueries, producing SQL tests.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A; this PR itself adds tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44599 from andylam-db/generated_subqueries.

Authored-by: Andy Lam <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
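
The variation axes listed in this commit message can be illustrated with a minimal Python sketch. This is not the code of `GeneratedSubquerySuite` (which lives in the Spark test tree); every name below, including `subquery_variations` and the `PREDICATES` values, is hypothetical, and the predicate types in particular are an assumption because the commit message does not enumerate them. The sketch only enumerates combinations and applies the two constraints stated above: correlation varies only for SELECT and WHERE, and a predicate type applies only in WHERE.

```python
from itertools import product

# Axes taken from the commit message.
LOCATIONS = ("SELECT", "FROM", "WHERE")
OPERATORS = ("JOIN", "SET_OP", "LIMIT", "AGGREGATE")

# Assumption: the commit message does not list the predicate types,
# so these four are placeholders chosen for illustration only.
PREDICATES = ("IN", "NOT IN", "EXISTS", "NOT EXISTS")


def subquery_variations():
    """Yield one dict per valid combination of the variation axes."""
    for location, correlated, distinct, operator in product(
        LOCATIONS, (False, True), (False, True), OPERATORS
    ):
        # Correlation is only varied for subqueries in SELECT or WHERE.
        if correlated and location == "FROM":
            continue
        # A subquery predicate type only applies to subqueries in WHERE.
        predicates = PREDICATES if location == "WHERE" else (None,)
        for predicate in predicates:
            yield {
                "location": location,
                "correlated": correlated,
                "distinct": distinct,
                "operator": operator,
                "predicate": predicate,
            }


if __name__ == "__main__":
    variations = list(subquery_variations())
    print(len(variations))
    print(variations[0])
```

Under these placeholder axes the enumeration yields 16 SELECT, 8 FROM, and 64 WHERE combinations. A real generator would additionally render each combination to SQL text and execute it, which this sketch deliberately leaves out.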