Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: resolve relations with lateral joins #4509

Merged
merged 40 commits into from
Dec 5, 2023
Merged

Conversation

Weakky
Copy link
Member

@Weakky Weakky commented Nov 30, 2023

Overview

This PR enables the QueryEngine to resolve relations using LATERAL JOINs. This feature is currently only available on Postgres & CockroachDB and under the relationJoins preview feature flags. To minimize damage, we have, as much as possible, created new code paths for this new feature.

To minimize data duplicates traditionally caused by joins, we're using JSON aggregations. Here's what a query may look like on a one2m relation:

model user {
  userId Int     @id @default(autoincrement()) @map("user_id")
  email    String?

  posts post[]
}

model post {
  postId     Int       @id @default(autoincrement()) @map("post_id")
  title String?
  createdAt DateTime? @map("created_at")

  userId Int?  @map("user_id")
  user   user? @relation(fields: [userId], references: [userId])
}
{
  findManyuser(where: { email: "[email protected]" }) {
    email
    posts(take: 10, orderBy: { createdAt: desc }) {
      title
    }
  }
}
SELECT
  "t1"."user_id",  -- selected scalar
  "t1"."email", -- selected scalar
  "user_posts"."__prisma_data__" AS "posts" -- selected relation aggregated as JSON
FROM
  "a_join_repro"."user" AS "t1"
  LEFT JOIN LATERAL (
    SELECT
      COALESCE(JSON_AGG("__prisma_data__"), '[]') AS "__prisma_data__" -- JSON aggregation
    FROM
      (
        SELECT
          "t4"."__prisma_data__" -- nested query to enable ordering and pagination to work
        FROM
          (
            SELECT
              JSON_BUILD_OBJECT('title', "t3"."title") AS "__prisma_data__", -- Actual JSON object building per records
              "t3"."created_at" -- this field is queried so that it can be ordered by on the outer query
            FROM
              (
                SELECT
                  "t2".*
                FROM
                  "a_join_repro"."post" AS "t2"
                WHERE
                  "t1"."user_id" = "t2"."user_id" -- most inner select, where the table is joined with the parent
              ) AS "t3" /* root select */
          ) AS "t4" /* inner select */
        ORDER BY
          "t4"."created_at" DESC -- relation ordering
        LIMIT 10 -- relation pagination
      ) AS "t5" /* middle select */
  ) AS "user_posts" ON true /* outer select */
WHERE
  "t1"."email" = '[email protected]' -- top-level filter

This is roughly achieved thanks to the addition/update of some core abstractions:

  1. FieldSelection: A Relation variant is added. Previously, relations were always collected as separate ReadQuery. Please note that, for now, separate read queries are still collected even when relationJoins is enabled.
  2. RelationLoadStrategy { Query, Join }: A new enum that let's the core and the connectors decide how to load relations. While I would very much like to keep the way relations are loaded entirely to the connectors' responsibility, we, unfortunately, have to handle results differently. So for now, the core passes down to the connectors how to resolve relations, and the result is handled according to that strategy.
  3. RecordSelectionWithRelation: A new type of RecordSelection (eventually passed to the serializer) was added to serialize data differently (given that relations are now part of the ManyRecords when joins are used, as opposed to before where relations data were independently stored as results of nested RelatedRecordsQuery)

@Weakky Weakky added this to the 5.7.0 milestone Nov 30, 2023
Copy link

codspeed-hq bot commented Nov 30, 2023

CodSpeed Performance Report

Merging #4509 will degrade performances by 8.26%

⚠️ No base runs were found

Falling back to comparing integration/join-support (d6d5262) with main (0d724e3)

Summary

❌ 1 regressions
✅ 10 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main integration/join-support Change
medium_read 1.3 ms 1.4 ms -8.26%

@Weakky Weakky changed the title [WIP] Resolve relations with lateral joins #4498 [WIP] Resolve relations with lateral joins Dec 1, 2023
Copy link
Contributor

@miguelff miguelff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is big enough and outside of my full area of knowledge as for me alone not being able to provide a meaningful review. For that, @Weakky kindly walked me through the code changes and we discussed some of the decisions made.

The conclusion, after that conversation is that the behavior is correct (as expressed by the tests) and that there is no reason to block shipping the Lateral Join strategy for relation fetching as a preview feature.

match ctx.max_bind_values {
Some(chunk_size) if query_arguments.should_batch(chunk_size) => {
return Err(SqlError::QueryParameterLimitExceeded(
"Joined queries cannot be split into multiple queries just yet. If you encounter this error, please open an issue"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed how this could trigger by a user exceeding the number of bind params artificially (f.i. by dynamically composing wrong a query) and we agreed this wouldn't be subject of an issue on our end. As such this message can be misleading and frame as if we did something wrong with the implementation. A more appropriate tactic might be ignoring batching altogether and letting the code reach the DB, bubbling up to the user any potential query parameter limit exceeded.

@Weakky Weakky changed the title [WIP] Resolve relations with lateral joins Resolve relations with lateral joins Dec 4, 2023
@Weakky Weakky changed the title Resolve relations with lateral joins feat: resolve relations with lateral joins Dec 4, 2023
@Weakky Weakky marked this pull request as ready for review December 4, 2023 14:48
@Weakky Weakky requested a review from a team as a code owner December 4, 2023 14:48
@Weakky Weakky requested review from jkomyno and Druue and removed request for a team December 4, 2023 14:48
aqrln and others added 4 commits December 4, 2023 19:50
The build script had an invalid `sed` command with an extra `''`
argument that caused it to fail with

```
sed: can't read s/name = "query_engine_wasm"/name = "query_engine"/g: No such file or directory
```

This is reproducible both on CI and locally for me. Perhaps it was
written for BSD sed and doesn't work with GNU sed (so it always fails on
Linux and also fails on macOS inside prisma-engines Nix flake but maybe
it works on macOS without Nix)?

Because of this, a broken package was published from CI.

The commit fixes the `sed` command and adds `set -e` so that errors like
this would fail CI instead of silently continuing and doing wrong
things.
@Weakky Weakky merged commit 1964a5c into main Dec 5, 2023
62 of 64 checks passed
@Weakky Weakky deleted the integration/join-support branch December 5, 2023 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants