Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: avoid read after create when possible #4041

Merged
merged 20 commits into from
Jul 12, 2023
Merged

Conversation

Weakky
Copy link
Member

@Weakky Weakky commented Jun 13, 2023

Overview

Implements a solution for inserts for prisma/prisma#5043.

A create operation is currently always followed by a read to fulfill the query selection set. This leads to 4 queries sent in total:

  1. Open transaction
  2. Create record
  3. Read result
  4. Commit transaction

In some cases, performing a create and getting the result back immediately in a single query using INSERT ... RETURNING is possible.

We can only do so when:

  • The connector supports it (Only Postgres & CockroachDB currently have support)
  • The create consists only of scalar fields of the parent model and no nested operation
  • The read selection consists only of scalars fields of the parent model and no relation fetch

Note that:

  • SQL Server is not supported 🔴. While it can technically be implemented but is risky to be supported as we have to use a hack to support INSERT OUTPUT which makes it fragile to return any type of columns
  • SQLite is not supported 🔴. It also supports INSERT ... RETURNING but the driver does not return column-type information on the way back which can break some coercions we currently perform & need.

Benchmarks

Given this datamodel:

model Movie {
  id          Int    @id @default(autoincrement())
  title       String
  year        Int
  description String

  cast    Actor[] // m2m
  reviews Review[] // one2m
}

And this query:

await prisma.movie.create({
  data: {
    year: 2023,
    title: '<inserted_movie>',
    description: '<placeholder_descrition>',
  },
});

Here are the results:

Before

cpu: Apple M1 Max
runtime: node v18.12.1 (arm64-darwin)

benchmark               time (avg)             (min … max)       p75       p99      p995
---------------------------------------------------------- -----------------------------
• prisma.movie.create(...)
---------------------------------------------------------- -----------------------------
PG (regular joins)    1.12 ms/iter  (625.29 µs … 23.42 ms) 908.21 µs   6.31 ms  22.18 ms
Prisma                4.52 ms/iter     (2.51 ms … 28.3 ms)   3.89 ms  27.89 ms   28.3 ms

summary for prisma.movie.create(...)
  PG (regular joins)
   4.05x faster than Prisma

After

cpu: Apple M1 Max
runtime: node v18.12.1 (arm64-darwin)

benchmark               time (avg)             (min … max)       p75       p99      p995
---------------------------------------------------------- -----------------------------
• prisma.movie.create(...)
---------------------------------------------------------- -----------------------------
PG (regular joins)    1.08 ms/iter  (652.38 µs … 22.85 ms) 882.83 µs   6.26 ms  22.48 ms
Prisma                1.42 ms/iter  (755.79 µs … 23.08 ms)   1.31 ms   6.88 ms  22.85 ms

summary for prisma.movie.create(...)
  PG (regular joins)
   1.32x faster than Prisma

Query differences

Given the datamodel & query above, here are the queries before:

BEGIN
INSERT INTO "z_imdb_bench"."Movie" ("title","year","description") VALUES ($1,$2,$3) RETURNING "z_imdb_bench"."Movie"."id"
SELECT "z_imdb_bench"."Movie"."id", "z_imdb_bench"."Movie"."title", "z_imdb_bench"."Movie"."year", "z_imdb_bench"."Movie"."description" FROM "z_imdb_bench"."Movie" WHERE "z_imdb_bench"."Movie"."id" = $1 LIMIT $2 OFFSET $3
COMMIT

And after:

INSERT INTO "z_imdb_bench"."Movie" ("title","year","description") VALUES ($1,$2,$3) RETURNING "z_imdb_bench"."Movie"."id", "z_imdb_bench"."Movie"."title", "z_imdb_bench"."Movie"."year", "z_imdb_bench"."Movie"."description"

@codspeed-hq
Copy link

codspeed-hq bot commented Jun 13, 2023

CodSpeed Performance Report

Merging #4041 perf/avoid-read-after-create (e74b2ba) will not alter performances.

Summary

🔥 0 improvements
❌ 0 regressions
✅ 11 untouched benchmarks

🆕 0 new benchmarks
⁉️ 0 dropped benchmarks

@Weakky Weakky added this to the 5.0.0 milestone Jun 20, 2023
@Weakky Weakky marked this pull request as ready for review June 21, 2023 14:32
@Weakky Weakky requested a review from a team as a code owner June 21, 2023 14:32
Comment on lines +31 to +60
graph.flag_transactional();

let create_node = create::create_record_node(graph, query_schema, model.clone(), data_map)?;

// Follow-up read query on the write
let read_query = read::find_unique(field, model.clone())?;
let read_node = graph.create_node(Query::Read(read_query));

graph.add_result_node(&read_node);
graph.create_edge(
&create_node,
&read_node,
QueryGraphDependency::ProjectedDataDependency(
model.primary_identifier(),
Box::new(move |mut read_node, mut parent_ids| {
let parent_id = match parent_ids.pop() {
Some(pid) => Ok(pid),
None => Err(QueryGraphBuilderError::AssertionError(
"Expected a valid parent ID to be present for create follow-up read query.".to_string(),
)),
}?;

if let Node::Query(Query::Read(ReadQuery::RecordQuery(ref mut rq))) = read_node {
rq.add_filter(parent_id.filter());
};

Ok(read_node)
}),
),
)?;
Copy link
Contributor

@jkomyno jkomyno Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewer's note: this is just a chunk of code that already existed, but that is now wrapped in the else branch of a conditional

}),
),
)?;
if can_use_atomic_create(query_schema, &model, &data_map, &field) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewer's note: this function is defined later

@jkomyno
Copy link
Contributor

jkomyno commented Jun 21, 2023

I've noticed we added InsertReturning to Postgres and CockroachDB, as the PR description states that only these drivers support INSERT INTO ... RETURNING ....

What about MariaDB (supported since 10.5.0)? It's perfectly fine to me if we consider too niche of a case - just wanted to make sure we are all aware that this feature exists.

@Weakky
Copy link
Member Author

Weakky commented Jun 21, 2023

I've noticed we added InsertReturning to Postgres and CockroachDB, as the PR description states that only these drivers support INSERT INTO ... RETURNING ....

What about MariaDB (supported since 10.5.0)? It's perfectly fine to me if we consider too niche of a case - just wanted to make sure we are all aware that this feature exists.

We don't have the connector's version at runtime in the Query Engine, so we can't implement version-specific connector features (yet). This is the main reason why MariaDB isn't supported. It's also not it's own connector.

Copy link
Contributor

@jkomyno jkomyno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@Weakky Weakky force-pushed the perf/avoid-read-after-create branch 2 times, most recently from 13b5b99 to e6ff9ae Compare June 22, 2023 09:54
Copy link
Contributor

@miguelff miguelff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any blocking feedback on this. Left a few questions and a couple of comments of internal factoring of the code that are unimportant.

I find the implementation neat, the PR description sterling and the comments useful. Great job, @Weakky!

if can_use_atomic_create(query_schema, &model, &data_map, &field) {
let create_node = create::atomic_create_record_node(graph, query_schema, model, data_map, field)?;

graph.add_result_node(&create_node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking: atomic_create_record_node uses the graph to create nodes internally, and could include this operation. Is there any reason to make them separate?

Also, this branch of the code is pretty well extracted while the else branch inlines node´s creation. Would it make sense to have either both inlined or both extracted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking: atomic_create_record_node uses the graph to create nodes internally, and could include this operation. Is there any reason to make them separate?

Is "this operation" referring to graph.add_result_node ? If so, then the reason is that we might not always want to make it a result node. It very much depends on what workflow we're creating with the graph. While this is the only use-case atm, we could very well have a different operation that uses this atomic_create_record_node but needs a completely different node as result node.

Also, this branch of the code is pretty well extracted while the else branch inlines node´s creation. Would it make sense to have either both inlined or both extracted?

I'm not sure I understand how the if branch differs from the else branch. In both cases, we're inlining nodes creation. The only difference is that the else branch has more stuff going on, such as adding a subsequent read node. Do you see it differently?


Ok(QueryResult::Id(Some(res)))
Ok(QueryResult::RecordSelection(Box::new(RecordSelection {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that changing the semantics of the result from QueryResult::Id to QueryResult::RecordSelection don't break any internal contract? Aren't there used or eventually serialized different? if so, why do we have the two different enum variants in QueryResult?

Copy link
Member Author

@Weakky Weakky Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both QueryResult::Id and QueryResult::RecordSelection can be used as results. Have a look at this code here and here, where we extract results out of both.

I wasn't here when these abstractions were created so I can only assume why we have both. QueryResult::RecordSelection needs a lot more information as it ends up being serialized. QueryResult::Id is just a simplified version of RecordSelection when we know it's only used as a temporary result for another operation (eg: create --(create id)--> read).

This was holding well until now, but we're stretching the abstraction by making create's result now sometimes go through the serialization (in an atomic create), and sometimes not (when we still need a read after the create, for instance).

As said below, I could make create_record return a RecordSelection or an Id based on the type of create we're doing, but I didn't think it was worth it as we're perfectly able to extract the result out of both Id and RecordSelection. I do agree that this is all a bit fuzzy atm. It's often the case when slapping new concepts on top of existing abstractions without really evolving them. The same questions are gonna be relevant when I tackle atomic update, which works the exact same way.

My thoughts atm is that the abstractions still covers our need but we might have to change them a bit in the future if we add more subtleties to operations. Until then, I didn't want to prematurely introduce more concepts. With that being said, I'm 100% open to suggestions


let cr = CreateRecord {
// A regular create record is never used as a result node. Therefore, it's never serialized, so we don't need a name.
name: String::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this is immutable and creating an empty string has zero cost. I was wondering if given the name is not always necessary and should be an Option but it's perfectly fine and more straightforward to be this way.

Copy link
Member Author

@Weakky Weakky Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with having an Option<String> for the name is that eventually, that CreateRecord will be transformed into a RecordSelection, which requires a name (even though the ones that don't contain names won't end up serialized). So we'd have to create that String::new here anyway

I thought about creating an enum of CreateRecord { Atomic(), Regular } to make that explicit and either return a QueryResult::Id or a QueryResult::RecordSelection based on that, but I didn't want to make the ReadOperations::create_record interface more complex with different types of creates to handle.

I could've also, based on whether there's a name or not, returned a RecordSelection or an Id. But it felt hacky and worse than having an empty string.

@Jolg42 Jolg42 modified the milestones: 5.0.0, 5.1.0 Jul 10, 2023
@Weakky Weakky merged commit 96f20d9 into main Jul 12, 2023
37 checks passed
@Weakky Weakky deleted the perf/avoid-read-after-create branch July 12, 2023 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants