Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(compiler): improve JSII loading speed via binary format cache #3567

Merged
merged 43 commits into from
Aug 1, 2023

Conversation

yoav-steinberg
Copy link
Contributor

@yoav-steinberg yoav-steinberg commented Jul 22, 2023

This PR caches the .jsii manifests in /tmp/.wing/jsii_manifest_cache/ in a binary format which is quicker to deserialize. It compares the original files to the cached ones using canonical path, file size and modification time. If no match is found it loads the original file and attempts to cache the file, if a match is found it'll load the JSII manifest from the cache.

In addition I identified some extra memory copying (cloning) of the imported JSII assemblies which was redundant. Eliminating this also added some performance.

A few notes:

  1. I ended up using file metadata instead of the "fingerprint" field in the JSII manifest because doing so required unzipping gzipped .jsii files just to read the fingerprint field.
  2. I didn't use bson or flexbuffer serde implementations because their performance is worse than plain serde_json(!!!). See: https://github.com/djkoloski/rust_serialization_benchmark.
  3. I ended up using rmp_serde (msgpack serde). There are faster ones which we might move to in the future but the problem is that their implementation is incomplete. The main issue being lack of support for "self-describing" formats where the deserializer needs to figure out the output based on the schema information in the file itself (which is mainly required for deserializing rust algebraic enums and also handle missing fields in structs).
  4. Even with rmp_serde I needed to remove the missing fields support from our serde definitions to get this working. rmp_serde actually does support missing fields but that hinders its performance (see rmp_serde::encode::write vs rmp_serde::encode::write_named).
  5. I ended up using bincode, and specifically its non-serde serializer. I bincode as most fast serializer's doens't support the tag field for enum serialization which is required to deserialize .jsii files using serde_json. By using bincode's non-serde serializer we don't need to deal with the tag field. bincode seems like one of the faster serializers, we can consider some more options from here in the future.

update: I endded up using speedy which provided the best performance (see benchmarks)

benchmarks

normal test:
hyperfine -w 1 "cargo run --example compile -- testfile.w"
empty cache test (replace FORMAT with speedy/bincode/bson based on the name of the cache file generated):
hyperfine -w 1 -p "find ../.. | grep ".jsii.FORMAT" | xargs rm || true" "cargo run --example compile -- testfile.w"

test rmp_serde no cache rmp_serde cached main branch bincode no cache bincode cached speedy no cache speedy cached
hello.w 136ms 106ms 105ms 134ms 100ms 111ms 99ms
bring_cdktf.w 10.4s 2.9s 4.1s 9.1 1.39s 4.1s 740ms
bring_awscdk.w 3.9s 1.2s 1.8s 3.42s 734ms 1.83s 416ms
bring_projen.w 263ms 141ms 161ms 260ms 118ms 189ms 111ms

profiling

Looking a the flamegraphs it's clear that deserializing the JSII manifests is still taking the bulk of the compilation time when importing large ones (cdktf...). I think there's value in continuing this work by either using a faster deserializer than rmp_serde bincode or dropping JSII deserialization and implementing our own type-system serde per namespace and caching that.

Checklist

  • Title matches Winglang's style guide
  • Description explains motivation and solution
  • Tests added (always)
  • Docs updated (only required for features)
  • Added pr/e2e-full label if this feature requires end-to-end testing

By submitting this pull request, I confirm that my contribution is made under the terms of the Wing Cloud Contribution License.

@yoav-steinberg yoav-steinberg marked this pull request as ready for review July 22, 2023 11:31
@yoav-steinberg yoav-steinberg requested a review from a team as a code owner July 22, 2023 11:31
@yoav-steinberg
Copy link
Contributor Author

@MarkMcCulloh, @eladb Do you guys think I should add some env var setting to skip generating the cache for faster CI/testing?

libs/wingii/src/lib.rs Outdated Show resolved Hide resolved
@yoav-steinberg yoav-steinberg added the 🚧 pr/do-not-merge PRs with this label will not be automatically merged by mergify. label Jul 25, 2023
yoav-steinberg and others added 3 commits July 25, 2023 15:49
This is because there's no WASM support for path cononicalization. Beofre caching silently failed on WASM.
this is for wasm compatibility so we don't need `temp_dir()` access or canonical path calculation which aren't available in WASM.

Also added cleanup of old cache files or change detection
Signed-off-by: monada-bot[bot] <[email protected]>
@monadabot monadabot added the ⚠️ pr/review-mutation PR has been mutated and will not auto-merge. Clear this label if the changes look good! label Jul 26, 2023
monadabot and others added 6 commits July 26, 2023 10:42
Signed-off-by: monada-bot[bot] <[email protected]>
Signed-off-by: monada-bot[bot] <[email protected]>
Signed-off-by: monada-bot[bot] <[email protected]>
Signed-off-by: monada-bot[bot] <[email protected]>
Signed-off-by: monada-bot[bot] <[email protected]>
Signed-off-by: monada-bot[bot] <[email protected]>
.gitignore Outdated Show resolved Hide resolved
libs/wingc/src/type_check/jsii_importer.rs Show resolved Hide resolved
libs/wingii/src/lib.rs Show resolved Hide resolved
Copy link
Contributor

@MarkMcCulloh MarkMcCulloh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nice perf numbers, looking forward to this change going in ❤️

libs/wingii/src/lib.rs Show resolved Hide resolved
libs/wingii/src/test.rs Outdated Show resolved Hide resolved
@yoav-steinberg yoav-steinberg removed the 🚧 pr/do-not-merge PRs with this label will not be automatically merged by mergify. label Aug 1, 2023
@yoav-steinberg yoav-steinberg merged commit 27bbdb2 into main Aug 1, 2023
19 checks passed
@yoav-steinberg yoav-steinberg deleted the yoav/jsii_bin_cache branch August 1, 2023 11:09
@monadabot
Copy link
Contributor

Congrats! 🚀 This was released in Wing 0.24.60.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🧪 pr/e2e-full ⚠️ pr/review-mutation PR has been mutated and will not auto-merge. Clear this label if the changes look good!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants