fix(compiler): improve JSII loading speed via binary format cache #3567
Merged
fewer assumptions and no need to unzip gzipped manifests in order to check if we have them cached
@MarkMcCulloh, @eladb Do you guys think I should add some env var setting to skip generating the cache for faster CI/testing?
To overcome bincode not supporting serde tagged enum deserialization we use bincode's own (non serde) Encode/Decode traits.
eladb reviewed Jul 23, 2023
yoav-steinberg added the 🚧 pr/do-not-merge label (PRs with this label will not be automatically merged by mergify.) Jul 25, 2023
This is because there's no WASM support for path canonicalization. Before, caching silently failed on WASM.
this is for wasm compatibility so we don't need `temp_dir()` access or canonical path calculation which aren't available in WASM. Also added cleanup of old cache files or change detection
Signed-off-by: monada-bot[bot] <[email protected]>
monadabot added the ⚠️ pr/review-mutation label (PR has been mutated and will not auto-merge. Clear this label if the changes look good!) Jul 26, 2023
Co-authored-by: Mark McCulloh <[email protected]>
this improves performance, especially in WASM
…en running in a dev env
MarkMcCulloh approved these changes Jul 31, 2023
Super nice perf numbers, looking forward to this change going in ❤️
tools/hangar/__snapshots__/test_corpus/valid/debug_env.w_test_sim.md
yoav-steinberg removed the 🚧 pr/do-not-merge label (PRs with this label will not be automatically merged by mergify.) Aug 1, 2023
Congrats! 🚀 This was released in Wing 0.24.60.
Labels
🧪 pr/e2e-full
⚠️ pr/review-mutation
This PR caches the `.jsii` manifests in `/tmp/.wing/jsii_manifest_cache/` in a binary format which is quicker to deserialize. It compares the original files to the cached ones using canonical path, file size and modification time. If no match is found, it loads the original file and attempts to cache it; if a match is found, it loads the JSII manifest from the cache.
In addition, I identified some redundant memory copying (cloning) of the imported JSII assemblies. Eliminating it also improved performance.
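The freshness check described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: the `Fingerprint` type and the `fingerprint` and `cache_is_fresh` helpers are hypothetical names, and the canonical-path part of the comparison is left out.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical fingerprint of a manifest file: its size and modification time.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Fingerprint {
    len: u64,
    modified: SystemTime,
}

/// Reads the current size and modification time of `path`.
fn fingerprint(path: &Path) -> io::Result<Fingerprint> {
    let meta = fs::metadata(path)?;
    Ok(Fingerprint {
        len: meta.len(),
        modified: meta.modified()?,
    })
}

/// Returns true when the fingerprint recorded at caching time still matches
/// the manifest on disk. Any I/O error is treated as a cache miss.
fn cache_is_fresh(manifest: &Path, recorded: &Fingerprint) -> bool {
    match fingerprint(manifest) {
        Ok(current) => current == *recorded,
        Err(_) => false,
    }
}
```

A caller would record the fingerprint when writing the cache entry and fall back to parsing the original `.jsii` file whenever `cache_is_fresh` returns false.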
A few notes:
I ended up using `rmp_serde` (msgpack serde). There are faster serializers which we might move to in the future, but their implementations are incomplete. The main issue is the lack of support for "self-describing" formats, where the deserializer needs to figure out the output based on the schema information in the file itself (mainly required for deserializing Rust algebraic enums and for handling missing fields in structs).
Even with `rmp_serde` I needed to remove the missing fields support from our serde definitions to get this working. `rmp_serde` actually does support missing fields, but that hinders its performance (see `rmp_serde::encode::write` vs `rmp_serde::encode::write_named`).
I then moved to `bincode`, and specifically its non-serde serializer. I used `bincode` because most fast serializers don't support the `tag` field for enum serialization, which is required to deserialize `.jsii` files using `serde_json`. By using `bincode`'s non-serde serializer we don't need to deal with the tag field. `bincode` seems like one of the faster serializers; we can consider some more options from here in the future.
update: I ended up using `speedy`, which provided the best performance (see benchmarks)
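To illustrate why a non-serde binary encoder sidesteps the `tag` field issue mentioned above, here is a hand-rolled sketch that identifies the enum variant by a leading index byte instead of a named tag inside the payload. It is only an illustration of the idea: `TypeEntry` is a hypothetical stand-in for a JSII type, and the byte layout is made up for this example (it is not the wire format of `bincode` or `speedy`).

```rust
/// Hypothetical, simplified stand-in for a JSII type entry.
#[derive(Debug, Clone, PartialEq)]
enum TypeEntry {
    Class { name: String },
    Interface { name: String },
}

/// Writes a string as a little-endian u32 length followed by its UTF-8 bytes.
fn write_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Reads a length-prefixed string starting at `*pos`, advancing `*pos`.
fn read_str(input: &[u8], pos: &mut usize) -> Option<String> {
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(input.get(*pos..*pos + 4)?);
    *pos += 4;
    let len = u32::from_le_bytes(len_bytes) as usize;
    let bytes = input.get(*pos..*pos + len)?;
    *pos += len;
    String::from_utf8(bytes.to_vec()).ok()
}

/// Encodes the variant as one index byte, then its fields in a fixed order.
fn encode(entry: &TypeEntry) -> Vec<u8> {
    let mut out = Vec::new();
    match entry {
        TypeEntry::Class { name } => {
            out.push(0);
            write_str(&mut out, name);
        }
        TypeEntry::Interface { name } => {
            out.push(1);
            write_str(&mut out, name);
        }
    }
    out
}

/// Decodes by reading the index byte first, so the decoder always knows which
/// fields follow and never has to inspect the payload to find a tag.
fn decode(input: &[u8]) -> Option<TypeEntry> {
    let mut pos = 1;
    match *input.first()? {
        0 => Some(TypeEntry::Class { name: read_str(input, &mut pos)? }),
        1 => Some(TypeEntry::Interface { name: read_str(input, &mut pos)? }),
        _ => None,
    }
}
```

Because the variant is known before any field is read, the format does not need to be self-describing. The trade-off is that the cache is tied to the exact enum and field layout, so it has to be regenerated whenever the definitions change.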
benchmarks
normal test:
hyperfine -w 1 "cargo run --example compile -- testfile.w"
empty cache test (replace FORMAT with speedy/bincode/bson based on the name of the cache file generated):
hyperfine -w 1 -p "find ../.. | grep '.jsii.FORMAT' | xargs rm || true" "cargo run --example compile -- testfile.w"
[Benchmark images not preserved: "rmp_serde no cache", "rmp_serde cached"]
profiling
Looking at the flamegraphs, it's clear that deserializing the JSII manifests still takes the bulk of the compilation time when importing large ones (cdktf...). I think there's value in continuing this work by either using a faster deserializer than `rmp_serde`/`bincode` or dropping JSII deserialization and implementing our own type-system serde per namespace and caching that.
Checklist
`pr/e2e-full` label if this feature requires end-to-end testing
By submitting this pull request, I confirm that my contribution is made under the terms of the Wing Cloud Contribution License.