-
-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
capnp: Refactor Arenas #586
Conversation
39a3edb
to
4d5c748
Compare
This is now triggering a new way for |
Unfortunately the final races are too deep to easily fix. I'm trying a different approach. |
dd246c2
to
47ecae8
Compare
Rebased and updated to fix the final data races. Also updated the stats against latest main branch. |
Nice! I am aiming to review this afternoon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Flagged a few questions inline.
This refactors the two main arena implementations (SingleSegment and MultiSegment) to improve their performance and API. This is the first of (hopefully) many refactor commits. This commit attempts to maintain as much as possible the implied semantics of the Arena API, while still moving the needle performance and complexity wise. Future commits may make deeper changes the Arena contract. The main goal for the refactor in this commit is to make segment tracking the responsibility of Arenas, instead of it being the responsibility of Message instances. The following benefits are obtained by this: - Simplifies the logic in Message instances, by offloading most of it to the Arena implementation in use. - Improves performance by allowing each type of Arena to tailor its implementation (e.g. single vs multi segment). The following is a summary of the changes in this commit: - Change the API for Arena.Allocate() to return the Segment and Address instead of SegmentID and Slice. This will be useful for reducing the depth of the call stack for alloc calls. - Introduce Arena.Segment() and deprecate Arena.Data. Segment() is a more widely useful API and will be used to reduce the depth of some call paths. Data() is only currently used in tests. - Remove the segments map from Message and leave that up to Arena implementations. - Remove locking from Message. It was not actually able to correctly protect message usage, so it was purely a performance drain. - MultiSegmentArena tracks segments by slice instead of map. A slice is generally faster and segments have the requirement to be in ascending SegmentID order. - Pools for segment data allocation are set explicitly instead of always using the default one. In the future, this can be exposed as options when initializing the Arena to tailor usage to application-specific behavior. - Segment data is returned to pools only if they were obtained from it (i.e. by initializing a new empty Arena). This ensures buffers are not incorrectly added to a global pool when they could've come from anywhere (including a memory mapped file that could become invalid). - SingleSegmentArena now also uses a global pool for references (same as MultiSegmentArena already did). Release() calls ensure only arenas fetched from pools are returned there. - Introduction of ReadOnlySingleSegmentArena which bypasses all pools and allows reusing an arena value by aliasing different buffers on top of it. - FIXMEs and TODOs are added to contentious implied API behavior which will be the target of future refactors. In the future, the following refactors will be pursued, unless futher discussion deems them unhelpful: - Make it part of the contract and tests that Arena.Allocate() always return zeroed bytes. - Make Single/Multi never allocate if passed buffers. This would effectively make them either read-only (when passed a buffer) or writable when initialized without one. - Arenas should never have zero segments when assigned to a message. This should be part of the contract of Arenas and tested in Message. - Message.Reset should not enforce allocation of the first word in an Arena. That should be done by the first call to Message.Root().
In the future, this will make it possible external implementations of Segment to be written. It also fixes the final data race in the refactor.
47ecae8
to
09a9668
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR performs the first refactor of the Arena interface and its existing implementations.
The ultimate goal for this and following arena refactor PRs is to make the API
saner, more performant and allow for third party creation of Arena
implementations (which is not possible today due to need to access private
fields within Segment).
This PR attempts to maintain as much as possible the interface and implied
semantics of the Arena interface, while still improving the overall API and
performance.
I have attempted to (as much as feasible) not break any of the current
assumptions that external code may make about the behavior of the existing Arena
implementations. This has proven hard, due to the high coupling between the
implementations and the Message and Segment types (which is partially unwound
here). This can be asserted by reviewers by noting how few of the existing tests
had to be fixed.
In the future, I hope to be more aggressive in the changes proposed, thus this
initial PR is meant to serve as base, upon which more significant changes will
be made.
Benchstat comparison before/after refactor
Note: the benchmarks "old" benchmarks were run on the commit titled "Fix reuse of SingleSegmentArena" in order for the fixes in that commit to apply both before and after the refactor.
As can be seen, the absolute majority of the benchmarks is improved by this PR.
The second commit (titled "Refactor Internals") has further details on which
specific changes were done.