Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

capnp: Refactor Arenas #586

Merged
merged 4 commits into from
Aug 30, 2024
Merged

capnp: Refactor Arenas #586

merged 4 commits into from
Aug 30, 2024

Conversation

matheusd
Copy link
Contributor

@matheusd matheusd commented Aug 15, 2024

This PR performs the first refactor of the Arena interface and its existing implementations.

The ultimate goal for this and following arena refactor PRs is to make the API
saner, more performant and allow for third party creation of Arena
implementations (which is not possible today due to need to access private
fields within Segment).

This PR attempts to maintain as much as possible the interface and implied
semantics of the Arena interface, while still improving the overall API and
performance.

I have attempted to (as much as feasible) not break any of the current
assumptions that external code may make about the behavior of the existing Arena
implementations. This has proven hard, due to the high coupling between the
implementations and the Message and Segment types (which is partially unwound
here). This can be asserted by reviewers by noting how few of the existing tests
had to be fixed.

In the future, I hope to be more aggressive in the changes proposed, thus this
initial PR is meant to serve as base, upon which more significant changes will
be made.

Benchstat comparison before/after refactor

Note: the benchmarks "old" benchmarks were run on the commit titled "Fix reuse of SingleSegmentArena" in order for the fixes in that commit to apply both before and after the refactor.

name                                        old time/op    new time/op     delta
MessageGetFirstSegment-7                       363ns ± 2%      205ns ± 1%   -43.35%  (p=0.000 n=10+10)
TextMovementBetweenSegments-7                  434µs ± 1%      443µs ± 2%    +2.01%  (p=0.000 n=10+10)
Marshal-7                                     1.44µs ± 1%     1.23µs ± 1%   -14.71%  (p=0.000 n=10+9)
Marshal_ReuseMsg-7                             894ns ± 1%      702ns ± 2%   -21.51%  (p=0.000 n=10+10)
Unmarshal-7                                    947ns ± 1%      855ns ± 1%    -9.77%  (p=0.000 n=8+10)
Unmarshal_Reuse-7                              589ns ± 2%      557ns ± 1%    -5.34%  (p=0.000 n=10+9)
Decode-7                                      8.83ms ± 1%     7.58ms ± 0%   -14.11%  (p=0.000 n=10+10)
Growth/SingleSegment/release=false-7          10.4ms ± 3%     10.8ms ± 2%    +3.44%  (p=0.000 n=9+10)
Growth/MultiSegment/release=false-7           13.2ms ± 1%     10.3ms ± 2%   -21.96%  (p=0.000 n=10+10)
Growth/SingleSegment/release=true-7           10.7ms ± 4%     10.8ms ± 1%      ~     (p=0.631 n=10+10)
Growth/MultiSegment/release=true-7            12.8ms ± 1%     10.2ms ± 2%   -20.39%  (p=0.000 n=10+10)
SmallMessage/SingleSegment/release=false-7    1.34µs ± 1%     1.13µs ± 1%   -15.58%  (p=0.000 n=10+10)
SmallMessage/MultiSegment/release=false-7     1.53µs ± 3%     1.20µs ± 1%   -21.60%  (p=0.000 n=9+9)
SmallMessage/SingleSegment/release=true-7     1.08µs ± 1%     0.67µs ± 1%   -38.45%  (p=0.000 n=9+10)
SmallMessage/MultiSegment/release=true-7      1.01µs ± 1%     0.63µs ± 2%   -38.00%  (p=0.000 n=8+10)
Pool-7                                        64.5ns ± 1%     64.7ns ± 2%      ~     (p=0.491 n=10+10)
Pack-7                                        13.8µs ± 3%     13.8µs ± 1%      ~     (p=0.239 n=10+10)
Unpack-7                                      10.9µs ± 1%     10.9µs ± 1%      ~     (p=0.853 n=10+10)
Unpack_Large-7                                2.06µs ± 2%     2.06µs ± 1%      ~     (p=0.542 n=10+10)
Reader-7                                      22.9µs ± 2%     23.0µs ± 2%      ~     (p=0.089 n=10+10)
Reader_Large-7                                30.7µs ± 1%     30.2µs ± 2%    -1.45%  (p=0.002 n=10+10)
Extract-7                                     21.1µs ± 2%     20.9µs ± 1%    -1.09%  (p=0.001 n=10+10)
Insert-7                                      21.0µs ± 1%     21.1µs ± 1%      ~     (p=0.143 n=10+10)
PingPong-7                                    46.8µs ± 2%     46.1µs ± 1%    -1.37%  (p=0.004 n=10+8)

name                                        old alloc/op   new alloc/op    delta
MessageGetFirstSegment-7                       24.0B ± 0%       0.0B       -100.00%  (p=0.000 n=10+10)
Marshal-7                                     1.50kB ± 0%     1.32kB ± 0%   -12.21%  (p=0.000 n=10+10)
Marshal_ReuseMsg-7                              120B ± 0%        96B ± 0%   -20.00%  (p=0.000 n=10+10)
Unmarshal-7                                     592B ± 0%       336B ± 0%   -43.24%  (p=0.000 n=10+10)
Unmarshal_Reuse-7                              0.00B           0.00B           ~     (all equal)
Decode-7                                      7.70MB ± 0%     5.17MB ± 0%   -32.95%  (p=0.000 n=10+10)
Growth/SingleSegment/release=false-7          4.35MB ± 0%     4.35MB ± 0%    +0.02%  (p=0.000 n=10+9)
Growth/MultiSegment/release=false-7           1.59MB ± 0%     1.59MB ± 0%    -0.04%  (p=0.000 n=10+9)
Growth/SingleSegment/release=true-7           4.35MB ± 0%     4.35MB ± 0%    +0.02%  (p=0.000 n=10+10)
Growth/MultiSegment/release=true-7             537kB ± 1%      535kB ± 1%    -0.32%  (p=0.019 n=10+10)
SmallMessage/SingleSegment/release=false-7    1.41kB ± 0%     1.22kB ± 0%   -13.30%  (p=0.000 n=10+10)
SmallMessage/MultiSegment/release=false-7     1.82kB ± 0%     1.42kB ± 0%   -22.36%  (p=0.000 n=8+10)
SmallMessage/SingleSegment/release=true-7       328B ± 0%        80B ± 0%   -75.61%  (p=0.000 n=10+10)
SmallMessage/MultiSegment/release=true-7        304B ± 0%        80B ± 0%   -73.68%  (p=0.000 n=10+10)
Pool-7                                         0.00B           0.00B           ~     (all equal)
Extract-7                                     16.2kB ± 0%     15.7kB ± 0%    -3.16%  (p=0.000 n=10+10)
Insert-7                                      15.8kB ± 0%     15.5kB ± 0%    -1.76%  (p=0.000 n=10+10)

name                                        old allocs/op  new allocs/op   delta
MessageGetFirstSegment-7                        1.00 ± 0%       0.00       -100.00%  (p=0.000 n=10+10)
Marshal-7                                       7.00 ± 0%       5.00 ± 0%   -28.57%  (p=0.000 n=10+10)
Marshal_ReuseMsg-7                              2.00 ± 0%       1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
Unmarshal-7                                     3.00 ± 0%       3.00 ± 0%      ~     (all equal)
Unmarshal_Reuse-7                               0.00            0.00           ~     (all equal)
Decode-7                                       50.1k ± 0%      50.1k ± 0%    -0.06%  (p=0.000 n=10+10)
Growth/SingleSegment/release=false-7            19.4 ± 7%       19.7 ± 4%      ~     (p=0.666 n=10+10)
Growth/MultiSegment/release=false-7             26.0 ± 0%       20.0 ± 0%   -23.08%  (p=0.000 n=10+10)
Growth/SingleSegment/release=true-7             19.4 ± 7%       19.0 ± 0%      ~     (p=0.092 n=10+9)
Growth/MultiSegment/release=true-7              10.0 ± 0%        4.0 ± 0%   -60.00%  (p=0.000 n=9+10)
SmallMessage/SingleSegment/release=false-7      6.00 ± 0%       4.00 ± 0%   -33.33%  (p=0.000 n=10+10)
SmallMessage/MultiSegment/release=false-7       7.00 ± 0%       5.00 ± 0%   -28.57%  (p=0.000 n=10+10)
SmallMessage/SingleSegment/release=true-7       4.00 ± 0%       1.00 ± 0%   -75.00%  (p=0.000 n=10+10)
SmallMessage/MultiSegment/release=true-7        3.00 ± 0%       1.00 ± 0%   -66.67%  (p=0.000 n=10+10)
Pool-7                                          0.00            0.00           ~     (all equal)
Extract-7                                       32.0 ± 0%       32.0 ± 0%      ~     (all equal)
Insert-7                                        29.0 ± 0%       29.0 ± 0%      ~     (all equal)

name                                        old speed      new speed       delta
Decode-7                                     109MB/s ± 1%    127MB/s ± 0%   +16.43%  (p=0.000 n=10+10)
Growth/SingleSegment/release=false-7         101MB/s ± 4%     97MB/s ± 2%    -3.36%  (p=0.000 n=9+10)
Growth/MultiSegment/release=false-7         79.2MB/s ± 1%  101.4MB/s ± 2%   +28.14%  (p=0.000 n=10+10)
Growth/SingleSegment/release=true-7         98.0MB/s ± 4%   97.0MB/s ± 1%      ~     (p=0.644 n=10+10)
Growth/MultiSegment/release=true-7          81.8MB/s ± 1%  102.7MB/s ± 2%   +25.62%  (p=0.000 n=10+10)
SmallMessage/SingleSegment/release=false-7  53.7MB/s ± 1%   63.6MB/s ± 1%   +18.47%  (p=0.000 n=10+10)
SmallMessage/MultiSegment/release=false-7   46.8MB/s ± 4%   59.9MB/s ± 1%   +28.06%  (p=0.000 n=10+9)
SmallMessage/SingleSegment/release=true-7   66.4MB/s ± 1%  107.9MB/s ± 1%   +62.48%  (p=0.000 n=9+10)
SmallMessage/MultiSegment/release=true-7    71.1MB/s ± 1%  114.6MB/s ± 1%   +61.09%  (p=0.000 n=9+10)
Pack-7                                       742MB/s ± 3%    740MB/s ± 1%      ~     (p=0.239 n=10+10)
Unpack-7                                     938MB/s ± 1%    939MB/s ± 1%      ~     (p=0.853 n=10+10)
Unpack_Large-7                              25.4GB/s ± 2%   25.4GB/s ± 1%      ~     (p=0.529 n=10+10)
Reader-7                                     448MB/s ± 2%    444MB/s ± 2%      ~     (p=0.089 n=10+10)
Reader_Large-7                              1.71GB/s ± 1%   1.73GB/s ± 2%    +1.47%  (p=0.002 n=10+10)

As can be seen, the absolute majority of the benchmarks is improved by this PR.

The second commit (titled "Refactor Internals") has further details on which
specific changes were done.

@matheusd matheusd force-pushed the refactor-arenas branch 3 times, most recently from 39a3edb to 4d5c748 Compare August 15, 2024 18:32
@matheusd
Copy link
Contributor Author

This is now triggering a new way for TestPromiseOrder to fail... 🙄

@matheusd matheusd marked this pull request as draft August 20, 2024 19:26
@matheusd
Copy link
Contributor Author

Unfortunately the final races are too deep to easily fix. I'm trying a different approach.

@matheusd matheusd force-pushed the refactor-arenas branch 2 times, most recently from dd246c2 to 47ecae8 Compare August 29, 2024 10:35
@matheusd matheusd marked this pull request as ready for review August 29, 2024 10:40
@matheusd
Copy link
Contributor Author

Rebased and updated to fix the final data races. Also updated the stats against latest main branch.

@lthibault
Copy link
Collaborator

Nice! I am aiming to review this afternoon.

Copy link
Collaborator

@lthibault lthibault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Flagged a few questions inline.

segment.go Show resolved Hide resolved
arena.go Show resolved Hide resolved
arena.go Show resolved Hide resolved
arena_test.go Outdated Show resolved Hide resolved
This refactors the two main arena implementations (SingleSegment and
MultiSegment) to improve their performance and API.

This is the first of (hopefully) many refactor commits. This commit
attempts to maintain as much as possible the implied semantics of the
Arena API, while still moving the needle performance and complexity
wise. Future commits may make deeper changes the Arena contract.

The main goal for the refactor in this commit is to make segment
tracking the responsibility of Arenas, instead of it being the
responsibility of Message instances. The following benefits are obtained by
this:

- Simplifies the logic in Message instances, by offloading most of it
  to the Arena implementation in use.
- Improves performance by allowing each type of Arena to tailor its
  implementation (e.g. single vs multi segment).

The following is a summary of the changes in this commit:

- Change the API for Arena.Allocate() to return the Segment and Address
  instead of SegmentID and Slice. This will be useful for reducing the
  depth of the call stack for alloc calls.
- Introduce Arena.Segment() and deprecate Arena.Data. Segment() is a more
  widely useful API and will be used to reduce the depth of some call
  paths. Data() is only currently used in tests.
- Remove the segments map from Message and leave that up to Arena
  implementations.
- Remove locking from Message. It was not actually able to correctly
  protect message usage, so it was purely a performance drain.
- MultiSegmentArena tracks segments by slice instead of map. A slice is
  generally faster and segments have the requirement to be in ascending
  SegmentID order.
- Pools for segment data allocation are set explicitly instead of always
  using the default one. In the future, this can be exposed as options
  when initializing the Arena to tailor usage to application-specific
  behavior.
- Segment data is returned to pools only if they were obtained from it
  (i.e. by initializing a new empty Arena). This ensures buffers are not
  incorrectly added to a global pool when they could've come from
  anywhere (including a memory mapped file that could become invalid).
- SingleSegmentArena now also uses a global pool for references (same as
  MultiSegmentArena already did). Release() calls ensure only arenas
  fetched from pools are returned there.
- Introduction of ReadOnlySingleSegmentArena which bypasses all pools
  and allows reusing an arena value by aliasing different buffers on top
  of it.
- FIXMEs and TODOs are added to contentious implied API behavior which
  will be the target of future refactors.

In the future, the following refactors will be pursued, unless futher
discussion deems them unhelpful:

- Make it part of the contract and tests that Arena.Allocate() always
  return zeroed bytes.
- Make Single/Multi never allocate if passed buffers. This would
  effectively make them either read-only (when passed a buffer) or
  writable when initialized without one.
- Arenas should never have zero segments when assigned to a message.
  This should be part of the contract of Arenas and tested in Message.
- Message.Reset should not enforce allocation of the first word in an
  Arena. That should be done by the first call to Message.Root().
In the future, this will make it possible external implementations of
Segment to be written.

It also fixes the final data race in the refactor.
Copy link
Collaborator

@lthibault lthibault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@lthibault lthibault merged commit ea11674 into capnproto:main Aug 30, 2024
4 checks passed
@matheusd matheusd deleted the refactor-arenas branch August 30, 2024 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants