
[encoding] Initial attempt at a BumpEstimator utility #436

Merged (1 commit) on Mar 12, 2024

armansito (Collaborator)

Several Vello stages dynamically bump-allocate intermediate data structures. Due to graphics API limitations, the backing memory for these data structures must already be allocated at the time of command submission, even though the precise memory requirements are unknown at that point.

Vello currently works around this issue in two ways (see #366):

  1. It prescribes a mechanism in which allocation failures are detected by fencing back to the CPU. The client responds by creating larger GPU buffers, sized using the bump allocator state obtained via read-back. The client can choose either to drop the frame or to submit the fine stage only after any allocation failures have been resolved.

  2. The encoding crate hard-codes the buffer sizes to be large enough to render paris-30k, making under-allocation unlikely for simpler scenes. This comes at the cost of a fixed memory watermark of more than 50 MB.
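A minimal CPU-side sketch of the first workaround may help. It is not Vello's code: BumpBuffer and resize_after_failure are hypothetical names, and two details are assumptions made for illustration: the offset keeps advancing past capacity (so the value read back after the fence records the total demand), and the client grows the buffer to the next power of two.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical CPU model of a GPU bump allocator: shader invocations
/// atomically advance a shared offset to reserve space in a fixed buffer.
struct BumpBuffer {
    capacity: u32,
    offset: AtomicU32,
}

impl BumpBuffer {
    fn new(capacity: u32) -> Self {
        Self { capacity, offset: AtomicU32::new(0) }
    }

    /// Reserves `count` elements and returns the start index, or `None` if
    /// the buffer is too small. The offset advances even on failure, so its
    /// final value records the total demand (an assumption of this sketch).
    fn alloc(&self, count: u32) -> Option<u32> {
        let start = self.offset.fetch_add(count, Ordering::Relaxed);
        if start.saturating_add(count) <= self.capacity {
            Some(start)
        } else {
            None
        }
    }

    /// Stands in for reading the allocator state back after the fence.
    fn read_back(&self) -> u32 {
        self.offset.load(Ordering::Relaxed)
    }
}

/// Client response: if demand exceeded capacity, return a new buffer size
/// large enough for it (rounded up to a power of two, an arbitrary policy).
fn resize_after_failure(buf: &BumpBuffer) -> Option<u32> {
    let demand = buf.read_back();
    (demand > buf.capacity).then(|| demand.next_power_of_two())
}

fn main() {
    let buf = BumpBuffer::new(100);
    assert_eq!(buf.alloc(60), Some(0));
    assert_eq!(buf.alloc(60), None); // 60 + 60 exceeds the capacity of 100
    // The client would reallocate to 128 and then re-run or skip the frame.
    assert_eq!(resize_after_failure(&buf), Some(128));
}
```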

There may be situations where neither of these solutions is desirable, while the cost of additional CPU-side preprocessing is not considered prohibitive for performance. It may also be acceptable to generally allocate more than what's required in order to make under-allocation impossible (except perhaps in out-of-memory situations).

In that spirit, this change introduces the beginnings of a heuristic-based, conservative memory estimation utility. It currently estimates only the LineSoup buffer (which holds the curve flattening output), overestimating by a factor of 1.1x to 3.3x on the Vello test scenes (paris-30k is estimated at 1.5x the actual requirement).
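The curve part of this estimate relies on Wang's formula. The following is a sketch of the textbook form, not Vello's implementation; Pt, second_diff_len, wang_segments and the 0.25 tolerance are illustrative assumptions. For a degree-n Bézier with control points P_0 through P_n, let M be the largest |P_i - 2P_{i+1} + P_{i+2}|; uniformly subdividing the curve into ceil(sqrt(n(n-1) / (8 tol) * M)) segments keeps the flattened polyline within tol of the curve.

```rust
/// Minimal 2D point for this sketch.
#[derive(Clone, Copy)]
struct Pt {
    x: f64,
    y: f64,
}

impl Pt {
    fn new(x: f64, y: f64) -> Self {
        Pt { x, y }
    }
}

/// Length of the second difference a - 2b + c of three control points.
fn second_diff_len(a: Pt, b: Pt, c: Pt) -> f64 {
    let dx = a.x - 2.0 * b.x + c.x;
    let dy = a.y - 2.0 * b.y + c.y;
    dx.hypot(dy)
}

/// Wang's formula: splitting a degree-n Bézier into
/// ceil(sqrt(n(n-1) / (8 * tol) * M)) parametrically uniform segments keeps
/// the polyline within `tol` of the curve, where M is the largest
/// second-difference length of the control polygon.
fn wang_segments(ctrl: &[Pt], tol: f64) -> u32 {
    assert!(ctrl.len() >= 2, "need at least two control points");
    let n = (ctrl.len() - 1) as f64; // curve degree
    let m = ctrl
        .windows(3)
        .map(|w| second_diff_len(w[0], w[1], w[2]))
        .fold(0.0, f64::max);
    let segs = (n * (n - 1.0) / (8.0 * tol) * m).sqrt().ceil();
    segs.max(1.0) as u32 // a curve always emits at least one line
}

fn main() {
    let cubic = [
        Pt::new(0.0, 0.0),
        Pt::new(0.0, 100.0),
        Pt::new(100.0, 100.0),
        Pt::new(100.0, 0.0),
    ];
    // Prints 21 with a tolerance of 0.25.
    println!("cubic: {} lines", wang_segments(&cubic, 0.25));
}
```

The count depends only on the control points, so it can be computed per segment during encoding without subdividing anything, which is what makes it cheap. The price is that it assumes uniform parameter steps, whereas an adaptive flattener concentrates lines where the curvature is, which is consistent with the overestimate growing with curvature variation.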

  • Curves are estimated using Wang's formula, which is fast to evaluate but gives a looser bound than Vello's analytic flattening. The overestimation grows with the variation in curvature.

  • Explicit lines (such as line-tos) get estimated precisely.

  • As an initial step, only the LineSoup buffer is supported. Support for the other buffers will be added as follow-up work, since they require experimenting with additional heuristics.

  • A BumpEstimator is integrated with the Scene API (gated behind a feature flag), but its results are currently unused. Glyph runs are not supported, as the estimator is not yet aware of the path data stored in the glyph cache. Transformed scene fragments are supported by applying fine-grained scaling to the curve line counts; explicit lines are skipped because their count is scale-invariant.
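One way to realize the exact line counts and the transform scaling described above is sketched here. It is an illustration under stated assumptions, not necessarily how BumpEstimator works, and Seg, LineEstimate and transformed are hypothetical names. Explicit lines count exactly one each, and an affine transform maps a line to a line, so their count does not change. For curves, translation cancels out of the second differences and the linear part A scales each of them by at most its spectral norm s, so M grows by at most s and the Wang count by at most sqrt(s). Because the stored per-curve counts are already rounded up, a bound that stays conservative is ceil(sqrt(s) * C) + N, where C is the summed curve count and N the number of curves.

```rust
/// Hypothetical path element for this sketch (not Vello's encoding).
enum Seg {
    /// An explicit line-to: always exactly one line.
    Line,
    /// A curve summarized by M (max second-difference length) and degree.
    Curve { m: f64, degree: u32 },
}

#[derive(Default, Debug)]
struct LineEstimate {
    explicit_lines: u64,
    curve_lines: u64, // sum of per-curve Wang counts
    curves: u64,      // number of curves, used as rounding slack
}

impl LineEstimate {
    fn add(&mut self, seg: &Seg, tol: f64) {
        match *seg {
            Seg::Line => self.explicit_lines += 1,
            Seg::Curve { m, degree } => {
                let n = degree as f64;
                let k = n * (n - 1.0) / (8.0 * tol);
                self.curve_lines += ((k * m).sqrt().ceil() as u64).max(1);
                self.curves += 1;
            }
        }
    }

    fn total(&self) -> u64 {
        self.explicit_lines + self.curve_lines
    }

    /// Conservative estimate for the same fragment under an affine transform
    /// whose linear part is [[a, c], [b, d]]; translation does not matter.
    fn transformed(&self, a: f64, b: f64, c: f64, d: f64) -> u64 {
        let t = a * a + b * b + c * c + d * d;
        let det = a * d - b * c;
        // Largest singular value (spectral norm) of the 2x2 linear part.
        let s = ((t + (t * t - 4.0 * det * det).max(0.0).sqrt()) / 2.0).sqrt();
        // Explicit lines are scale-invariant; only curve counts are scaled.
        let scaled = (s.sqrt() * self.curve_lines as f64).ceil() as u64 + self.curves;
        self.explicit_lines + scaled
    }
}

fn main() {
    let mut est = LineEstimate::default();
    for _ in 0..3 {
        est.add(&Seg::Line, 0.25);
    }
    est.add(&Seg::Curve { m: 100.0 * 2f64.sqrt(), degree: 3 }, 0.25);
    println!("untransformed: {}", est.total());
    println!("scaled 4x: {}", est.transformed(4.0, 0.0, 0.0, 4.0));
}
```

The + N term is the cost of scaling counts that were already rounded up; under the identity transform it makes the estimate one line per curve larger than the untransformed total.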

raphlinus (Contributor) left a comment

I think this can go in; I've suggested a minor performance tweak and some cleanup. If we weren't planning to iterate toward doing the estimation at resolve time, I'd ask for CI changes, but if this is a reasonably temporary state, I'm OK with them not going in. Just don't be surprised if the feature has some breakage :)

# Enables GPU memory usage estimation. This performs additional computations
# in order to estimate the minimum required allocations for buffers backing
# bump-allocated GPU memory.
bump_estimate = ["vello_encoding/bump_estimate"]
Contributor:

Because we now have more possibilities for things to break, one of the things I'd like to see is running at least cargo check in CI with the feature enabled and disabled.

However, since the longer-term plan is most likely to move estimation to resolve time, which will hopefully make the cost of estimation a runtime rather than a compile-time choice that no longer needs a feature gate, I'm not going to ask for the CI changes now.
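For contrast, a resolve-time runtime option could look like the following sketch. ResolveOptions, estimate_bumps and resolve are hypothetical names, not Vello API; the point is only that the estimation code is compiled in every build, while its cost is paid only when the option is set.

```rust
/// Hypothetical resolve-time options; not part of Vello's API.
#[derive(Clone, Copy)]
struct ResolveOptions {
    estimate_bumps: bool,
}

/// Sums per-path line counts only when estimation is requested at runtime.
/// The estimation code is always compiled, so every build type-checks it.
fn resolve(path_lines: &[u64], opts: ResolveOptions) -> Option<u64> {
    opts.estimate_bumps.then(|| path_lines.iter().sum::<u64>())
}

fn main() {
    let lines: [u64; 2] = [3, 21];
    let on = ResolveOptions { estimate_bumps: true };
    let off = ResolveOptions { estimate_bumps: false };
    // prints "Some(24) None"
    println!("{:?} {:?}", resolve(&lines, on), resolve(&lines, off));
}
```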

Member:

The CI already runs:

  • --no-default-features
  • default features
  • --all-features
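Assuming bump_estimate is not enabled by default, only the --all-features run compiles the gated estimation code, while the other two runs build the crate without it. The sketch below, using a hypothetical Scene type rather than Vello's, shows the usual #[cfg(feature = ...)] pattern and why both configurations matter: code inside a disabled cfg block must still parse but is never type-checked.

```rust
/// Hypothetical scene type for this sketch; not Vello's real `Scene`.
#[derive(Default)]
struct Scene {
    encoded_elements: usize,
    // This field exists only when the crate is built with the feature.
    #[cfg(feature = "bump_estimate")]
    estimated_lines: u64,
}

impl Scene {
    fn line_to(&mut self) {
        self.encoded_elements += 1;
        // Compiled (and type-checked) only with the feature enabled, so a
        // type error in this block would go unnoticed without that build.
        #[cfg(feature = "bump_estimate")]
        {
            self.estimated_lines += 1;
        }
    }
}

fn main() {
    let mut scene = Scene::default();
    scene.line_to();
    assert_eq!(scene.encoded_elements, 1);
    #[cfg(feature = "bump_estimate")]
    assert_eq!(scene.estimated_lines, 1);
}
```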

Two review threads on src/scene.rs (outdated, resolved).
armansito force-pushed the bump-estimate branch 2 times, most recently from 3f56e51 to 7bfca34, on March 12, 2024 19:47
Several vello stages dynamically bump allocate intermediate data
structures. Due to graphics API limitations the backing memory
for these data structures must have been allocated at the time of
command submission even though the precise memory requirements
are unknown.

Vello currently works around this issue in two ways (see #366):
1. It prescribes a mechanism in which allocation failures get
   detected by fencing back to the CPU. The client responds to
   this event by creating larger GPU buffers using the bump
   allocator state obtained via read-back. The client has the
   choice of skipping a frame or submitting the fine stage only
   after any allocation failures get resolved.

2. The encoding crate hard-codes the buffers to be large enough to be
   able to render paris-30k, making it unlikely for simple scenes to
   under-allocate. This comes at the cost of a fixed memory watermark
   of >50MB.

There may be situations when neither of these solutions is desirable
while the cost of additional CPU-side pre-processing is not considered
prohibitive for performance. It may also be acceptable to pay the cost
of generally allocating more than what's required in order to make
this problem go away entirely (except perhaps for OOM situations).

In that spirit, this change introduces the beginnings of a
heuristic-based conservative memory estimation utility. It currently
estimates only the LineSoup buffer (which contains the curve flattening
output) within a factor of 1.1x-3.3x on the Vello test scenes (paris-30k
is estimated at 1.5x the actual requirement).

- Curves are estimated using Wang's formula which is fast to evaluate
  but produces a less optimal result than Vello's analytic approach.
  The overestimation is more pronounced with increased curvature
  variation.

- Explicit lines (such as line-tos) get estimated precisely.

- Only the LineSoup buffer is supported.

- A BumpEstimator is integrated with the Scene API (gated by a feature
  flag) but the results are currently unused. Glyph runs are not
  supported as the estimator is not yet aware of the path data stored
  in the glyph cache.
armansito added this pull request to the merge queue on Mar 12, 2024
Merged via the queue into main with commit f55f82f on Mar 12, 2024
9 checks passed
armansito deleted the bump-estimate branch on March 12, 2024 20:46