Sort experiment #31
Draft
raphlinus wants to merge 17 commits into main from sort
Commits on Dec 24, 2023
Updates dependencies to latest published crates, including wgpu 0.18 and winit 0.28.
Commit d928fee
Commit 646199e
Commit 43e632b
The shaders are mostly written, with some TODOs, but haven't been tested.
Commit 9ae99d6
Starting to wire up sizes, buffer bindings, etc., in preparation for actually running the pipeline.
Commit 74eadb4
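The actual sizes wired up in this commit are not shown. As a rough sketch, assuming 32-bit keys, 4 bits per pass (16 bins), a hypothetical workgroup size of 256, and ELEMENTS_PER_THREAD = 4 (the value a later commit settles on), the host-side sizes might be derived like this. The names sort_sizes and SortSizes are illustrative, not taken from the branch.

```rust
// Hypothetical sizing constants; only ELEMENTS_PER_THREAD = 4 appears in the PR.
const WG_SIZE: usize = 256; // assumed workgroup size
const ELEMENTS_PER_THREAD: usize = 4;
const BLOCK_SIZE: usize = WG_SIZE * ELEMENTS_PER_THREAD; // keys per workgroup
const BITS_PER_PASS: u32 = 4; // assumed digit width
const NUM_BINS: usize = 1 << BITS_PER_PASS;

/// Buffer and dispatch sizes derived from the key count (illustrative names).
#[derive(Debug, PartialEq)]
struct SortSizes {
    num_blocks: usize,    // workgroups dispatched for the count and scatter stages
    histogram_len: usize, // u32 entries: one counter per (bin, block) pair
    num_passes: u32,      // LSD passes needed to cover a 32-bit key
}

fn sort_sizes(n: usize) -> SortSizes {
    // Round up so a partial last block still gets a workgroup.
    let num_blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    SortSizes {
        num_blocks,
        histogram_len: num_blocks * NUM_BINS,
        num_passes: 32 / BITS_PER_PASS,
    }
}

fn main() {
    println!("{:?}", sort_sizes(1 << 17));
}
```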
The count stage seems to be generating correct output. Next step is wiring up sums.
Commit 968f52d
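A CPU model of what a count stage typically computes: one histogram of the current digit per block of keys. The bin-major layout (every block's count for bin 0, then bin 1, and so on) is an assumption here; it is a common choice because a single linear exclusive scan over it then gives each (bin, block) pair its global output offset. The 1024-key block size and 4-bit digit width are also assumed.

```rust
const BITS_PER_PASS: u32 = 4; // assumed digit width
const NUM_BINS: usize = 1 << BITS_PER_PASS;
const BLOCK_SIZE: usize = 1024; // assumed keys per workgroup

/// Digit of `key` for this pass, least significant digit first.
fn digit(key: u32, pass: u32) -> usize {
    ((key >> (pass * BITS_PER_PASS)) as usize) & (NUM_BINS - 1)
}

/// Sequential model of the count stage: one histogram per block, stored
/// bin-major as hist[bin * num_blocks + block].
fn count(keys: &[u32], pass: u32) -> Vec<u32> {
    let num_blocks = (keys.len() + BLOCK_SIZE - 1) / BLOCK_SIZE;
    let mut hist = vec![0u32; NUM_BINS * num_blocks];
    for (block, chunk) in keys.chunks(BLOCK_SIZE).enumerate() {
        for &k in chunk {
            hist[digit(k, pass) * num_blocks + block] += 1;
        }
    }
    hist
}

fn main() {
    // Low digits: 0x12 -> 2, 0x34 -> 4, 0x1f -> 15, 0x02 -> 2.
    println!("{:?}", count(&[0x12, 0x34, 0x1f, 0x02], 0));
}
```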
Commit a833f11
Checkpoint working reduce and scan
Fix some sizing issues; it seems to reach the top-level scan correctly now.
Commit 48957c3
The prefix sum stages seem to be generating correct output.
Commit 3a1c094
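The "reduce and scan" and "top-level scan" wording suggests a hierarchical prefix sum. Below is a minimal sequential sketch of that reduce-then-scan structure, where a hypothetical chunk size stands in for the per-workgroup tile; it illustrates the technique and is not the branch's shader code.

```rust
/// Sequential model of a reduce-then-scan exclusive prefix sum.
/// `chunk` stands in for the elements handled by one workgroup (must be > 0).
fn exclusive_scan(input: &[u32], chunk: usize) -> Vec<u32> {
    // 1. Reduce: one partial sum per chunk.
    let sums: Vec<u32> = input.chunks(chunk).map(|c| c.iter().sum::<u32>()).collect();
    // 2. Top-level scan: exclusive prefix sum of the chunk sums.
    let mut carry = 0u32;
    let prefixes: Vec<u32> = sums
        .iter()
        .map(|&s| {
            let p = carry;
            carry += s;
            p
        })
        .collect();
    // 3. Downsweep: scan each chunk locally, seeded with its chunk's prefix.
    let mut out = Vec::with_capacity(input.len());
    for (c, &prefix) in input.chunks(chunk).zip(&prefixes) {
        let mut acc = prefix;
        for &x in c {
            out.push(acc);
            acc += x;
        }
    }
    out
}

fn main() {
    println!("{:?}", exclusive_scan(&[3, 1, 4, 1, 5, 9, 2, 6], 3));
}
```

Applied to a bin-major histogram, the scanned value for each (bin, block) entry is where that block's first key with that digit goes in the output.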
This seems to be a working scatter, which means that the core of the algorithm is done. I'm also just starting to look at performance characteristics. That's why there's a simpler count stage: the huge shared array seemed to be a problem.
Commit 9d79a07
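With count, scan, and scatter in place, one LSD radix pass is complete, and a full 32-bit sort is just repeated passes. The sketch below composes the three stages on the CPU under the same assumptions as before (4-bit digits, 8 passes, 1024-key blocks, bin-major histograms). The property it depends on is that the scatter is stable within each bin, which LSD radix sort requires for correctness.

```rust
const BITS_PER_PASS: u32 = 4; // assumed digit width
const NUM_BINS: usize = 1 << BITS_PER_PASS;
const BLOCK_SIZE: usize = 1024; // assumed keys per workgroup

fn digit(key: u32, pass: u32) -> usize {
    ((key >> (pass * BITS_PER_PASS)) as usize) & (NUM_BINS - 1)
}

/// One LSD radix pass: count, exclusive scan, then a stable scatter.
fn radix_pass(src: &[u32], dst: &mut [u32], pass: u32) {
    let num_blocks = (src.len() + BLOCK_SIZE - 1) / BLOCK_SIZE;
    // Count: bin-major per-block histograms.
    let mut offsets = vec![0usize; NUM_BINS * num_blocks];
    for (b, chunk) in src.chunks(BLOCK_SIZE).enumerate() {
        for &k in chunk {
            offsets[digit(k, pass) * num_blocks + b] += 1;
        }
    }
    // Scan: exclusive prefix sum turns counts into starting offsets.
    let mut acc = 0;
    for o in offsets.iter_mut() {
        let c = *o;
        *o = acc;
        acc += c;
    }
    // Scatter: keys are visited in input order, so equal digits keep
    // their relative order (stability).
    for (b, chunk) in src.chunks(BLOCK_SIZE).enumerate() {
        for &k in chunk {
            let slot = &mut offsets[digit(k, pass) * num_blocks + b];
            dst[*slot] = k;
            *slot += 1;
        }
    }
}

/// Full sort: 32 / 4 = 8 passes, ping-ponging between two buffers.
fn radix_sort(keys: &mut Vec<u32>) {
    let mut tmp = vec![0u32; keys.len()];
    for pass in 0..(32 / BITS_PER_PASS) {
        radix_pass(&keys[..], &mut tmp, pass);
        std::mem::swap(keys, &mut tmp);
    }
}

fn main() {
    // Pseudo-random keys from a simple LCG (no external crates).
    let mut x: u32 = 12345;
    let mut keys: Vec<u32> = (0..(1 << 17))
        .map(|_| {
            x = x.wrapping_mul(1664525).wrapping_add(1013904223);
            x
        })
        .collect();
    let mut expected = keys.clone();
    expected.sort_unstable();
    radix_sort(&mut keys);
    assert_eq!(keys, expected);
    println!("sorted {} keys", keys.len());
}
```

Swapping after every pass mirrors the ping-pong buffering a GPU version would typically use, and leaves the result in the caller's buffer.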
Commits on Dec 26, 2023
Checkpoint almost working sort
The sort pipeline is wired up, and results are close to being sorted, but there are zero elements in the output.
Commit 063e936
Checkpoint sorts medium sized arrays
This sorts arrays of up to 2^16 elements, but fails at 2^17.
Commit 8a920e7
Commits on Dec 27, 2023
-
Configuration menu - View commit details
-
Copy full SHA for 3eee575 - Browse repository at this point
Copy the full SHA 3eee575View commit details -
Multisplit appears to work in isolation; we'll see whether that holds up.
Commit f10ddca
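Multisplit, in the usual sense, partitions a block of keys into buckets by digit while preserving order within each bucket. Below is a sequential sketch of that per-block step under the same assumed 4-bit digits; on a GPU the per-key rank would be computed in parallel (for example with ballots) rather than with a serial cursor. The function name and return shape are hypothetical.

```rust
const BITS_PER_PASS: u32 = 4; // assumed digit width
const NUM_BINS: usize = 1 << BITS_PER_PASS;

fn digit(key: u32, pass: u32) -> usize {
    ((key >> (pass * BITS_PER_PASS)) as usize) & (NUM_BINS - 1)
}

/// Hypothetical multisplit of one block: a stable partition of `keys` by digit.
/// Returns the reordered keys and the local start index of each bucket.
fn multisplit(keys: &[u32], pass: u32) -> (Vec<u32>, [usize; NUM_BINS]) {
    // Local histogram of the block.
    let mut counts = [0usize; NUM_BINS];
    for &k in keys {
        counts[digit(k, pass)] += 1;
    }
    // Exclusive scan of the local histogram gives each bucket's start.
    let mut starts = [0usize; NUM_BINS];
    let mut acc = 0;
    for (s, &c) in starts.iter_mut().zip(&counts) {
        *s = acc;
        acc += c;
    }
    // Place keys in input order so equal digits keep their relative order.
    let mut cursor = starts;
    let mut out = vec![0u32; keys.len()];
    for &k in keys {
        let d = digit(k, pass);
        out[cursor[d]] = k;
        cursor[d] += 1;
    }
    (out, starts)
}

fn main() {
    let (out, starts) = multisplit(&[0x21, 0x10, 0x03, 0x11, 0x20], 0);
    println!("{:x?} {:?}", out, &starts[..4]);
}
```

Key out[i] with digit d would then go to global position global_offset[d] + (i - starts[d]), with global_offset taken from the scanned histogram. Because same-digit keys are now contiguous within the block, those writes hit consecutive addresses, which is one common motivation for multisplit over a direct scatter.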
It works (and doesn't seem to have the same problem as the scatter from Fidelity), but it seems to be a bit slower than the Fidelity scatter. Perhaps that can be improved (subgroups would obviously help a lot), and it's also possible it would unlock going to 8 bits per pass.
Commit 38de0be
Just iterate all the keys; it's faster. This also suggests that a substantial fraction of the total time is going into the ballot.
Commit 00a3079
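The PR doesn't show how the ballot is implemented, and the remark that subgroups would help suggests it is emulated rather than a hardware subgroup operation. One well-known way to rank keys within a warp by digit uses one ballot per digit bit: ANDing each ballot (or its complement, where this lane's bit is 0) yields the mask of lanes holding the same digit, and the popcount of that mask below the current lane is the key's rank. The sketch emulates ballot on the CPU for a warp of up to 32 lanes; with 4-bit digits it costs four ballots per key, consistent with ballots taking a large share of the time, but it is an illustration, not the branch's code.

```rust
const BITS_PER_PASS: u32 = 4; // assumed digit width

/// Emulated subgroup ballot: bit i is set if lane i's predicate is true.
fn ballot(preds: &[bool]) -> u32 {
    preds
        .iter()
        .enumerate()
        .fold(0, |m, (i, &p)| if p { m | (1 << i) } else { m })
}

/// For each lane: (mask of lanes with the same digit, rank among them).
fn warp_match_rank(digits: &[u32]) -> Vec<(u32, u32)> {
    let n = digits.len();
    assert!(n <= 32, "emulated warp is limited to 32 lanes");
    let full = if n == 32 { u32::MAX } else { (1u32 << n) - 1 };
    // One ballot per digit bit, shared by all lanes.
    let ballots: Vec<u32> = (0..BITS_PER_PASS)
        .map(|bit| {
            let preds: Vec<bool> = digits.iter().map(|d| (d >> bit) & 1 == 1).collect();
            ballot(&preds)
        })
        .collect();
    digits
        .iter()
        .enumerate()
        .map(|(lane, &d)| {
            let mut mask = full;
            for (bit, &b) in ballots.iter().enumerate() {
                // Keep only lanes whose bit agrees with this lane's bit.
                mask &= if (d >> bit) & 1 == 1 { b } else { !b };
            }
            let below = (1u32 << lane) - 1; // lanes with a lower index
            (mask, (mask & below).count_ones())
        })
        .collect()
}

fn main() {
    // Digits held by a toy 6-lane warp.
    let digits = [3, 1, 3, 0, 1, 3];
    for (lane, (mask, rank)) in warp_match_rank(&digits).into_iter().enumerate() {
        println!("lane {lane}: match mask {mask:#08b}, rank {rank}");
    }
}
```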
We can use either 16 or 32 for warp size. The former is faster (on M1 Max). A warp size of 8 is also possible, but then the histogram array would be larger than the workgroup, so threads would need to handle multiple histogram values. Quick experiments with ELEMENTS_PER_THREAD show no gains for values other than 4.
Commit 9f882d8
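The warp-size trade-off is easy to check with a little arithmetic if each (emulated) warp keeps its own histogram in shared memory. Assuming a workgroup of 256 threads and 16 bins (both assumptions, not stated in the PR), a warp size of 16 gives exactly one histogram entry per thread, while 8 doubles that:

```rust
const WG_SIZE: u32 = 256; // assumed workgroup size
const NUM_BINS: u32 = 16; // assumed 4 bits per pass

/// Shared-memory histogram entries if each warp keeps its own histogram.
fn histogram_entries(warp_size: u32) -> u32 {
    (WG_SIZE / warp_size) * NUM_BINS
}

/// Entries each thread must handle when clearing or reducing the histogram.
fn entries_per_thread(warp_size: u32) -> u32 {
    (histogram_entries(warp_size) + WG_SIZE - 1) / WG_SIZE
}

fn main() {
    for ws in [8, 16, 32] {
        println!(
            "warp size {ws}: {} entries, {} per thread",
            histogram_entries(ws),
            entries_per_thread(ws)
        );
    }
}
```

Under the same assumed workgroup size, ELEMENTS_PER_THREAD = 4 means each workgroup processes 1024 keys.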