Introduce Barnes-Hut approximation #25

Merged

axionbuster merged 79 commits into master from tree3 on Mar 6, 2024
Conversation

axionbuster (Owner)

Toward an O(n log n) approximation scheme [where n is the number of particles].
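
For context: Barnes-Hut reaches O(n log n) by replacing a distant cluster of particles with its center of mass. A minimal sketch of the textbook acceptance test (the θ threshold and names are illustrative, not necessarily this PR's exact code):

```cpp
// A quadtree cell of width `size` whose center of mass lies at distance
// `dist` may stand in for all of its particles when it subtends a small
// enough angle: size / dist < theta. Smaller theta means more accuracy
// and more work; theta = 0 degenerates to the exact O(n^2) sum.
inline bool approximable(float size, float dist, float theta = 0.5f) {
  return size < theta * dist;
}
```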

codecov bot commented Feb 18, 2024

Codecov Report

Attention: Patch coverage is 47.45763%, with 31 lines in your changes missing coverage. Please review.

Project coverage is 12.33%. Comparing base (c5c1916) to head (82067f1).
Report is 1 commit behind head on master.

Files Patch % Lines
tests/morton_test.cpp 23.33% 23 Missing ⚠️
dyn/newton.h 0.00% 3 Missing ⚠️
tests/circle_test.cpp 0.00% 3 Missing ⚠️
dyn/barnes_hut.h 95.45% 1 Missing ⚠️
dyn/halton.h 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #25      +/-   ##
==========================================
+ Coverage    8.42%   12.33%   +3.90%     
==========================================
  Files           8       10       +2     
  Lines         178      227      +49     
==========================================
+ Hits           15       28      +13     
- Misses        163      199      +36     


vector::erase invalidates iterators when erasing at begin; no good.

Use std::list.

Noticeable perf improvement as a side effect.
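
A minimal sketch of the hazard and the fix (illustrative, not the project's code): vector::erase invalidates iterators at and after the erase point, while list::erase invalidates only the erased node's iterator.

```cpp
#include <list>

// Erasing while iterating is safe with std::list: erase() returns the
// next valid iterator, and no other iterators are disturbed.
void prune(std::list<int> &xs) {
  for (auto it = xs.begin(); it != xs.end();) {
    if (*it < 0)
      it = xs.erase(it); // only `it` is invalidated
    else
      ++it;
  }
}
```
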
Dim particles at the cutoff distance (inverse-square falloff, used for style).

Screenshot (omitted).

Now 59-60 FPS at the beginning.
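
A sketch of inverse-square dimming at the cutoff (the 0-255 alpha convention and names are assumptions):

```cpp
#include <cstdint>

// Full brightness inside the cutoff; beyond it, fade with the inverse
// square of distance so particles dim smoothly instead of popping out.
std::uint8_t dim_alpha(float dist, float cutoff) {
  if (dist <= cutoff)
    return 255;
  float r = cutoff / dist; // in (0, 1) past the cutoff
  return static_cast<std::uint8_t>(255.0f * r * r);
}
```
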

How?

Stop recomputing everything. The bottleneck is equal_range; attack it by cutting the number of particles, where possible, at every loop iteration in main.
Take the idea of re-using groups a bit further.
Importantly: don't copy state.
Separate state from the "view."

The algorithm only requires a view of the particles.
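
A minimal sketch of the state/"view" split, assuming C++20's std::span and an illustrative Particle type: the owning state lives in main, and the algorithm only ever sees a non-owning view, so nothing is copied.

```cpp
#include <span>
#include <vector>

struct Particle {
  float x, y, vx, vy;
};

// Owning state: persists across frames, may shrink or be reordered.
struct State {
  std::vector<Particle> particles;
};

// The algorithm takes only a non-owning view of the particles.
void build_groups(std::span<Particle const> view);
```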
Unreliable check for the full bit pattern -> fixed.

No idea why it was unreliable.
Sorting is the bottleneck. Can't go further (I think).

On MSVC, std::stable_sort is much faster than std::sort. I have no idea why.
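
A sketch of the sort in question: particles ordered by Morton (Z-order) key, using std::stable_sort as described above. The key encoding below is a standard bit-interleave standing in for the project's; coordinates are assumed normalized to [0, 1).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Particle {
  float x, y; // assumed normalized to [0, 1)
};

// Spread the low 16 bits of x so a zero bit separates each original bit.
static std::uint32_t interleave16(std::uint32_t x) {
  x &= 0xFFFFu;
  x = (x | (x << 8)) & 0x00FF00FFu;
  x = (x | (x << 4)) & 0x0F0F0F0Fu;
  x = (x | (x << 2)) & 0x33333333u;
  x = (x | (x << 1)) & 0x55555555u;
  return x;
}

// Morton (Z-order) key: interleave the quantized x and y bits.
static std::uint32_t morton(Particle const &p) {
  auto qx = static_cast<std::uint32_t>(p.x * 65536.0f);
  auto qy = static_cast<std::uint32_t>(p.y * 65536.0f);
  return interleave16(qx) | (interleave16(qy) << 1);
}

void sort_by_morton(std::vector<Particle> &ps) {
  // Empirically much faster than std::sort on MSVC for this workload.
  std::stable_sort(ps.begin(), ps.end(),
                   [](Particle const &a, Particle const &b) {
                     return morton(a) < morton(b);
                   });
}
```
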
Use a tree that is built at the beginning of the frame (considered necessary for the gravity simulation, to avoid duplicated work).
Identify allocation as a non-essential bottleneck:
--> Re-use allocated memory
--> Replace the list with a vector
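
A minimal sketch of the memory re-use (illustrative): clear() keeps a vector's capacity, so rebuilding the tree each frame stops touching the allocator once capacity has peaked.

```cpp
#include <vector>

struct Node {
  float cx, cy, mass;
};

struct Tree {
  std::vector<Node> nodes; // persists across frames

  void rebuild() {
    nodes.clear(); // drops elements but keeps capacity: no reallocation
    // ... emplace this frame's nodes ...
  }
};
```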
* WIP: draft out an interface

* WIP: 2

* WIP: 3

* Fix type errors; make it able to build

Has a defect, crashes.

* Fix a few tree-building defects

Other defects remain.

* Fix average finding routine

* WIP 4

* Fix averaging

* Fix sorting performance problem

* Make it work; comment the code

* Edit some comments in barnes_hut.h

* Make mask type general

* clang-tidy

* Fix build

* Remove tree3demo

I'll be making a change in the way grouping works

* Fix perf problem due to binary search; remove the group() free function

Though not a regression, it was still a problem.

In the top-down approach, binary search had a significant impact on perf.

In this bottom-up approach, on the other hand, there is no need for binary search.

Measured a latency improvement in the first-time construction of groups.
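
A sketch of why bottom-up needs no binary search (illustrative, assuming keys already Morton-sorted): one linear pass over the sorted keys yields every group sharing a prefix, where a top-down build would equal_range/binary-search for each boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Group a Morton-sorted key array by prefix (keys shifted right by
// `level` bits). One linear pass finds every [begin, end) run.
std::vector<std::pair<std::size_t, std::size_t>>
group_by_prefix(std::vector<std::uint32_t> const &keys, unsigned level) {
  std::vector<std::pair<std::size_t, std::size_t>> groups;
  std::size_t begin = 0;
  for (std::size_t i = 1; i <= keys.size(); ++i) {
    if (i == keys.size() || (keys[i] >> level) != (keys[begin] >> level)) {
      groups.emplace_back(begin, i);
      begin = i;
    }
  }
  return groups;
}
```
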

* Optimize for latency, sacrificing measured memory usage

Noticed many memmove calls, and that emplace was always freeing and allocating new memory.

A vector typically grows its allocation in powers of two, or in some other geometric sequence.

Well, reserve the memory up front and let elements be constructed in place; no more problem.

Got 60 FPS @ 50,000 particles.
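
A minimal sketch of the fix (names assumed): reserving once up front means no growth-triggered reallocation, so no memmove of existing elements and no free/alloc churn per emplace.

```cpp
#include <cstddef>
#include <vector>

struct Group {
  std::size_t begin, end;
  Group(std::size_t b, std::size_t e) : begin(b), end(e) {}
};

void build(std::vector<Group> &out, std::size_t expected) {
  out.clear();
  out.reserve(expected); // one allocation; capacity never grows mid-loop
  for (std::size_t i = 0; i < expected; ++i)
    out.emplace_back(i, i + 1); // constructed in place, nothing moved
}
```
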
* Table.h: Attempt to use Barnes-Hut in gravity simulation

* Make it "work" but degrade accuracy and speed

Latency in the 1,000-particle case doubled on average.

The demo from the beginning is breaking down.

* Remove ref to the rectangle-circle area collision routine used for visualization

The routine was incorrect.

* Remove variable timing for the physics evaluation

Integrators are not known to cope well with variable time steps.
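
A sketch of fixed-step physics evaluation (the accumulator pattern and semi-implicit Euler are illustrative choices, not necessarily this PR's): the integrator only ever sees a constant dt, whatever the frame time does.

```cpp
struct Body {
  float x, y, vx, vy, ax, ay;
};

// Consume real frame time in fixed slices; the rendering rate may vary,
// but the integrator's step never does.
void advance(Body &b, float frame_time, float &accumulator) {
  constexpr float dt = 1.0f / 60.0f;
  accumulator += frame_time;
  while (accumulator >= dt) {
    b.vx += b.ax * dt; // semi-implicit Euler: update velocity first,
    b.vy += b.ay * dt;
    b.x += b.vx * dt;  // then position with the new velocity
    b.y += b.vy * dt;
    accumulator -= dt;
  }
}
```
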

* Raise particle count ceiling to 5,000

* Recycle memory for copy

* WIP: New implementation with an actual tree

Does not compile yet.

Preliminary idea.

Worried about shared-pointer overhead.

But it should reduce traversal overhead in `run`.
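
A sketch of the preliminary layout and the worry (illustrative): with shared_ptr links, every node pays for a control block and atomic reference counting.

```cpp
#include <array>
#include <memory>

// Quadtree node with shared_ptr children. Each link drags in a control
// block and atomic refcount updates: the overhead being worried about.
struct Node {
  std::array<std::shared_ptr<Node>, 4> children{};
  float cx = 0, cy = 0, mass = 0; // center of mass, total mass
};
```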

* Update barnes_hut.h

* WIP: Refactor barnes_hut.h

* Update barnes_hut.h

* Update barnes_hut.h

* Fix much, but hit stack overflow

Probably just use dumb pointers or a list.

* WIP: Convert to regular pointers

* WIP: Use a different LCRS approach
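
A sketch of a left-child right-sibling (LCRS) layout with plain pointers, the representation this commit moves to (illustrative, not the PR's exact layout): two links per node regardless of arity, and child iteration becomes a sibling walk.

```cpp
// LCRS node: `child` points to the first child, `sibling` to the next
// node under the same parent. Plain pointers: no refcounting overhead.
struct Node {
  Node *child = nullptr;
  Node *sibling = nullptr;
  float cx = 0, cy = 0, mass = 0;
};

// Visit each child of n in order.
template <typename F> void for_each_child(Node const *n, F f) {
  for (Node const *c = n->child; c; c = c->sibling)
    f(c);
}
```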

* WIP: Make it able to compile

Crashes during tree construction, though

* WIP: Include missing headers

* WIP: Rename deleteGroup to delete_group

* WIP: Prevent immediate crash (MSVC)

* WIP: Fix null dereference by holding lower layer root (x) constant

* WIP

* WIP: Fix crashes (few-particle)

* Simplify

* WIP

* WIP

* WIP: two-particle case

No more crashes or memory leaks, but duplication problems.

* WIP dedup

* WIP

* Ignore clangd cache

* Change signature of run()

* WIP: New algorithm design

Make layers explicit

* Refactor. Still has an aliasing problem, though.

* Compress things somewhat

* build -> parent

* Fix comment about prefixes

* Simplify constructor for B

* Solve aliasing problem

* Bring `explicit` back

* Make it compile on MSVC

* Make it work for 400 particles

* Clean up somewhat

* Minor cleanup

* Find leak

* Add assert to find bug

* Remove memory leak

* Tweaks

* Make grass compile
axionbuster merged commit 19ab5c8 into master on Mar 6, 2024

7 checks passed

axionbuster deleted the tree3 branch on March 6, 2024 05:12