Introduce Barnes-Hut approximation #25

Merged

axionbuster merged 79 commits into master from tree3 on Mar 6, 2024
Conversation

axionbuster (Owner)

Toward an O(n log n) approximation scheme [where n is the number of particles].
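
For context: Barnes-Hut reaches O(n log n) by replacing a distant cluster of particles with its center of mass. A minimal sketch of the textbook acceptance test (the θ threshold and names are illustrative, not necessarily this PR's exact code):

```cpp
// A quadtree cell of width `size` whose center of mass lies at distance
// `dist` may stand in for all of its particles when it subtends a small
// enough angle: size / dist < theta. Smaller theta means more accuracy
// and more work; theta = 0 degenerates to the exact O(n^2) sum.
inline bool approximable(float size, float dist, float theta = 0.5f) {
  return size < theta * dist;
}
```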

codecov bot commented Feb 18, 2024

Codecov Report

Attention: Patch coverage is 47.45763%, with 31 lines in your changes missing coverage. Please review.

Project coverage is 12.33%. Comparing base (c5c1916) to head (82067f1).
Report is 1 commit behind head on master.

Files Patch % Lines
tests/morton_test.cpp 23.33% 23 Missing ⚠️
dyn/newton.h 0.00% 3 Missing ⚠️
tests/circle_test.cpp 0.00% 3 Missing ⚠️
dyn/barnes_hut.h 95.45% 1 Missing ⚠️
dyn/halton.h 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #25      +/-   ##
==========================================
+ Coverage    8.42%   12.33%   +3.90%     
==========================================
  Files           8       10       +2     
  Lines         178      227      +49     
==========================================
+ Hits           15       28      +13     
- Misses        163      199      +36     


vector::erase invalidates iterators when erasing at begin; no good.

Use std::list.

Noticeable perf improvement as a side effect.
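
A minimal sketch of the hazard and the fix (illustrative, not the project's code): vector::erase invalidates iterators at and after the erase point, while list::erase invalidates only the erased node's iterator.

```cpp
#include <list>

// Erasing while iterating is safe with std::list: erase() returns the
// next valid iterator, and no other iterators are disturbed.
void prune(std::list<int> &xs) {
  for (auto it = xs.begin(); it != xs.end();) {
    if (*it < 0)
      it = xs.erase(it); // only `it` is invalidated
    else
      ++it;
  }
}
```
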
Dim particles at the cutoff distance (inverse-square falloff, used for style).

Screenshot (omitted).

Now 59-60 FPS at the beginning.
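
A sketch of inverse-square dimming at the cutoff (the 0-255 alpha convention and names are assumptions):

```cpp
#include <cstdint>

// Full brightness inside the cutoff; beyond it, fade with the inverse
// square of distance so particles dim smoothly instead of popping out.
std::uint8_t dim_alpha(float dist, float cutoff) {
  if (dist <= cutoff)
    return 255;
  float r = cutoff / dist; // in (0, 1) past the cutoff
  return static_cast<std::uint8_t>(255.0f * r * r);
}
```
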

How?

Stop recomputing everything. The bottleneck is equal_range; attack it by cutting the number of particles, where possible, at every loop iteration in main.
Take the idea of re-using groups a bit further.
Importantly: don't copy state.
Separate state from the "view."

The algorithm only requires a view of the particles.
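
A minimal sketch of the state/"view" split, assuming C++20's std::span and an illustrative Particle type: the owning state lives in main, and the algorithm only ever sees a non-owning view, so nothing is copied.

```cpp
#include <span>
#include <vector>

struct Particle {
  float x, y, vx, vy;
};

// Owning state: persists across frames, may shrink or be reordered.
struct State {
  std::vector<Particle> particles;
};

// The algorithm takes only a non-owning view of the particles.
void build_groups(std::span<Particle const> view);
```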
Unreliable check for the full bit pattern -> fixed.

No idea why it was unreliable.
Sorting is the bottleneck. Can't go further (I think).

On MSVC, std::stable_sort is much faster than std::sort. I have no idea why.
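
A sketch of the sort in question: particles ordered by Morton (Z-order) key, using std::stable_sort as described above. The key encoding below is a standard bit-interleave standing in for the project's; coordinates are assumed normalized to [0, 1).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Particle {
  float x, y; // assumed normalized to [0, 1)
};

// Spread the low 16 bits of x so a zero bit separates each original bit.
static std::uint32_t interleave16(std::uint32_t x) {
  x &= 0xFFFFu;
  x = (x | (x << 8)) & 0x00FF00FFu;
  x = (x | (x << 4)) & 0x0F0F0F0Fu;
  x = (x | (x << 2)) & 0x33333333u;
  x = (x | (x << 1)) & 0x55555555u;
  return x;
}

// Morton (Z-order) key: interleave the quantized x and y bits.
static std::uint32_t morton(Particle const &p) {
  auto qx = static_cast<std::uint32_t>(p.x * 65536.0f);
  auto qy = static_cast<std::uint32_t>(p.y * 65536.0f);
  return interleave16(qx) | (interleave16(qy) << 1);
}

void sort_by_morton(std::vector<Particle> &ps) {
  // Empirically much faster than std::sort on MSVC for this workload.
  std::stable_sort(ps.begin(), ps.end(),
                   [](Particle const &a, Particle const &b) {
                     return morton(a) < morton(b);
                   });
}
```
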
Use a tree that is built at the beginning of the frame (considered necessary for the gravity simulation, to avoid duplicated work).
Identify allocation as a non-essential bottleneck:
--> Re-use allocated memory
--> Replace the list with a vector
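
A minimal sketch of the memory re-use (illustrative): clear() keeps a vector's capacity, so rebuilding the tree each frame stops touching the allocator once capacity has peaked.

```cpp
#include <vector>

struct Node {
  float cx, cy, mass;
};

struct Tree {
  std::vector<Node> nodes; // persists across frames

  void rebuild() {
    nodes.clear(); // drops elements but keeps capacity: no reallocation
    // ... emplace this frame's nodes ...
  }
};
```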
* WIP: draft out an interface

* WIP: 2

* WIP: 3

* Fix type errors; make it able to build

Has a defect, crashes.

* Fix a few tree-building defects

Other defects remain.

* Fix average finding routine

* WIP 4

* Fix averaging

* Fix sorting performance problem

* Make it work; comment the code

* Edit some comments in barnes_hut.h

* Make mask type general

* clang-tidy

* Fix build

* Remove tree3demo

I'll be making a change in the way grouping works

* Fix perf problem due to binary search; remove the group() free function

Though not a regression, it was still a problem.

In the top-down approach, binary search had a significant impact on perf.

In this bottom-up approach, on the other hand, there is no need for binary search.

Measured a latency improvement in the first-time construction of groups.
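
A sketch of why bottom-up needs no binary search (illustrative, assuming keys already Morton-sorted): one linear pass over the sorted keys yields every group sharing a prefix, where a top-down build would equal_range/binary-search for each boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Group a Morton-sorted key array by prefix (keys shifted right by
// `level` bits). One linear pass finds every [begin, end) run.
std::vector<std::pair<std::size_t, std::size_t>>
group_by_prefix(std::vector<std::uint32_t> const &keys, unsigned level) {
  std::vector<std::pair<std::size_t, std::size_t>> groups;
  std::size_t begin = 0;
  for (std::size_t i = 1; i <= keys.size(); ++i) {
    if (i == keys.size() || (keys[i] >> level) != (keys[begin] >> level)) {
      groups.emplace_back(begin, i);
      begin = i;
    }
  }
  return groups;
}
```
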

* Optimize for latency, sacrificing measured memory usage

Noticed many memmove calls, and that emplace was always freeing and allocating new memory.

A vector typically grows its allocation in powers of two, or in some other geometric sequence.

Well, reserve the memory up front and let elements be constructed in place; no more problem.

Got 60 FPS @ 50,000 particles.
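
A minimal sketch of the fix (names assumed): reserving once up front means no growth-triggered reallocation, so no memmove of existing elements and no free/alloc churn per emplace.

```cpp
#include <cstddef>
#include <vector>

struct Group {
  std::size_t begin, end;
  Group(std::size_t b, std::size_t e) : begin(b), end(e) {}
};

void build(std::vector<Group> &out, std::size_t expected) {
  out.clear();
  out.reserve(expected); // one allocation; capacity never grows mid-loop
  for (std::size_t i = 0; i < expected; ++i)
    out.emplace_back(i, i + 1); // constructed in place, nothing moved
}
```
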
* Table.h: Attempt to use Barnes-Hut in gravity simulation

* Make it "work" but degrade accuracy and speed

Latency in the 1,000-particle case doubled on average.

The demo from the beginning is breaking down.

* Remove ref to the rectangle-circle area collision routine used for visualization

The routine was incorrect.

* Remove variable timing for the physics evaluation

Integrators are not known to cope well with variable time steps.
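
A sketch of fixed-step physics evaluation (the accumulator pattern and semi-implicit Euler are illustrative choices, not necessarily this PR's): the integrator only ever sees a constant dt, whatever the frame time does.

```cpp
struct Body {
  float x, y, vx, vy, ax, ay;
};

// Consume real frame time in fixed slices; the rendering rate may vary,
// but the integrator's step never does.
void advance(Body &b, float frame_time, float &accumulator) {
  constexpr float dt = 1.0f / 60.0f;
  accumulator += frame_time;
  while (accumulator >= dt) {
    b.vx += b.ax * dt; // semi-implicit Euler: update velocity first,
    b.vy += b.ay * dt;
    b.x += b.vx * dt;  // then position with the new velocity
    b.y += b.vy * dt;
    accumulator -= dt;
  }
}
```
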

* Raise particle count ceiling to 5,000

* Recycle memory for copy

* WIP: New implementation with an actual tree

Does not compile yet.

Preliminary idea.

Worried about shared-pointer overhead.

But it should reduce traversal overhead in `run`.
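
A sketch of the preliminary layout and the worry (illustrative): with shared_ptr links, every node pays for a control block and atomic reference counting.

```cpp
#include <array>
#include <memory>

// Quadtree node with shared_ptr children. Each link drags in a control
// block and atomic refcount updates: the overhead being worried about.
struct Node {
  std::array<std::shared_ptr<Node>, 4> children{};
  float cx = 0, cy = 0, mass = 0; // center of mass, total mass
};
```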

* Update barnes_hut.h

* WIP: Refactor barnes_hut.h

* Update barnes_hut.h

* Update barnes_hut.h

* Fix much, but hit stack overflow

Probably just use dumb pointers or a list.

* WIP: Convert to regular pointers

* WIP: Use a different LCRS approach
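
A sketch of a left-child right-sibling (LCRS) layout with plain pointers, the representation this commit moves to (illustrative, not the PR's exact layout): two links per node regardless of arity, and child iteration becomes a sibling walk.

```cpp
// LCRS node: `child` points to the first child, `sibling` to the next
// node under the same parent. Plain pointers: no refcounting overhead.
struct Node {
  Node *child = nullptr;
  Node *sibling = nullptr;
  float cx = 0, cy = 0, mass = 0;
};

// Visit each child of n in order.
template <typename F> void for_each_child(Node const *n, F f) {
  for (Node const *c = n->child; c; c = c->sibling)
    f(c);
}
```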

* WIP: Make it able to compile

Crashes during tree construction, though

* WIP: Include missing headers

* WIP: Rename deleteGroup to delete_group

* WIP: Prevent immediate crash (MSVC)

* WIP: Fix null dereference by holding lower layer root (x) constant

* WIP

* WIP: Fix crashes (few-particle)

* Simplify

* WIP

* WIP

* WIP: two-particle case

No more crashes or memory leaks, but duplication problems.

* WIP dedup

* WIP

* Ignore clangd cache

* Change signature of run()

* WIP: New algorithm design

Make layers explicit

* Refactor. Still has an aliasing problem, though.

* Compress things somewhat

* build -> parent

* Fix comment about prefixes

* Simplify constructor for B

* Solve aliasing problem

* Bring `explicit` back

* Make it compile on MSVC

* Make it work for 400 particles

* Clean up somewhat

* Minor cleanup

* Find leak

* Add assert to find bug

* Remove memory leak

* Tweaks

* Make grass compile
axionbuster merged commit 19ab5c8 into master on Mar 6, 2024

7 checks passed

axionbuster deleted the tree3 branch on March 6, 2024 05:12