Introduce Barnes-Hut approximation (#25)
* Demonstrate Morton code stuff (bad)
* Improve Morton demo; test
* Add colors and filling animation to Morton demo
* Demonstrate four quadrants
* Expand range to (-2, 2) x (-2, 2)
  Nothing seemed to change really.
* quadrantdemo: Attach a screenshot
* morton: Minor formatting
* WIP: tree.h
* Remove tree.h inherited from branch tree2
* Do square Morton code
* Exclude INT32_MAX from integer conversion
* Enable more warnings
* CI: Update APT
* Remove /WX options
* Newton: Cut "c0 -= c0"
* Halton: Remove unsigned in index and base
  Potential to participate in simpler arithmetic.
* Cut dead function morton32; add tests
* WIP: Another tree...
* Update tree.h
* Update tree.h
* tree.h: Attempt to improve documentation
* tree.h: Attempt to improve documentation 2
* tree.h: First draft of the algorithm
* tree.h: Add smoke test [fail]
* Visualized masked Morton code boxes
  Now I know why tree3 tests fail. Some points that share a prefix with the others may lie outside the box created by the first and the last points in the list that share the prefix.
* quadrantdemo: Add circles, two centers.
  Getting the idea now. (See img2.jpeg)
* WIP: Add algo, demo
* tree.h: Appears to work.
* tree3demo: Track mouse
* tree3demo: Compute nodes only once
* tree3demo: Decouple extra data
* tree3demo: Apply movable mask
* tree3demo: 60 FPS, resizable window, pan & zoom
* tree3demo: Visualize angle rejection (left click)
* tree3demo: Various improvements
* tree3demo: Allow flight of particles
* tree3demo: Print numbers of accepted nodes
* tree3demo: Attach a screenshot
* Delete /Testing
* barnes_hut.h: WIP (dfs)
  Still figuring out.
* barnes_hut.h: WIP (dfs) 2
  Still figuring out.
* barnes_hut.h: Remove Node structure
  Not necessary.
* barnes_hut.h: Simplify; remove dfs()
  Dfs approach considered not necessary.
* barnes_hut.h: WIP (run_level)
  I tried to translate instructions from my notes to code. Turns out, only parts of the instructions belong in barnes_hut.h because there are many assumptions that functions in the file can't possibly make (depend on).
* barnes_hut.h: i -> g
* barnes_hut.h: WIP (run_level) 2
  I think I'm going to remove this function outright because it's nearly just a linear scan.
* Update barnes_hut.h
* tree3demo: Simplify radius computation
* WIP: Add hierarchydemo
  Crashing because of an attempt to erase an already-erased particle.
* hierarchydemo: Fix erase() bug; add flight
* hierarchydemo: Comply with iterator rules
  vector::erase invalidates iterators at and after the erased element, no good. Use list. Noticeable perf improvement as a side effect.
* hierarchydemo: Various changes (and 10,000 particles)
* hierarchydemo: Reset (R); tweak
* hierarchydemo: Add comments
* hierarchydemo: Add fog; use 50,000 particles, etc.
  Dim particles at cutoff (inverse-square used for style). Screenshot.
* hierarchydemo: Reuse particles
  Now, 59-60 FPS in the beginning. How? Stop recomputing everything. Bottleneck = equal_range; solve that by cutting the number of particles, if possible, on every iteration of the loop in main.
* hierarchydemo: Use vector for particles; const iterator
  Take the idea of re-using groups a bit further.
* hierarchydemo: Simplify hot loop.
  Importantly: Don't copy state.
* hierarchydemo: Reorganize
  Separate state vs. "view." Algorithm only requires a view of particles.
* hierarchydemo: Start from lowest level of detail
* hierarchydemo: Fix `refine`; tweak
  Unreliable check for full bit pattern -> fixed. No idea why it was unreliable.
* hierarchydemo: Edit a couple comments
* hierarchydemo: Use stable_sort instead of sort
  Sorting is the bottleneck. Can't go further (I think). On MSVC, stable_sort is much faster than sort. I have no idea why.
* hierarchydemo: Precompute Morton; 100,000 particles.
* clang-tidy
* hierarchydemo: Implement experimental "copies" algorithm
  Use a tree that is built at the beginning of the frame (considered necessary for the gravity simulation to avoid duplicated work).
* hierarchydemo: Optimize memory usage; 50,000 particles
  Identify allocation as a non-essential bottleneck --> Re-use allocated memory --> Replace list with vector
* clang-tidy
* Move fixedmorton32 to barnes_hut.h
* Rename fixedmorton32 to morton
* Clean up code
* Fix build
* Factor out construction of the Barnes-Hut tree from the demo (#26)
* WIP: draft out an interface
* WIP: 2
* WIP: 3
* Fix typing errors; make able to build
  Has a defect, crashes.
* Fix a few tree-building defects
  Other defects remain.
* Fix average finding routine
* WIP 4
* Fix averaging
* Fix sorting performance problem
* Make it work; comment the code
* Edit some comments in barnes_hut.h
* Make mask type general
* clang-tidy
* Fix build
* Remove tree3demo
  I'll be making a change in the way grouping works.
* Fix perf due to binary search; get rid of group() free function
  Though not a regression, it was still a problem. In the top-down approach, binary search had a significant impact on perf. In this bottom-up approach, on the other hand, there is no need to binary search. Measure latency improvement in first-time construction of groups.
* Optimize for latency sacrificing measured memory usage
  Noticed many memmove calls, plus that emplace was always freeing and allocating new memory. Know that a vector typically allocates memory in powers of two, or else in some sort of geometric sequence. So reserve memory up front and then construct in place; no more problem. Got 60 FPS @ 50,000 particles.
* demo/main.cpp: Put [[maybe_unused]]
* Use Barnes-Hut in gravity simulation (#27)
* Table.h: Attempt to use Barnes-Hut in gravity simulation
* Make it "work" but degrade accuracy and speed
  Latency in the case of 1,000 particles doubled on average. The demo in the beginning is breaking down.
* Remove ref to the area rectangle-circle collision routine for viz
  Incorrect routine.
* Remove variable timing for eval of physics
  Integrators not known to cope well with variable timing.
* Raise particle count ceiling to 5,000
* Recycle memory for copy
* WIP: New implementation with an actual tree
  Does not compile yet. Preliminary idea. Worried about shared pointer overhead. But should reduce traversal overhead in `run`.
* Update barnes_hut.h
* WIP: Refactor barnes_hut.h
* Update barnes_hut.h
* Update barnes_hut.h
* Fix much, but hit stack overflow
  Probably just use dumb pointers or a list.
* WIP: Convert to regular pointers
* WIP: Use a different LCRS approach
* WIP: Make able to compile
  Crashes during tree construction, though.
* WIP: Include unincluded headers
* WIP: Rename deleteGroup to delete_group
* WIP: Prevent immediate crash (MSVC)
* WIP: Fix null dereference by holding lower layer root (x) constant
* WIP
* WIP: Fix crashes (few-particle)
* Simplify
* WIP
* WIP
* WIP: two-particle case
  No more crashes or memory leaks, but duplication problems.
* WIP dedup
* WIP
* Ignore clangd cache
* Change signature of run()
* WIP: New algorithm design
  Make layers explicit.
* Refactor. Still has aliasing problem, though.
* Compress things somewhat
* build -> parent
* Fix comment about prefixes
* Simplify constructor for B
* Solve aliasing problem
* Bring `explicit` back
* Make compile on MSVC
* Make it work for 400 particles
* Clean up somewhat
* Minor cleanup
* Find leak
* Add assert to find bug
* Remove memory leak
* Tweaks
* Make grass compile
* Tweak main demo
* Compute CM properly
* Add various improvements + complexity measurement
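The Morton-code grouping these commits work toward (square Morton code, masked prefixes, stable_sort by code, bottom-up groups) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code in barnes_hut.h: it assumes coordinates quantized to 16 bits per axis in a square domain such as (-2, 2) x (-2, 2), and `part1by1`, `morton2d`, `quantize`, `morton_order`, `prefix_mask`, `group_by_prefix` and `Range` are hypothetical names. The property it relies on is that once particles are sorted by code, all particles sharing a masked prefix (one quadtree cell) are contiguous, so in this sketch the groups fall out of a single linear scan with no binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Spread the low 16 bits of x so that bit i lands on bit 2*i.
inline std::uint32_t part1by1(std::uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Interleave two 16-bit cell indices: x in the even bits, y in the odd bits.
inline std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}

// Hypothetical helper: map v in the square domain [lo, hi] to a 16-bit cell index.
// Clamping keeps v == hi from producing 65536, which would not fit in 16 bits.
inline std::uint32_t quantize(double v, double lo, double hi) {
    double t = (v - lo) / (hi - lo) * 65536.0;
    if (t < 0.0) t = 0.0;
    if (t > 65535.0) t = 65535.0;
    return static_cast<std::uint32_t>(t);
}

// Particle indices ordered by Morton code; stable, so equal codes keep input order.
std::vector<std::size_t> morton_order(const std::vector<std::uint32_t>& codes) {
    std::vector<std::size_t> idx(codes.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&codes](std::size_t a, std::size_t b) { return codes[a] < codes[b]; });
    return idx;
}

// Mask keeping the top 2*depth bits, which identify one quadtree cell at that depth.
inline std::uint32_t prefix_mask(unsigned depth) {
    assert(depth <= 16);
    return depth == 0 ? 0u : static_cast<std::uint32_t>(0xFFFFFFFFu << (32 - 2 * depth));
}

struct Range { std::size_t begin, end; };  // half-open index range [begin, end)

// Split sorted codes into runs sharing the same masked prefix (one run per occupied cell).
std::vector<Range> group_by_prefix(const std::vector<std::uint32_t>& sorted_codes,
                                   std::uint32_t mask) {
    std::vector<Range> groups;
    std::size_t i = 0;
    while (i < sorted_codes.size()) {
        const std::uint32_t prefix = sorted_codes[i] & mask;
        std::size_t j = i + 1;
        while (j < sorted_codes.size() && (sorted_codes[j] & mask) == prefix) ++j;
        groups.push_back({i, j});
        i = j;
    }
    return groups;
}
```

Passing a larger depth to `prefix_mask` splits each group into at most four children (one per quadrant), which is one way to walk from the lowest level of detail toward finer ones.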
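The approximation itself follows the usual Barnes-Hut rule, presumably the "angle rejection" that tree3demo visualizes: a cell of side s whose center of mass lies at distance d from the target is treated as one pseudo-particle when s / d < theta; otherwise its particles are summed individually. The sketch below is a single-level illustration under stated assumptions (2D, gravitational constant G = 1, Plummer-style softening with eps2 > 0); `Body`, `Cell`, `center_of_mass`, `accept`, `accumulate` and `acceleration` are hypothetical names, not the repository's API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Body { double x, y, m; };
struct Cell { double size; std::vector<Body> bodies; };  // square cell of side `size`

// Collapse a group into one pseudo-body at its center of mass, carrying the total mass.
Body center_of_mass(const std::vector<Body>& bodies) {
    Body cm{0.0, 0.0, 0.0};
    for (const Body& b : bodies) {
        cm.x += b.m * b.x;
        cm.y += b.m * b.y;
        cm.m += b.m;
    }
    if (cm.m > 0.0) {
        cm.x /= cm.m;
        cm.y /= cm.m;
    }
    return cm;
}

// Opening-angle test: accept the approximation when size / d < theta.
// Compared in squared form to avoid a square root.
bool accept(double size, double dx, double dy, double theta) {
    return size * size < theta * theta * (dx * dx + dy * dy);
}

// Add the softened acceleration (G = 1) that body b exerts on a target at (x, y).
void accumulate(double x, double y, const Body& b, double eps2, double& ax, double& ay) {
    const double dx = b.x - x, dy = b.y - y;
    const double r2 = dx * dx + dy * dy + eps2;  // eps2 > 0 keeps r2 away from zero
    const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
    ax += b.m * dx * inv_r3;
    ay += b.m * dy * inv_r3;
}

// One level of Barnes-Hut: far cells contribute through their center of mass,
// near cells particle by particle. A full tree would recurse into near cells instead.
void acceleration(double x, double y, const std::vector<Cell>& cells,
                  double theta, double eps2, double& ax, double& ay) {
    ax = 0.0;
    ay = 0.0;
    for (const Cell& c : cells) {
        const Body cm = center_of_mass(c.bodies);
        if (cm.m <= 0.0) continue;  // skip empty or massless cells
        if (accept(c.size, cm.x - x, cm.y - y, theta)) {
            accumulate(x, y, cm, eps2, ax, ay);
        } else {
            for (const Body& b : c.bodies) accumulate(x, y, b, eps2, ax, ay);
        }
    }
}
```

With theta = 0 no cell is ever accepted, so the same function degenerates to direct summation, which makes it easy to measure how much accuracy a given theta gives up.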
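The memory-reuse commits ("Re-use allocated memory", "Recycle memory for copy", reserving before constructing in place) rest on two `std::vector` guarantees: `clear()` destroys the elements but leaves `capacity()` unchanged, and `reserve(n)` does nothing when `capacity()` is already at least n. A minimal sketch, assuming a per-frame list of particle groups; `Group` and `rebuild_groups` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open particle range; the constructor lets emplace_back build it in place.
struct Group {
    std::size_t begin, end;
    Group(std::size_t b, std::size_t e) : begin(b), end(e) {}
};

// Refill `groups` for a new frame while keeping its existing allocation.
void rebuild_groups(std::vector<Group>& groups, std::size_t n) {
    groups.clear();     // destroys elements; capacity() is unchanged
    groups.reserve(n);  // allocates only if n exceeds the current capacity
    for (std::size_t i = 0; i < n; ++i) {
        groups.emplace_back(i, i + 1);  // constructed in place, no reallocation
    }
}
```

The trade-off is the one the commit title names: the vector keeps its peak capacity across frames, spending memory to keep allocator calls and element moves out of the hot loop.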
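Part of #27 moves to plain pointers in a left-child right-sibling (LCRS) layout after hitting a stack overflow; the message does not say what overflowed. Independent of that cause, one common way to keep a deep LCRS tree from exhausting the call stack is to walk it with an explicit stack instead of recursion. A minimal sketch; `Node` and `total_leaf_mass` are hypothetical names, and the repository's node type and traversal may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Left-child right-sibling node: one pointer to the first child, one to the next sibling.
struct Node {
    double mass = 0.0;
    Node* child = nullptr;    // first child, or nullptr for a leaf
    Node* sibling = nullptr;  // next sibling under the same parent
};

// Sum leaf masses with an explicit stack, so depth is limited by heap memory, not the call stack.
double total_leaf_mass(const Node* root) {
    double total = 0.0;
    std::vector<const Node*> stack;
    if (root != nullptr) stack.push_back(root);
    while (!stack.empty()) {
        const Node* n = stack.back();
        stack.pop_back();
        if (n->sibling != nullptr) stack.push_back(n->sibling);
        if (n->child != nullptr) {
            stack.push_back(n->child);  // internal node: descend into its children
        } else {
            total += n->mass;           // leaf: contributes its mass
        }
    }
    return total;
}
```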