
Add support for forcing the GC when using Distributed #19

Merged
mtfishman merged 5 commits on Aug 2, 2023


mtfishman (Member) commented Aug 1, 2023

This adds support for forcing Julia's garbage collector to run during certain operations when using the Distributed backend.

@nbaldelli reported that Julia was crashing when using the Distributed backend of ITensorParallel.jl with larger MPS bond dimensions. This may be related to Julia not running the garbage collector properly when using Distributed parallelization; for example, see https://discourse.julialang.org/t/from-multithreading-to-distributed/101984/6.

Currently, by default, the garbage collector is triggered within remote calls that apply the effective Hamiltonian, change the position, etc., whenever less than 6 GB of memory is left on the process. This default can be changed with:

ITensorParallel.set_gc_gb_threshold!(3)

where the value is in GB. Setting it very large (e.g. ITensorParallel.set_gc_gb_threshold!(Inf)) means garbage collection is always triggered in the operations where it is hard-coded in this PR, while setting it very small (e.g. ITensorParallel.set_gc_gb_threshold!(0)) means Julia handles all garbage collection according to its default behavior.
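The threshold rule described above can be sketched in Julia as follows. This is a minimal, hypothetical illustration, not the code added in this PR: the names `GC_GB_THRESHOLD`, `set_gc_gb_threshold!` (a local stand-in for `ITensorParallel.set_gc_gb_threshold!`), `maybe_gc`, and `remotecall_fetch_with_gc` are assumptions, and "memory left" is assumed to mean system free memory as reported by `Sys.free_memory()`. It shows why `Inf` always forces a collection (free memory is always below it) and why `0` never does (free memory is never below it).

```julia
using Distributed

# Hypothetical sketch: these names are illustrative, not ITensorParallel's code.
const GC_GB_THRESHOLD = Ref{Float64}(6.0)  # default threshold in GB

# Local stand-in for ITensorParallel.set_gc_gb_threshold!.
set_gc_gb_threshold!(gb::Real) = (GC_GB_THRESHOLD[] = Float64(gb); nothing)

# Assumption: "memory left" = system free memory (Sys.free_memory() is in bytes).
free_memory_gb() = Sys.free_memory() / 2^30

# Force a full collection only when free memory drops below the threshold.
# threshold_gb = Inf  -> always collects; threshold_gb = 0 -> never collects.
function maybe_gc(; threshold_gb::Real=GC_GB_THRESHOLD[])
    if free_memory_gb() < threshold_gb
        GC.gc()
        return true
    end
    return false
end

# Run f(args...) on process `pid`, first forcing GC there if memory is low.
# The caller's threshold is captured so one setting governs all processes.
function remotecall_fetch_with_gc(f, pid::Integer, args...)
    threshold_gb = GC_GB_THRESHOLD[]
    g = (xs...) -> (maybe_gc(; threshold_gb=threshold_gb); f(xs...))
    return remotecall_fetch(g, pid, args...)
end
```

With real workers (e.g. after `addprocs`), these definitions would also have to be loaded on every process, for instance with `@everywhere`, because the closure sent to a worker calls `maybe_gc` there. Capturing the threshold on the calling side is one possible design choice: a single `set_gc_gb_threshold!` call on the main process then applies to every remote call.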

@b-kloss @awietek

codecov-commenter commented Aug 1, 2023

Codecov Report

Merging #19 (070c530) into main (79215c4) will increase coverage by 2.45%.
Report is 3 commits behind head on main.
The diff coverage is 85.93%.


@@            Coverage Diff             @@
##             main      #19      +/-   ##
==========================================
+ Coverage   67.97%   70.42%   +2.45%     
==========================================
  Files           9       10       +1     
  Lines         281      328      +47     
==========================================
+ Hits          191      231      +40     
- Misses         90       97       +7     
Files Changed            Coverage Δ
src/ITensorParallel.jl   100.00% <ø> (ø)
src/mpisumterm.jl         81.57% <0.00%> (ø)
src/force_gc.jl           66.66% <66.66%> (ø)
src/foldssum.jl           58.18% <81.81%> (+10.95%) ⬆️
src/distributedsum.jl     97.14% <96.87%> (+3.39%) ⬆️

mtfishman merged commit be76312 into main on Aug 2, 2023
9 checks passed
mtfishman deleted the force_gc branch on August 2, 2023 at 00:23