Add support for forcing the GC when using Distributed #19
This adds support for forcing Julia's garbage collector to run during certain operations when using the Distributed backend.
@nbaldelli reported that Julia was crashing when using the Distributed backend of ITensorParallel.jl with larger MPS bond dimensions. This may be related to Julia not running the garbage collector properly under Distributed parallelization; see, for example, https://discourse.julialang.org/t/from-multithreading-to-distributed/101984/6.
By default, the garbage collector is currently triggered within remote calls (applying the effective Hamiltonian, changing the position, etc.) when less than 6 GB of memory is left on the process. This default can be changed with `ITensorParallel.set_gc_gb_threshold!`, where the value is in GB. Setting it very large (e.g. `ITensorParallel.set_gc_gb_threshold!(Inf)`) means garbage collection is always triggered in the operations where it is hard-coded in this PR, while setting it very small (e.g. `ITensorParallel.set_gc_gb_threshold!(0)`) means Julia handles all garbage collection according to its default behavior.

@b-kloss @awietek
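For readers unfamiliar with the idea, the threshold logic described above could be sketched roughly as follows. This is only an illustration and not the code in this PR: the names `GC_GB_THRESHOLD`, `set_threshold_sketch!`, `free_memory_gb`, and `maybe_gc` are hypothetical, and it assumes free memory is measured with `Sys.free_memory()`, which reports system-wide free memory rather than a per-process figure.

```julia
# Hypothetical sketch of threshold-based GC triggering (not the PR's implementation).

# Threshold in GB; 6 mirrors the default described in this PR.
const GC_GB_THRESHOLD = Ref(6.0)

# Hypothetical stand-in for ITensorParallel.set_gc_gb_threshold!.
set_threshold_sketch!(gb::Real) = (GC_GB_THRESHOLD[] = gb; nothing)

# Assumption: free memory is taken from Sys.free_memory() (bytes, system-wide).
free_memory_gb() = Sys.free_memory() / 2^30

# Force a full collection when free memory drops below the threshold.
# Returns true if the GC was triggered, false otherwise.
function maybe_gc(; free_gb::Real = free_memory_gb())
    if free_gb < GC_GB_THRESHOLD[]
        GC.gc()
        return true
    end
    return false
end
```

With this comparison, a threshold of `Inf` makes every call to `maybe_gc` force a collection, while a threshold of `0` never does, which matches the two limiting cases described above. In a Distributed setting, a check like this would presumably run on the worker inside the remote call, since each worker process has its own heap.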