FEAT-#4574: Warn users when pre-initialized Ray cluster is not using all available memory #4575

3 changes: 2 additions & 1 deletion docs/release_notes/release_notes-0.16.0.rst
@@ -20,7 +20,7 @@ Key Features and Updates
* XGBoost enhancements
*
* Developer API enhancements
*
* FEAT-#4574: Warn users when pre-initialized Ray cluster is not using all available memory (#4575)
* Update testing suite
* TEST-#4508: Reduce test_partition_api pytest threads to deflake it (#4551)
* TEST-#4550: Use much less data in test_partition_api (#4554)
@@ -34,3 +34,4 @@ Contributors
------------
@mvashishtha
@prutskov
@RehanSD
18 changes: 18 additions & 0 deletions modin/core/execution/ray/common/utils.py
@@ -219,6 +219,24 @@ def initialize_ray(
            if not GPU_MANAGERS:
                for i in range(GpuCount.get()):
                    GPU_MANAGERS.append(GPUManager.remote(i))
    else:
        ray_obj_store_mem = ray.available_resources()["object_store_memory"]
        system_memory = psutil.virtual_memory().total
        if sys.platform.startswith("linux"):
            shm_fd = os.open("/dev/shm", os.O_RDONLY)
            try:
                shm_stats = os.fstatvfs(shm_fd)
                system_memory = shm_stats.f_bsize * shm_stats.f_bavail
            finally:
                os.close(shm_fd)
        if (ray_obj_store_mem // 1e9) < (0.6 * system_memory) // 1e9:
            warnings.warn(
                "Modin has detected that it is running on a pre-initialized Ray cluster. "
                + f"This cluster has currently allocated {ray_obj_store_mem // 1e9} GB for its "
                + f"object store, but the device has {system_memory // 1e9} GB of RAM available. "
                + "Modin recommends initializing Ray with at least 60% of available RAM to prevent "
                + "Out Of Memory errors."
            )
    _move_stdlib_ahead_of_site_packages()
    ray.worker.global_worker.run_function_on_all_workers(
        _move_stdlib_ahead_of_site_packages
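For readers who want to reason about the threshold outside of Modin, the check in this hunk can be sketched with the standard library alone. This is a rough illustration, not the code under review: the helper names `shm_available_bytes`, `object_store_is_undersized`, and `recommended_object_store_bytes` are hypothetical, and the sketch assumes the same whole-gigabyte comparison and 60% threshold that the diff uses.

```python
import os
import sys

GB = 10**9  # the diff divides by 1e9, i.e. decimal gigabytes


def shm_available_bytes(path="/dev/shm"):
    """Return free bytes on the shared-memory filesystem, or None if unavailable.

    Mirrors the Linux-only branch of the diff (f_bsize * f_bavail).
    """
    if not sys.platform.startswith("linux") or not os.path.isdir(path):
        return None
    stats = os.statvfs(path)
    return stats.f_bsize * stats.f_bavail


def object_store_is_undersized(object_store_bytes, system_memory_bytes, fraction=0.6):
    """Same comparison as the diff: both sides are floored to whole GB first."""
    return (object_store_bytes // GB) < (fraction * system_memory_bytes) // GB


def recommended_object_store_bytes(system_memory_bytes, percent=60):
    """Hypothetical helper: the smallest size that meets the 60% recommendation."""
    return system_memory_bytes * percent // 100


if __name__ == "__main__":
    # A 2 GB object store on a 16 GB machine would trigger the warning.
    print(object_store_is_undersized(2 * GB, 16 * GB))
    print(recommended_object_store_bytes(16 * GB))
    print(shm_available_bytes())
```

Because both sides are floored to whole gigabytes, an object store that falls short of the threshold by less than 1 GB may not trigger the warning. A user who sees the warning could presumably avoid it by passing a value such as `recommended_object_store_bytes(...)` to `ray.init(object_store_memory=...)` before Modin attaches to the cluster.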