supercomputers deployment with unidist MPI backend #6073
-
Hi all! I read Modin's docs and did not find any example of how to start
Now I am using Intel MPI, which can support spawning MPI processes without using
What I do:
Now I get the following error message:
It would be helpful if there were official
Replies: 2 comments 17 replies
-
@luweizheng thank you for creating this issue. Unfortunately, I know very little about MPI and unidist. Is the documentation here helpful at all? The error messages seem to say that unidist was unable to set up the remote workers. Would it help to customize the worker hosts here? Have you checked whether your controller node is able to communicate with the workers? cc'ing @YarShev @modin-project/unidist-core for help. I think they can be more helpful.
-
@luweizheng could you please provide more details about your use case? Are you starting a one-process application and then spawning the processes with Intel MPI? What
Now I can run on unidist MPI backend successfully!
My SLURM script:
Remove all `ray` code and add `import unidist; unidist.init()` into `xxx.py`:

I use Intel MPI, and Intel MPI can automatically detect hosts by reading `$SLURM_JOB_NODELIST` from SLURM.

Maybe we should update the docs? I can help contribute. Where should I put these things in the docs?
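For readers following along, here is a minimal sketch of what a script like `xxx.py` might look like after the change described above. It is not the original script from this thread. The helper `slurm_allocation_summary`, the `main` function, and the toy DataFrame are hypothetical names added for illustration, and the check for `SLURM_JOB_NODELIST` is an assumption made so the script does nothing when run outside a SLURM job.

```python
import os


def slurm_allocation_summary(env=None):
    """Collect the SLURM variables mentioned in this thread, for logging.

    Hypothetical helper: it only reads environment variables and does not
    parse or expand the node list.
    """
    env = os.environ if env is None else env
    keys = ("SLURM_JOB_NODELIST", "SLURM_NTASKS")
    return {key: env.get(key, "<not set>") for key in keys}


def main():
    # Imports are kept inside main() so the helper above can be used
    # on a machine without Modin, unidist, or MPI installed.
    import modin.config as modin_cfg
    import unidist
    import unidist.config as unidist_cfg

    # Select the MPI backend of unidist and the unidist engine of Modin.
    unidist_cfg.Backend.put("mpi")
    modin_cfg.Engine.put("unidist")

    # Explicit initialization, as described in the post above.
    unidist.init()

    import modin.pandas as pd

    # Toy workload (illustrative only).
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    print(df.sum())


if __name__ == "__main__":
    print(slurm_allocation_summary())
    # Assumption: only start unidist when running inside a SLURM allocation.
    if "SLURM_JOB_NODELIST" in os.environ:
        main()
```

Printing the summary at startup is a cheap way to confirm, from the job log, which node list the MPI launcher sees.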