-
Hi everyone, I'm currently trying to use shape_optimization.py for the 3D inviscid ONERA tutorial with the discrete adjoint. When I run this optimization on my local host, everything works without any issues. However, when I run it on the nodes of an HPC cluster, I encounter occasional errors. The error seems to occur randomly, sometimes during the DEFORM step and other times during the ADJOINT or DIRECT steps. I'm using SU2 v7.4.0 and OpenMPI 4.1.4. This intermittent error seems to be related to MPI, and I'm looking for insight into potential reasons for this behavior on the HPC nodes. Has anyone else experienced a similar issue, or does anyone have ideas on what could be causing this problem? Thanks in advance for your help!
Job file error: Primary job terminated normally, but 1 process returned a non-zero exit code. mpirun noticed that process rank 13 with PID 24118 on node compute-5-2 exited on signal 11 (Segmentation fault).
Replies: 1 comment
-
I managed to solve the error on my own. The problem was resolved when I passed the following MPI parameters:
mpirun --mca btl_openib_cpc_exclude rdmacm --mca mpi_leave_pinned 0 --mca btl_openib_allow_ib 1 --mca btl openib,self,vader -n $NSLOTS
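Since shape_optimization.py spawns its own mpirun calls for the DEFORM, DIRECT, and ADJOINT steps, an alternative way to apply these settings is via OpenMPI's `OMPI_MCA_*` environment variables, which are equivalent to the corresponding `--mca` flags and are picked up by every mpirun launched from the job. A minimal job-script sketch (the job name, parallel-environment name, and config filename are placeholders, not from the original post):

```shell
#!/bin/bash
#$ -N su2_onera_opt   # job name (placeholder)
#$ -pe mpi 16         # parallel environment; sets $NSLOTS (placeholder)
#$ -cwd

# Same workaround as the --mca flags above, expressed as environment
# variables so they apply to every mpirun the Python wrapper spawns:
export OMPI_MCA_btl_openib_cpc_exclude=rdmacm   # skip the rdmacm connection manager
export OMPI_MCA_mpi_leave_pinned=0              # disable leave-pinned registration cache
export OMPI_MCA_btl_openib_allow_ib=1           # allow the openib BTL on InfiniBand
export OMPI_MCA_btl=openib,self,vader           # restrict BTLs explicitly

# Config filename is a placeholder for the ONERA M6 tutorial case.
shape_optimization.py -g DISCRETE_ADJOINT -f inv_ONERAM6.cfg -n $NSLOTS
```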