
Add build instructions for Bitterroot #932

Open · aprilnovak opened this issue Jul 15, 2024 · 8 comments

@aprilnovak (Collaborator)
Reason

New machine coming to INL, let's make sure we know how to build Cardinal on it.

Design

Add Bitterroot as a system to Cardinal's HPC documentation.

Impact

Better user experience.

@lewisgross1296 (Contributor) commented Jul 23, 2024
I was able to ssh into Bitterroot, but upon opening my terminal, the ~/.bashrc I set up for Sawtooth complained:

Lmod has detected the following error:  The following module(s) are unknown: "openmpi/4.1.6-gcc-12.3.0-panw"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "openmpi/4.1.6-gcc-12.3.0-panw"

Also make sure that all modulefiles written in TCL start with the string #%Module

Lmod has detected the following error:  The following module(s) are unknown: "cmake/3.27.7-gcc-12.3.0-5cfk"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "cmake/3.27.7-gcc-12.3.0-5cfk"

Also make sure that all modulefiles written in TCL start with the string #%Module

Lmod has detected the following error:  The following module(s) are unknown: "gcc/12.3.0-gcc-10.5.0-vx2f"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "gcc/12.3.0-gcc-10.5.0-vx2f"

Also make sure that all modulefiles written in TCL start with the string #%Module

It seems like those modules exist when I run module avail, but perhaps the syntax is causing an issue. Maybe I should remove this from my ~/.bashrc:

###################### CARDINAL ENVIRONMENT ######################
module purge
module load use.moose
module load moose-tools
module load openmpi/4.1.6-gcc-12.3.0-panw
module load cmake/3.27.7-gcc-12.3.0-5cfk
module load gcc/12.3.0-gcc-10.5.0-vx2f # needed for NekRS

It might just be better to load them in the terminal when building Cardinal? I do get complaints that these modules don't exist when I scp or log onto inlhpclogin, but so far I just ignore the messages.

@aprilnovak (Collaborator, Author)

You can add if statements to your ~/.bashrc that detect which system you're on. It's a similar setup at OLCF. Here's what we do for Frontier vs. Summit; maybe a similar syntax will work on Bitterroot/Sawtooth.

if [ $LMOD_SYSTEM_NAME = frontier ]; then
    module purge
    module load PrgEnv-gnu craype-accel-amd-gfx90a cray-mpich rocm cray-python/3.9.13.1 cmake/3.21.3
    module unload cray-libsci

    # Revise for your Cardinal repository location
    DIRECTORY_WHERE_YOU_HAVE_CARDINAL=$HOME/frontier
    cd $DIRECTORY_WHERE_YOU_HAVE_CARDINAL

    HOME_DIRECTORY_SYM_LINK=$(realpath -P $DIRECTORY_WHERE_YOU_HAVE_CARDINAL)
    export NEKRS_HOME=$HOME_DIRECTORY_SYM_LINK/cardinal/install

    export OPENMC_CROSS_SECTIONS=/lustre/orion/fus166/proj-shared/novak/cross_sections/endfb-vii.1-hdf5/cross_sections.xml
fi
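A similar guard might work on the INL machines. Since LMOD_SYSTEM_NAME may not be set there, matching on the hostname is one option. A rough sketch, where the hostname patterns are assumptions and the module names are copied from the loads earlier in this thread:

```shell
# Hypothetical ~/.bashrc guard for INL systems; adjust the patterns
# to whatever `hostname` actually reports on each machine.
case "$(hostname)" in
  sawtooth*)
    module purge
    module load use.moose moose-tools
    module load openmpi/4.1.6-gcc-12.3.0-panw
    module load cmake/3.27.7-gcc-12.3.0-5cfk
    module load gcc/12.3.0-gcc-10.5.0-vx2f # needed for NekRS
    ;;
  bitterroot*)
    module purge
    module load use.moose moose-containers cardinal-mpich
    ;;
esac
```

This keeps the Sawtooth-only module loads from producing the Lmod "unknown module" errors on Bitterroot and the login nodes.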

@lewisgross1296 (Contributor) commented Jul 23, 2024

Thanks to @loganharbour, this submit script using module load cardinal-mpich worked for me on Bitterroot. It's running a pretty hefty job quickly. Maybe his Apptainer knowledge could be useful for more detailed build-from-source info.

#!/bin/sh
#This file is called submit-script.sh
#SBATCH --partition=general      # default general (option short or hbm)
#SBATCH --time=7-00:00:00        # run time in days-hh:mm:ss (6 hours is the max for short)
#SBATCH --nodes=32               # number of job nodes (max is 168 nodes on general, 336 nodes on short)
#SBATCH --ntasks-per-node=1      # mpi ranks per node
#SBATCH --cpus-per-task=112      # threads per mpi rank
#SBATCH --wckey=moose            # project code
#SBATCH --error=small_inf_assembly.err.%J
#SBATCH --output=small_inf_assembly.txt.%J


module purge
module load use.moose moose-containers cardinal-mpich

JOB_DIR=/home/groslewi/gcmr/mwes/25kp_dt1e-2_small_inf_assembly

export MV2_USE_ALIGNED_ALLOC=1
export MV2_THREADS_PER_PROCESS=${SLURM_CPUS_PER_TASK}
mpiexec cardinal-opt -i ${JOB_DIR}/openmc.i --n-threads=${SLURM_CPUS_PER_TASK}
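For sizing runs, the total thread count the script requests is nodes × ranks-per-node × threads-per-rank. A quick sanity check, with the numbers copied from the #SBATCH directives above:

```shell
# Total hardware threads requested by the submit script:
# 32 nodes x 1 MPI rank/node x 112 threads/rank
nodes=32
ntasks_per_node=1
cpus_per_task=112
echo $((nodes * ntasks_per_node * cpus_per_task))  # prints 3584
```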

@aprilnovak (Collaborator, Author)

Does cardinal-mpich include NekRS in it?

@loganharbour (Member)

Does cardinal-mpich include NekRS in it?

It does. It's the base image for what's being used for Docker: OpenMC, DAGMC, NekRS.

@lewisgross1296 (Contributor)

@AyaHegazy22 and I chatted a bit, and she was unable to reproduce the success. This makes sense, though, as we discovered that I was also only able to run the job on some select nodes. Thanks to Logan, it should work on every node now.

I just launched a job that is running. Aya, if you get a chance, try again. Here's my working submit script (it has a few better defaults for the #SBATCH directives):

#!/bin/sh
#This file is called submit-script.sh
#SBATCH --partition=general      # default general (option short or hbm)
#SBATCH --time=0-06:00:00        # run time in days-hh:mm:ss (6 hours is the max for short)
#SBATCH --nodes=24               # number of job nodes (max is 168 nodes on general, 336 nodes on short)
#SBATCH --ntasks-per-node=1      # mpi ranks per node
#SBATCH --cpus-per-task=112      # threads per mpi rank
#SBATCH --wckey=moose            # project code
#SBATCH --error=small_inf_assembly.err.%J
#SBATCH --output=small_inf_assembly.txt.%J


module purge
module load use.moose moose-containers cardinal-mpich/2024.07.12-b44370a
JOB_DIR=/home/groslewi/gcmr/mwes/small_inf_assembly

export MV2_USE_ALIGNED_ALLOC=1
export MV2_THREADS_PER_PROCESS=${SLURM_CPUS_PER_TASK}
mpiexec cardinal-opt -i ${JOB_DIR}/openmc.i --n-threads=${SLURM_CPUS_PER_TASK}
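For anyone else trying this, the script is submitted and monitored with the standard Slurm commands (the file name comes from the comment at the top of the script):

```shell
sbatch submit-script.sh   # queue the job; prints the assigned job ID
squeue -u $USER           # check its state (PD = pending, R = running)
```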

@meltawila (Contributor)

@lewisgross1296 is there any update on this? With the above, it looks like you were still using the pre-built Cardinal only, right?

@lewisgross1296 (Contributor)

I have not tried to build from source on Bitterroot, since it seems that the suggested way is to use the Apptainer provided. The container has worked pretty well so far, though.

I have yet to try a Nek case, so I can't confirm behavior there. If @loganharbour is able to share the Apptainer build script, that might be useful for others trying to build from source.
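For reference, if the image behind cardinal-mpich were available as a standalone .sif file, Apptainer's standard exec interface would let you run it outside the module system. Everything here is hypothetical, in particular the image path, which is not a published artifact:

```shell
# Hypothetical: run Cardinal directly from a local Apptainer image.
# cardinal-mpich.sif is a placeholder path for illustration only.
apptainer exec cardinal-mpich.sif cardinal-opt -i openmc.i --n-threads=4
```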
