The notes below show how to build selected applications with CUDA support enabled, and were provided by Adam DeConinck (NVIDIA Corporation).
GROMACS 4.6 - Build with OpenMPI 1.7rc5 and CUDA
================================================
http://www.gromacs.org/
Build commands
--------------

    mkdir gromacs && cd gromacs
    wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.6.tar.gz
    tar xzf gromacs-4.6.tar.gz
    mkdir gromacs-build
    module load cmake cuda gcc/4.6.3 fftw openmpi
    CC=mpicc CXX=mpiCC cmake ./gromacs-4.6 -DGMX_OPENMP=ON -DGMX_MPI=ON \
        -DGMX_PREFER_STATIC_LIBS=ON -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=./gromacs-build
    make install

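As a quick check that CUDA support actually made it into the build, the installed binaries can print their build configuration (assuming the standard GROMACS `-version` flag behaves here as it does for the other 4.6 tools):

    ./gromacs-build/bin/mdrun_mpi -version
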
Other Parameters
----------------
cmake parameters:
* `CUDA_HOST_COMPILER` = which host compiler to use (i.e., which gcc)
* `CUDA_HOST_COMPILER_OPTIONS`
* `CUDA_NVCC_FLAGS`
Default value for `CUDA_NVCC_FLAGS` for Gromacs 4.6, from examining cmake
output after the fact:

    ./CMakeCache.txt:CUDA_NVCC_FLAGS:STRING=-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_30,code=compute_30;-use_fast_math;

So, for example, generating optimized code for K20 (`compute_35,sm_35`) would
require modifying `CUDA_NVCC_FLAGS`.
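A cmake invocation along those lines might look like the following (a sketch: the gencode string follows the semicolon-separated format shown above, and the `CUDA_HOST_COMPILER` setting is an assumption you should adapt to your system):

    CC=mpicc CXX=mpiCC cmake ./gromacs-4.6 -DGMX_OPENMP=ON -DGMX_MPI=ON \
        -DGMX_PREFER_STATIC_LIBS=ON -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=./gromacs-build \
        -DCUDA_HOST_COMPILER=$(which gcc) \
        -DCUDA_NVCC_FLAGS="-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math"
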
Running Gromacs Water benchmark
-------------------------------
Download the water benchmark:

    wget ftp://ftp.gromacs.org/pub/tmp/water-clean-input.tar.gz
    tar xzf water-clean-input.tar.gz

For the water/1536 benchmark, initialize the topol.tpr file:

    cd water/1536
    ~/gromacs/gromacs-build/bin/grompp_mpi -f pme.mdp

Run the benchmark using GPUs:

    mpirun -np $NP -hostfile $HOSTFILE ~/gromacs/gromacs-build/bin/mdrun_mpi \
        -s topol.tpr -npme 0 -resethway -noconfout -nb gpu -nsteps 1000 -v

Run the benchmark using CPU only:

    mpirun -np $NP -hostfile $HOSTFILE ~/gromacs/gromacs-build/bin/mdrun_mpi \
        -s topol.tpr -npme 0 -resethway -noconfout -nb cpu -nsteps 1000 -v

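For reference, a concrete GPU launch might look like the following (a sketch: the rank count, hostfile, OpenMP thread count, and `-gpu_id` mapping are assumptions that depend on how many GPUs and cores each node has):

    export OMP_NUM_THREADS=8
    NP=4
    HOSTFILE=$HOME/hosts        # one hostname per line
    mpirun -np $NP -hostfile $HOSTFILE ~/gromacs/gromacs-build/bin/mdrun_mpi \
        -s topol.tpr -npme 0 -resethway -noconfout -nb gpu -gpu_id 01 -nsteps 1000 -v
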
HOOMD-Blue 0.11.2
=================
<http://codeblue.umich.edu/hoomd-blue/index.html>
Build notes
-----------

    wget http://codeblue.umich.edu/hoomd-blue/downloads/0.11/hoomd-0.11.2.tar.bz2
    tar xjf hoomd-0.11.2.tar.bz2
    cd hoomd-0.11.2
    mkdir hoomd-build
    module load cuda cmake
    cmake . -DENABLE_CUDA=ON -DENABLE_OPENMP=ON -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=./hoomd-build -DCUDA_ARCH_LIST="20;30;35"
    make install

Docs: <http://codeblue.umich.edu/hoomd-blue/doc/page_compile_guide.html>
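
As a quick sanity check of the CUDA build, you can push a tiny Lennard-Jones script through the installed launcher (a sketch: it assumes the install placed a `hoomd` executable under hoomd-build/bin and uses the 0.11-era `hoomd_script` API):

    cat > test.hoomd <<'EOF'
    from hoomd_script import *
    # random LJ liquid, integrated briefly on the GPU if one is available
    init.create_random(N=1000, phi_p=0.01)
    lj = pair.lj(r_cut=2.5)
    lj.pair_coeff.set('A', 'A', epsilon=1.0, sigma=1.0)
    integrate.mode_standard(dt=0.005)
    integrate.nvt(group=group.all(), T=1.2, tau=0.5)
    run(1000)
    EOF
    ./hoomd-build/bin/hoomd test.hoomd
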
LAMMPS with CUDA and OpenMPI - 6Dec12 version
=============================================
<http://lammps.sandia.gov/>
Build steps
-----------
Download LAMMPS from <http://lammps.sandia.gov/> and unpack it:

    tar xzf lammps.tar.gz
    export LAMMPS_DIR=`pwd`/lammps-6Dec12

First we'll build the GPU module. Check that your cuda module sets `CUDA_HOME`;
otherwise set it to your CUDA installation directory (e.g., /usr/local/cuda).
We will use arch `sm_35` for Tesla K20.

    cd $LAMMPS_DIR/lib/gpu
    make -f Makefile.linux clean
    module load cuda openmpi
    make CUDA_HOME=$CUDA_HOME CUDA_ARCH="-arch=sm_35" \
        CUDR_CPP=mpicxx CUDR_OPTS="-O2" -f Makefile.linux

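If the build succeeds, the GPU library and the settings fragment that the LAMMPS build pulls in should both be present (a quick check; the file names below are what I expect for this LAMMPS version, so treat them as assumptions):

    # libgpu.a gets linked into LAMMPS; Makefile.lammps records the CUDA link flags
    ls -l $LAMMPS_DIR/lib/gpu/libgpu.a $LAMMPS_DIR/lib/gpu/Makefile.lammps
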
Next we'll build the USER-CUDA module. This is just a different way of adding
CUDA support, and different LAMMPS models may use either one. Note that it has
a different method for specifying the `CUDA_HOME` and the build
architecture.

    cd $LAMMPS_DIR/lib/cuda
    make clean
    make CUDA_INSTALL_PATH=$CUDA_HOME cufft=2 precision=2 arch=35

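If you only want single precision (often faster on GPUs), the same build with the precision knob changed would look like this (a sketch; my reading of lib/cuda's options is that `precision=1` selects single and `precision=2` double, so check the README in lib/cuda before relying on it):

    cd $LAMMPS_DIR/lib/cuda
    make clean
    make CUDA_INSTALL_PATH=$CUDA_HOME cufft=2 precision=1 arch=35
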
Now we'll enable the LAMMPS packages we want:

    cd $LAMMPS_DIR/src
    make clean-all
    make yes-manybody
    make yes-molecule
    make yes-replica
    make yes-kspace
    make yes-asphere
    make yes-gpu
    make yes-user-cuda

To list the packages that will be included in the build, run:

    make package-status

Set up your LAMMPS makefile and put it in the `$LAMMPS_DIR/src/MAKE` directory.
It needs to point to your MPI installation and, since we are using an external
FFT library here, to your FFTW as well. See the Makefile.CUDA listing below for
an example. Now set up the environment and build:

    module load fftw/2.1.5
    make CUDA

After the build you should have `lmp_CUDA` in the src/ directory.
Makefile.CUDA:

    # openmpi = Fedora Core 6, mpic++, OpenMPI-1.1, FFTW2

    SHELL = /bin/sh

    # ---------------------------------------------------------------------
    # compiler/linker settings
    # specify flags and libraries needed for your compiler

    CC = mpic++
    CCFLAGS = -O2 \
              -funroll-loops -fstrict-aliasing -Wall -W -Wno-uninitialized
    SHFLAGS = -fPIC
    DEPFLAGS = -M
    LINK = mpic++
    LINKFLAGS = -O
    LIB = -lstdc++
    SIZE = size
    ARCHIVE = ar
    ARFLAGS = -rcsv
    SHLIBFLAGS = -shared

    # ---------------------------------------------------------------------
    # LAMMPS-specific settings
    # specify settings for LAMMPS features you will use
    # if you change any -D setting, do full re-compile after "make clean"

    # LAMMPS ifdef settings, OPTIONAL
    # see possible settings in doc/Section_start.html#2_2 (step 4)

    LMP_INC = -DLAMMPS_GZIP

    # MPI library, REQUIRED
    # see discussion in doc/Section_start.html#2_2 (step 5)
    # can point to dummy MPI library in src/STUBS as in Makefile.serial
    # INC = path for mpi.h, MPI compiler settings
    # PATH = path for MPI library
    # LIB = name of MPI library

    MPI_INC = -I$(MPI_HOME)/include
    MPI_PATH =
    MPI_LIB = -L$(MPI_HOME)/lib

    # FFT library, OPTIONAL
    # see discussion in doc/Section_start.html#2_2 (step 6)
    # can be left blank to use provided KISS FFT library
    # INC = -DFFT setting, e.g. -DFFT_FFTW, FFT compiler settings
    # PATH = path for FFT library
    # LIB = name of FFT library

    FFT_INC = -DFFT_FFTW -I$(FFTLIB)/include
    FFT_PATH =
    FFT_LIB = -L$(FFTLIB)/lib -lfftw

    # JPEG library, OPTIONAL
    # see discussion in doc/Section_start.html#2_2 (step 7)
    # only needed if -DLAMMPS_JPEG listed with LMP_INC
    # INC = path for jpeglib.h
    # PATH = path for JPEG library
    # LIB = name of JPEG library

    JPG_INC =
    JPG_PATH =
    JPG_LIB =

    # ---------------------------------------------------------------------
    # build rules and dependencies
    # no need to edit this section

    include Makefile.package.settings
    include Makefile.package

    EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC) $(JPG_INC) $(PKG_SYSINC)
    EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(JPG_PATH) $(PKG_SYSPATH)
    EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(JPG_LIB) $(PKG_SYSLIB)

    # Link target

    $(EXE): $(OBJ)
    	$(LINK) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(EXTRA_LIB) $(LIB) -o $(EXE)
    	$(SIZE) $(EXE)

    # Library targets

    lib: $(OBJ)
    	$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)

    shlib: $(OBJ)
    	$(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o $(EXE) \
    		$(OBJ) $(EXTRA_LIB) $(LIB)

    # Compilation rules

    %.o:%.cpp
    	$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<

    %.d:%.cpp
    	$(CC) $(CCFLAGS) $(EXTRA_INC) $(DEPFLAGS) $< > $@

    # Individual dependencies

    DEPENDS = $(OBJ:.o=.d)
    sinclude $(DEPENDS)

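Once `lmp_CUDA` exists, a GPU run might look like the following (a sketch: `in.lj` is a hypothetical input script, `-sf gpu` asks LAMMPS to substitute the /gpu variants of supported styles, and depending on your input you may also need a `package gpu` command; for the USER-CUDA package you would use `-c on -sf cuda` instead):

    cd $LAMMPS_DIR/src
    mpirun -np 4 ./lmp_CUDA -sf gpu -in in.lj
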
OpenCV 2.4.4 with CUDA
======================
<http://opencv.org/>
Build notes
-----------
Download OpenCV from <http://opencv.org/downloads.html>

    tar xjf OpenCV-2.4.4.tar.bz2
    cd OpenCV-2.4.4
    mkdir opencv-build
    module load cmake cuda
    cmake . -DWITH_CUDA=ON -DWITH_CUFFT=ON -DWITH_CUBLAS=ON -DHAVE_OPENMP=YES \
        -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./opencv-build/
    make install

Add any other OpenCV build options you need in the same way.
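If you only care about one GPU generation, you can also restrict CUDA code generation to cut compile time (the `CUDA_ARCH_BIN`/`CUDA_ARCH_PTX` variables are my recollection of OpenCV's CUDA CMake options; verify them against cmake/OpenCVDetectCUDA.cmake in the source tree):

    cmake . -DWITH_CUDA=ON -DWITH_CUFFT=ON -DWITH_CUBLAS=ON -DHAVE_OPENMP=YES \
        -DCUDA_ARCH_BIN="3.5" -DCUDA_ARCH_PTX="3.5" \
        -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./opencv-build/
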
Quantum Espresso 4.3.2 with GPU support
=======================================
<http://www.quantum-espresso.org/>
Build steps
-----------

    mkdir qe && cd qe
    wget http://qe-forge.org/gf/download/frsrelease/119/232/espresso-4.3.2-GPU.tar.gz
    tar xzf espresso-4.3.2-GPU.tar.gz
    cd espresso-4.3.2-GPU

We'll use the Intel compiler for QE. If your Intel compiler is not installed in
the usual /opt/intel location, you will have to point to the correct MKL with
`BLAS_LIBS`.

    module load mpi/intel/13.0/openmpi/1.7 cuda intel/Compiler/13.0
    ./configure CC=icc CXX=icpc F77=ifort F90=ifort BLAS_LIBS=$MKLROOT \
        --enable-cuda --enable-openmp --enable-parallel --with-cuda-dir=$CUDA_HOME
    make clean
    make pw

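If configure does not pick up MKL correctly from `$MKLROOT`, you can spell the link line out explicitly instead (a sketch; the exact MKL library names depend on your MKL version and threading setup):

    ./configure CC=icc CXX=icpc F77=ifort F90=ifort \
        BLAS_LIBS="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" \
        --enable-cuda --enable-openmp --enable-parallel --with-cuda-dir=$CUDA_HOME
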
QUDA 0.4.0
==========
<http://lattice.github.com/quda/>
Build notes
-----------

    wget http://github.com/downloads/lattice/quda/quda-0.4.0.tar.gz
    tar xzf quda-0.4.0.tar.gz
    cd quda-0.4.0
    module load cuda openmpi
    ./configure --prefix=<install-path> --enable-multi-gpu \
        --with-mpi=$MPI_HOME CC=mpicc CXX=mpiCC
    make
    make install

Building Charm++ 6.2.1
======================

    wget http://charm.cs.uiuc.edu/distrib/charm-6.2.1_src.tar.gz
    tar xzf charm-6.2.1_src.tar.gz
    cd charm-6.2

Charm++ has a number of pre-selectable build configurations, detailed at
<http://charm.cs.uiuc.edu/manuals/html/charm++/A.html>. These include options
for whether or not to use MPI, whether to include InfiniBand support, and so
on. Note that the `build` script may ask some interactive questions. In this
case I'll build Charm++ with MPI enabled (an ibverbs build is shown as an
alternative), but you should choose your build process based on your cluster's
needs.

    module load openmpi
    env MPICXX=mpicxx ./build charm++ mpi-linux-x86_64 --with-production
    ./build charm++ net-linux-x86_64 ibverbs --with-production -j8

Testing the build:

    cd mpi-linux-x86_64/tests/charm++/megatest
    make pgm
    mpirun -np 4 ./pgm

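For the ibverbs (non-MPI) build, the same megatest is launched with charmrun instead of mpirun (a sketch; `++local` keeps all PEs on the local node, which is convenient for a first smoke test):

    # from the top of the charm-6.2 tree
    cd net-linux-x86_64-ibverbs/tests/charm++/megatest
    make pgm
    ./charmrun ++local +p4 ./pgm
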
This build process is known to work on Adam's cluster at NVIDIA, but may not be the best-performing configuration; it's based partly on the (somewhat broken) official instructions, and partly on Adam's own work to get it to build.
In building NAMD, it's worth spending some time looking at the arguments to the
./config file in the source directory, and also looking at the contents of the
arch/ directory, for information as to what parameters are available and
useful. There's also some documentation on doing a build on the NAMD homepage
at <http://www.ks.uiuc.edu/Research/namd/2.9/notes.html#compiling>, but it
appears to be somewhat out of date.
NAMD 2.9 with CUDA support
==========================
Download NAMD from http://www.ks.uiuc.edu/Research/namd/ (requires
registration).

    tar xzf NAMD_2.9_Source.tar.gz
    cd NAMD_2.9_Source

NAMD requires Tcl, FFTW 2.1.5, and Charm++. In this example, I am using the
system Tcl libraries and an FFTW module. See the Charm++ 6.2.1 section above
for details on building Charm++.
Edit `Make.charm` to set the CHARMBASE parameter to the install path of
Charm++.
Edit `arch/Linux-x86_64.fftw` to set the FFTDIR parameter to the install path
of FFTW. Note that your FFTW must have been compiled with the single-precision
options enabled (`--enable-float` and `--enable-type-prefix`) so that you have
the sfftw.h include files.
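If your site doesn't provide a single-precision FFTW 2.1.5, building your own is quick (a sketch; the install prefix is an arbitrary assumption):

    wget http://www.fftw.org/fftw-2.1.5.tar.gz
    tar xzf fftw-2.1.5.tar.gz
    cd fftw-2.1.5
    ./configure --prefix=$HOME/fftw-2.1.5 --enable-float --enable-type-prefix
    make && make install
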
Edit `arch/Linux-x86_64.tcl` to set the TCLDIR parameter to the install path of
Tcl. (In this case I set it to /, making sure the lib directory is correct,
since I'm using the system Tcl.)
For modules, I'm using the same OpenMPI I used to build Charm++, and I'm using
the CUDA module.

    module load cuda openmpi
    ./config Linux-x86_64-g++ --charm-arch mpi-linux-x86_64 \
        --with-cuda --cuda-prefix $CUDA_HOME
    cd Linux-x86_64-g++
    make

This should produce a namd2 binary, as well as a charmrun binary and some other
files.
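
A minimal smoke test of the CUDA build (a sketch: `src/alanin` is the tiny test config I recall shipping with the NAMD source, `+idlepoll` is the switch NAMD recommends for CUDA runs, and the mpirun launch assumes the MPI-based Charm++ built above):

    # run from inside the Linux-x86_64-g++ build directory
    mpirun -np 2 ./namd2 +idlepoll ../src/alanin
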