Toolchain update: new download url and MPICH/OpenMPI version (#5088)
* Add url download function

* add -O option for wget

* using url downloader in stage4

* update mpich version

* Update README.md

* disable rapidjson by default

related to #5069

* add information for abacus_env.sh

* modify info of abacus_env.sh

* add information for gcc toolchain

* update OpenMPI version to 5.0.5

* Update README.md

* change the download url for openmpi

* Update README.md

* fix link of openmpi

* fix openmpi url

* Update install_libtorch.sh

* update date of update

---------

Co-authored-by: kirk0830 <[email protected]>
QuantumMisaka and kirk0830 authored Sep 14, 2024
1 parent 03c2582 commit c8c0a10
Showing 14 changed files with 174 additions and 89 deletions.
86 changes: 57 additions & 29 deletions toolchain/README.md
@@ -1,7 +1,9 @@
# The ABACUS Toolchain

Version 2024.2

## Author

[QuantumMisaka](https://github.com/QuantumMisaka)
(Zhaoqing Liu) @PKU @AISI

@@ -25,7 +27,7 @@ and give setup files that you can use to compile ABACUS.
- [x] Support for [LibRI](https://github.com/abacusmodeling/LibRI) by submodule or automatic installation from github.com (but LibRI installed via `wget` seems to have some problems, so please be cautious)
- [x] A mirror station on the Bohrium database, from which CEREAL, LibNPY, LibRI and LibComm can be downloaded by `wget` from within China.
- [x] Support for GPU compilation, users can add `-DUSE_CUDA=1` in builder scripts.
- [ ] A better mirror station for all packages.
- [ ] Change the download URL from the cp2k mirror to another mirror, or download directly from the official websites. (in progress)
- [ ] A better README and Detail markdown file.
- [ ] Automatic installation of [DEEPMD](https://github.com/deepmodeling/deepmd-kit).
- [ ] Better compilation method for ABACUS-DEEPMD and ABACUS-DEEPKS.
@@ -35,11 +37,11 @@ and give setup files that you can use to compile ABACUS.


## Usage Online & Offline

The main script is *install_abacus_toolchain.sh*,
which uses the scripts in the *scripts* directory
to compile and install the dependencies of ABACUS.

You can just `./install_abacus_toolchain.sh -h` to get more help message.
It can be directly used, but not recommended.

There are also ready-made scripts that run *install_abacus_toolchain.sh* for the `gnu-openblas` and `intel-mkl` toolchain dependencies.

@@ -51,6 +53,7 @@ There are also well-modified script to run *install_abacus_toolchain.sh* for `gn
# for intel-mkl-mpich
> ./toolchain_intel-mpich.sh
```

It is recommended to run them first to get a fast installation of ABACUS under certain environments.

If you have a fresh environment and `sudo` permission, you can use *install_requirements.sh* to install the system libraries and dependencies needed by the toolchain.
@@ -66,6 +69,7 @@ If you have a fresh environments and you have `sudo` permission, you can use *in
```

All packages will be downloaded from [cp2k-static/download](https://www.cp2k.org/static/downloads) by `wget`, then compiled and installed in the `install` directory by the toolchain scripts, except for:

- `CEREAL` which will be downloaded from [CEREAL](https://github.com/USCiLab/cereal)
- `Libnpy` which will be downloaded from [LIBNPY](https://github.com/llohse/libnpy)
- `LibRI` which will be downloaded from [LibRI](https://github.com/abacusmodeling/LibRI)
@@ -77,12 +81,17 @@ Instead of github.com, we offer other package station, you can use it by:
```shell
wget https://bohrium-api.dp.tech/ds-dl/abacus-deps-93wi-v3 -O abacus-deps-v3.zip
```
`unzip` it ,and you can do offline installation of these packages above after rename. The above station will be updated handly but one should notice that the version will always lower than github repo.
`unzip` it, and after renaming the files you can install the packages above offline.
```shell
# packages downloaded from github.com
mv v1.3.2.tar.gz build/cereal-1.3.2.tar.gz
```
The above station is updated manually, so note that its package versions will always lag behind the GitHub repositories.

If one wants to install ABACUS by toolchain OFFLINE,
one can manually download all the packages from [cp2k-static/download](https://www.cp2k.org/static/downloads) or official website
and put them in *build* directory by formatted name
like *fftw-3.3.10.tar.gz*, or *openmpi-5.0.3.tar.gz*,
like *fftw-3.3.10.tar.gz*, or *openmpi-5.0.5.tar.bz2*,
then run this toolchain.
All packages will be detected and installed automatically.
Also, one can install parts of packages OFFLINE and parts of packages ONLINE
@@ -96,10 +105,11 @@ just by using this toolchain
```

Default versions of the required dependencies:

- `cmake` 3.30.0
- `gcc` 13.2.0 (which is never installed by the toolchain; the system `gcc` is used instead)
- `OpenMPI` 5.0.3
- `MPICH` 4.1.2
- `OpenMPI` 5.0.5
- `MPICH` 4.2.2
- `OpenBLAS` 0.3.27 (the Intel toolchain needs the `get_vars.sh` tool from it)
- `ScaLAPACK` 2.2.1
- `FFTW` 3.3.10
@@ -111,28 +121,21 @@ And Intel-oneAPI need user or server manager to manually install from Intel.
[Intel-oneAPI](https://www.intel.cn/content/www/cn/zh/developer/tools/oneapi/toolkits.html)

The dependencies below are optional and are NOT installed by default:

- `LibTorch` 2.1.2
- `Libnpy` 1.0.1
- `LibRI` 0.2.0
- `LibComm` 0.1.1
Users can install them by using `--with-*=install` in toolchain*.sh, which is `no` in default.
> Notice: LibRI, LibComm and Libnpy is on actively development, you should check-out the package version when using this toolchain. Also, LibRI and LibComm can be installed by github submodule, which is also work for libnpy, which is more recommended.

Notice: for `CEREAL`,`RapidJSON`, `Libnpy`, `LibRI` and `LibComm`,
you need to download them from github.com,
rename it as formatted, and put them in `build` directory at the same time
e.g.:
```shell
# packages downloaded from github.com
mv v1.3.2.tar.gz build/cereal-1.3.2.tar.gz
```
Users can install them by passing `--with-*=install` in toolchain*.sh; the default is `no`.
> Notice: LibRI, LibComm and Libnpy are under active development, so you should check the package versions when using this toolchain. LibRI and LibComm can also be installed as GitHub submodules, which is the recommended approach and works for Libnpy as well.
Users can easily compile and install dependencies of ABACUS
by running these scripts after loading `gcc` or `intel-mkl-mpi`
environment.

The toolchain installation process can be interrupted at any time.
just re-run *install_abacus_toolchain.sh*, toolchain itself may fix it
Just re-run *toolchain_\*.sh*, and the toolchain itself may recover from the interruption.

If compilation is successful, a message will be shown like this:

@@ -147,6 +150,7 @@ If compilation is successful, a message will be shown like this:
> ./build_abacus_intel.sh
> or you can modify the builder scripts to suit your needs.
```

You can run *build_abacus_gnu.sh* or *build_abacus_intel.sh* to build ABACUS
by gnu-toolchain or intel-toolchain respectively, the builder scripts will
automatically locate the environment and compile ABACUS.
@@ -160,17 +164,28 @@ If users want to use toolchain but lack of some system library
dependencies, *install_requirements.sh* scripts will help.

If users want to re-install all the packages, just do:

```shell
> rm -rf install
```

or, for a more complete cleanup:

```shell
> rm -rf install build/*/* build/OpenBLAS*/ build/setup_*
```

## Common Problem and Solution
## Common Problems and Solutions

### LibRI and LibComm for EXX

- The GCC toolchain with OpenMPI cannot compile LibComm v0.1.1 because its MPI variable types differ from those of MPICH and IntelMPI (see the discussion in [#5033](https://github.com/deepmodeling/abacus-develop/issues/5033)); you can switch to the GCC-MPICH or Intel toolchain instead.
- The Intel toolchain is recommended if one wants to include the EXX feature in ABACUS: it can give much better performance and can use more than 16 OpenMP threads to accelerate the EXX process.

### GPU version of ABACUS
Add the following options in build*.sh:

For GPU version of ABACUS (do not GPU version installer of ELPA, which is still doing work), add following options in build*.sh:

```shell
cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
-DCMAKE_CXX_COMPILER=icpx \
@@ -180,46 +195,59 @@ cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
-DCMAKE_CUDA_COMPILER=${path to cuda toolkit}/bin/nvcc \
......
```
Notice: You CANNOT use `icpx` compiler for GPU version of ABACUS for now

### shell problem
Notice: You CANNOT use the `icpx` compiler for the GPU version of ABACUS for now; see the discussion in [#2906](https://github.com/deepmodeling/abacus-develop/issues/2906) and [#4976](https://github.com/deepmodeling/abacus-develop/issues/4976).


### Shell problem

If you encounter a problem like:

```shell
/bin/bash^M: bad interpreter: No such file or directory
```

or `permission denied` problem, you can simply run:

```shell
./pre_set.sh
```

You can also fix the `permission denied` problem via `chmod +x`
if *pre_set.sh* has no execution permission;
if *pre_set.sh* itself has the `/bin/bash^M` problem, you can run:
```

```shell
> dos2unix pre_set.sh
```

to fix it
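
If `dos2unix` is not installed, the same repair can be sketched with `sed`. This is only an illustration and not part of the toolchain: the function name `strip_crlf` is hypothetical, and the in-place flag `-i` assumes GNU `sed`.

```shell
# Hypothetical helper (not part of the toolchain): strip Windows-style
# carriage returns so that "#!/bin/bash^M" becomes a valid interpreter line.
strip_crlf() {
  cr="$(printf '\r')"
  for f in "$@"; do
    # Only rewrite files that actually contain a carriage return.
    if grep -q "$cr" "$f"; then
      sed -i "s/${cr}\$//" "$f"   # assumes GNU sed for -i
      echo "fixed line endings: $f"
    fi
  done
}
```

For example, `strip_crlf pre_set.sh && chmod +x pre_set.sh` would cover both symptoms described above.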

### libtorch and deepks problem
### Libtorch and DeePKS problem

If the DeePKS feature has problems, you can manually change the libtorch version
from 2.0.1 to 1.12.0 in `toolchain/scripts/stage4/install_libtorch.sh`.
from 2.1.2 to 2.0.1 or 1.12.0 in `toolchain/scripts/stage4/install_libtorch.sh`.

Also, you can install ABACUS without deepks by removing all the deepks and related options.

NOTICE: if you want the DeePKS feature, your intel-mkl environment should be accessible during the build process. You can check it in `build_abacus_gnu.sh`.

### deepmd feature problem
### DeePMD feature problem

When you encounter a problem like `GLIBCXX_3.4.29 not found`, your `gcc` version is almost certainly lower than `libdeepmd` requires.

In my tests, `gcc` > 11.3.1 is needed to enable the DeePMD feature in ABACUS.
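
A quick pre-check of the compiler version can be sketched as follows. It is only an illustration: `version_gt` and `check_gcc_for_deepmd` are hypothetical helpers, the `11.3.1` threshold is the one reported above, and `sort -V` assumes GNU coreutils.

```shell
# Hypothetical helper: succeed if version $1 is strictly greater than $2.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical check: compare the system gcc against the reported threshold.
check_gcc_for_deepmd() {
  required="11.3.1"
  # -dumpfullversion exists in GCC 7+; fall back to -dumpversion otherwise.
  current="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
  if version_gt "$current" "$required"; then
    echo "gcc $current should be new enough for the DeePMD feature"
  else
    echo "gcc $current is not newer than $required; GLIBCXX errors are likely"
  fi
}
```

Calling `check_gcc_for_deepmd` after loading the `gcc` environment prints one of the two messages.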

### ELPA problem via Intel-oneAPI toolchain in AMD server

The default compilers for Intel-oneAPI are `icpx` and `icx`, which cause problems when compiling ELPA on AMD servers. (This issue still needs further investigation.)

The best workaround is to change `icpx` to `icpc` and `icx` to `icc`. Users can do this in toolchain*.sh via `--with-intel-classic=yes`.

Notice: `icc` and `icpc` from Intel Classic Compiler of Intel-oneAPI is not supported for 2024.0 and newer version.

Notice: `icc` and `icpc` from the Intel Classic Compiler of Intel-oneAPI are not supported in version 2024.0 and newer. Intel-oneAPI 2023.2.0 can be found on the website. See the discussion in [#4976](https://github.com/deepmodeling/abacus-develop/issues/4976).

### Intel-oneAPI problem

Sometimes Intel-oneAPI has problems linking `mpirun`,
which always show up in the 2023.2.0 version of MPI in Intel-oneAPI.
Running `source /path/to/setvars.sh` or installing another version of IntelMPI may help.
@@ -229,7 +257,6 @@ And will not occur in Intel-MPI before 2021.10.0 (Intel-oneAPI before 2023.2.0)

More problems and possible solutions can be found in [#2928](https://github.com/deepmodeling/abacus-develop/issues/2928)


## Advanced Installation Usage

1. Users can move toolchain directory to anywhere you like,
@@ -243,4 +270,5 @@ of each package, which may make the installation more flexible.


## More

More information can be found in `Details.md`.
11 changes: 10 additions & 1 deletion toolchain/build_abacus_gnu.sh
@@ -47,7 +47,7 @@ cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
-DENABLE_LIBXC=ON \
-DUSE_OPENMP=ON \
-DUSE_ELPA=ON \
-DENABLE_RAPIDJSON=ON \
-DENABLE_RAPIDJSON=OFF \
# -DENABLE_DEEPKS=1 \
# -DTorch_DIR=$LIBTORCH \
# -Dlibnpy_INCLUDE_DIR=$LIBNPY \
@@ -74,3 +74,12 @@ cat << EOF > "${TOOL}/abacus_env.sh"
source $INSTALL_DIR/setup
export PATH="${PREFIX}/bin":\${PATH}
EOF

# generate information
cat << EOF
========================== usage =========================
Done!
To use the installed ABACUS version
You need to source $(pwd)/abacus_env.sh first !
EOF
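
The `abacus_env.sh` heredoc above mixes two kinds of variables: `${PREFIX}` is expanded when the builder script runs, while `\${PATH}` is escaped so that it is expanded only when `abacus_env.sh` is later sourced. A minimal sketch of that behaviour, using a hypothetical function name and placeholder arguments:

```shell
# Hypothetical demo (not part of the toolchain): write an env file the way
# the builder scripts do, with one variable expanded now and one deferred.
write_env_file() {
  prefix="$1"   # install prefix, expanded while generating the file
  out="$2"      # path of the file to generate
  cat << EOF > "$out"
#!/bin/bash
export PATH="${prefix}/bin":\${PATH}
EOF
}
```

With `write_env_file /opt/abacus ./abacus_env.sh`, the generated file contains the literal text `${PATH}` next to the already-resolved prefix, so sourcing it prepends the ABACUS `bin` directory to whatever `PATH` is at that moment.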
11 changes: 10 additions & 1 deletion toolchain/build_abacus_intel-mpich.sh
@@ -43,7 +43,7 @@ cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
-DENABLE_LIBXC=ON \
-DUSE_OPENMP=ON \
-DUSE_ELPA=ON \
-DENABLE_RAPIDJSON=ON \
-DENABLE_RAPIDJSON=OFF \
# -DENABLE_DEEPKS=1 \
# -DTorch_DIR=$LIBTORCH \
# -Dlibnpy_INCLUDE_DIR=$LIBNPY \
@@ -64,4 +64,13 @@ cat << EOF > "${TOOL}/abacus_env.sh"
#!/bin/bash
source $INSTALL_DIR/setup
export PATH="${PREFIX}/bin":\${PATH}
EOF

# generate information
cat << EOF
========================== usage =========================
Done!
To use the installed ABACUS version
You need to source $(pwd)/abacus_env.sh first !
EOF
11 changes: 10 additions & 1 deletion toolchain/build_abacus_intel.sh
@@ -44,7 +44,7 @@ cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
-DENABLE_LIBXC=ON \
-DUSE_OPENMP=ON \
-DUSE_ELPA=ON \
-DENABLE_RAPIDJSON=ON \
-DENABLE_RAPIDJSON=OFF \
# -DENABLE_DEEPKS=1 \
# -DTorch_DIR=$LIBTORCH \
# -Dlibnpy_INCLUDE_DIR=$LIBNPY \
@@ -66,3 +66,12 @@ cat << EOF > "${TOOL}/abacus_env.sh"
source $INSTALL_DIR/setup
export PATH="${PREFIX}/bin":\${PATH}
EOF

# generate information
cat << EOF
========================== usage =========================
Done!
To use the installed ABACUS version
You need to source $(pwd)/abacus_env.sh first !
EOF
12 changes: 8 additions & 4 deletions toolchain/scripts/stage1/install_mpich.sh
@@ -3,15 +3,17 @@
# TODO: Review and if possible fix shellcheck errors.
# shellcheck disable=all

# Last Update in 2023-0918
# Last Update in 2024-0912

[ "${BASH_SOURCE[0]}" ] && SCRIPT_NAME="${BASH_SOURCE[0]}" || SCRIPT_NAME=$0
SCRIPT_DIR="$(cd "$(dirname "$SCRIPT_NAME")/.." && pwd -P)"

# mpich_ver="4.0.3"
# mpich_sha256="17406ea90a6ed4ecd5be39c9ddcbfac9343e6ab4f77ac4e8c5ebe4a3e3b6c501"
mpich_ver="4.1.2"
mpich_sha256="3492e98adab62b597ef0d292fb2459b6123bc80070a8aa0a30be6962075a12f0"
# mpich_ver="4.1.2"
# mpich_sha256="3492e98adab62b597ef0d292fb2459b6123bc80070a8aa0a30be6962075a12f0"
mpich_ver="4.2.2"
mpich_sha256="883f5bb3aeabf627cb8492ca02a03b191d09836bbe0f599d8508351179781d41"
mpich_pkg="mpich-${mpich_ver}.tar.gz"

source "${SCRIPT_DIR}"/common_vars.sh
@@ -35,13 +37,15 @@ case "${with_mpich}" in
pkg_install_dir="${INSTALLDIR}/mpich-${mpich_ver}"
#pkg_install_dir="${HOME}/apps/mpich/${mpich_ver}-intel"
install_lock_file="$pkg_install_dir/install_successful"
url="https://www.mpich.org/static/downloads/${mpich_ver}/${mpich_pkg}"
if verify_checksums "${install_lock_file}"; then
echo "mpich-${mpich_ver} is already installed, skipping it."
else
if [ -f ${mpich_pkg} ]; then
echo "${mpich_pkg} is found"
else
download_pkg_from_ABACUS_org "${mpich_sha256}" "${mpich_pkg}"
#download_pkg_from_ABACUS_org "${mpich_sha256}" "${mpich_pkg}"
download_pkg_from_url "${mpich_sha256}" "${mpich_pkg}" "${url}"
fi
echo "Installing from scratch into ${pkg_install_dir} for MPICH device ${MPICH_DEVICE}"
[ -d mpich-${mpich_ver} ] && rm -rf mpich-${mpich_ver}
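
The body of `download_pkg_from_url` is not shown in this part of the diff. Judging only from the call site above (checksum, file name, URL) and the commit note about adding `-O` for `wget`, a helper with that shape might look like the sketch below. Everything in it is an assumption made for illustration: the names `verify_sha256` and `fetch_and_verify`, the error messages, and the use of `sha256sum`.

```shell
# Hypothetical sketch, NOT the toolchain's actual download_pkg_from_url.

# verify_sha256 <expected_sha256> <file>
# Succeed only if <file> exists and hashes to <expected_sha256>.
verify_sha256() {
  [ -f "$2" ] || return 1
  actual="$(sha256sum "$2" | cut -d ' ' -f 1)"
  [ "$actual" = "$1" ]
}

# fetch_and_verify <sha256> <filename> <url>
# Download <url> to <filename> with wget -O, then check the checksum.
fetch_and_verify() {
  sha256="$1"; filename="$2"; url="$3"
  echo "wget ${url} -O ${filename}"
  if ! wget "$url" -O "$filename"; then
    echo "ERROR: failed to download ${url}" >&2
    return 1
  fi
  if ! verify_sha256 "$sha256" "$filename"; then
    echo "ERROR: checksum mismatch for ${filename}" >&2
    return 1
  fi
}
```

Keeping the checksum test in its own function means a tarball placed in *build* by hand for an offline install could be checked the same way as a freshly downloaded one.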
13 changes: 9 additions & 4 deletions toolchain/scripts/stage1/install_openmpi.sh
@@ -3,13 +3,15 @@
# TODO: Review and if possible fix shellcheck errors.
# shellcheck disable=all

# Last Update in 2024-0811
# Last Update in 2024-0912

[ "${BASH_SOURCE[0]}" ] && SCRIPT_NAME="${BASH_SOURCE[0]}" || SCRIPT_NAME=$0
SCRIPT_DIR="$(cd "$(dirname "$SCRIPT_NAME")/.." && pwd -P)"

openmpi_ver="5.0.3"
openmpi_sha256="990582f206b3ab32e938aa31bbf07c639368e4405dca196fabe7f0f76eeda90b"
openmpi_ver="5.0.5"
openmpi_sha256="6588d57c0a4bd299a24103f4e196051b29e8b55fbda49e11d5b3d32030a32776"
# openmpi_ver="4.1.6"
# openmpi_sha256="f740994485516deb63b5311af122c265179f5328a0d857a567b85db00b11e415"
openmpi_pkg="openmpi-${openmpi_ver}.tar.bz2"

source "${SCRIPT_DIR}"/common_vars.sh
@@ -33,13 +35,14 @@ case "${with_openmpi}" in
pkg_install_dir="${INSTALLDIR}/openmpi-${openmpi_ver}"
#pkg_install_dir="${HOME}/apps/openmpi/${openmpi_ver}-gcc8"
install_lock_file="$pkg_install_dir/install_successful"
url="https://download.open-mpi.org/release/open-mpi/v${openmpi_ver:0:3}/${openmpi_pkg}"
if verify_checksums "${install_lock_file}"; then
echo "openmpi-${openmpi_ver} is already installed, skipping it."
else
if [ -f ${openmpi_pkg} ]; then
echo "${openmpi_pkg} is found"
else
download_pkg_from_ABACUS_org "${openmpi_sha256}" "${openmpi_pkg}"
download_pkg_from_url "${openmpi_sha256}" "${openmpi_pkg}" "${url}"
fi
echo "Installing from scratch into ${pkg_install_dir}"
[ -d openmpi-${openmpi_ver} ] && rm -rf openmpi-${openmpi_ver}
@@ -59,6 +62,8 @@ case "${with_openmpi}" in
fi
fi
# OpenMPI 5.0 only supports PMIx
# PMI support is required for Slurm, but not for other schedulers
# not used by default
# if [ $(command -v srun) ]; then
# echo "Slurm installation found. OpenMPI will be configured with --with-pmi."
# EXTRA_CONFIGURE_FLAGS="--with-pmi"
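
The OpenMPI URL above takes the release series from the first three characters of the version (`${openmpi_ver:0:3}`), which yields `5.0` for `5.0.5`. The sketch below shows an alternative that strips the last dot-separated component instead, so it would also cope with a two-digit major or minor number. The variable names mirror the script, but the `major_minor` function is hypothetical.

```shell
# Hypothetical helper: "5.0.5" -> "5.0", "4.1.6" -> "4.1", "10.0.1" -> "10.0"
major_minor() {
  echo "${1%.*}"   # drop the shortest trailing ".<patch>" suffix
}

openmpi_ver="5.0.5"
openmpi_pkg="openmpi-${openmpi_ver}.tar.bz2"
url="https://download.open-mpi.org/release/open-mpi/v$(major_minor "${openmpi_ver}")/${openmpi_pkg}"
echo "${url}"
```

For the versions used in this commit both forms produce the same URL; the difference only matters once a version component grows past one digit.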