Conference call notes 20201223

Kenneth Hoste edited this page Dec 23, 2020 · 5 revisions

Notes on the 163rd EasyBuild conference call, Wednesday December 23rd 2020 (9:00 UTC - 10:00 CET)

Attendees

Alphabetical list of attendees (13):

  • Sebastian Achilles (Jülich Supercomputing Centre, Germany)
  • Damian Alvarez (Jülich Supercomputing Centre, Germany)
  • Simon Branford (University of Birmingham, UK)
  • Miguel Dias Costa (University of Singapore)
  • Alex Domingo (Vrije Universiteit Brussel, Belgium)
  • Victor Holanda (CSCS, Switzerland)
  • Kenneth Hoste (HPC-UGent, Belgium)
  • Samuel Moors (Vrije Universiteit Brussel, Belgium)
  • Terje Kvernes (University of Oslo, Norway)
  • Mikael Öhman (Chalmers University of Technology, Sweden)
  • Åke Sandgren (Umeå University, Sweden)
  • Jörg Saßmannshausen (NIHR Biomedical Research Centre, UK)
  • Lars Viklund (Umeå University, Sweden)

Agenda

  • update on recent developments
  • support for installing/using a toolchain based on Intel oneAPI
  • compilers and libraries for AMD toolchain (AOCC & co)
  • pros and cons for merging foss and fosscuda toolchains
  • Q&A

Recent developments

  • recent changes
    • framework
      • bug fixes
        • (none)
      • enhancements
        • (none)
    • easyblocks
      • bug fixes
        • (nothing special)
      • enhancements
        • create versioned symlinks (cmake3) for CMake commands (PR #2259)
          • to prevent PyTorch from picking up cmake3 from the system...
          • should we provide a function in framework to create symlinks like this?
        • unify handling of pylibdirs and don't add duplicated $PYTHONPATH in PythonBundle (PR #2281)
        • add options to run unit tests to TensorFlow EasyBlock (PR #2263)
      • new software
        • (nothing special)
    • easyconfigs
      • bug fixes
        • fix name of source file for GDRCopy v2.1 (PR #11887)
        • add patch to fix miscompilation bug on POWER for GCC 8.x and 9.x (PR #11837)
        • fix compilation of TensorFlow 2.3.1 with CUDA and glibc 2.26 on POWER (PR #11859)
          • this is a broader issue (see #11913)
          • also affects CuPy, magma, PyTorch, etc.
          • affects 2019a or more recent toolchains with CUDA 10.x on RHEL8 (fixed in CUDA 11.0)
          • support for arch-specific patches could come in useful here
        • there's a workaround for the segfault with impi on CentOS 8
      • enhancements
        • (nothing special)
      • new software
      • software updates
      • changes
        • replace easyconfigs for bpp-core/bpp-phyl/bpp-seq v2.4.1 with a single easyconfig for BioPP v2.4.1 (using Bundle easyblock) (PR #11609)
  • to merge/fix/tackle soon
    • framework (v4.4.0 milestone)
      • support additional features in easystack files (see issues #3468, #3512, #3513, #3516)
      • directories that don't contain any library files shouldn't be added to $LD_LIBRARY_PATH (issue #3504)
      • EasyBuild may loop forever when out of disk space (issue #3531)
      • log files leaking into each other when using --robot (issue #3533)
      • avoid duplicate or useless entries in RPATH (issue #3534)
    • easyblocks (v4.4.0 milestone)
      • bug fixes
        • correctly determine path to active binutils in TensorFlow easyblock (PR #2218)
        • fix taking into account --sysroot when installing/using CMake
          • two options:
            • use toolchain file in CMakeMake (PR #2247)
            • patch CMake installation (PR #2248)
        • check scipy test results (PR #2241)
          • some scipy tests are failing, have to tweak accuracy tolerances?
      • enhancements
        • enhance OpenBLAS easyblock to make it aware of optarch (PR #1946)
        • run motorBike tutorial case for recent (community) OpenFOAM versions (PR #2201)
          • needs testing...
        • add support for statically linking Bazel (PR #2272)
        • set $PYTHONNOUSERSITE in PythonBundle.extensions_step (PR #2272)
        • improve Bazel EasyBlock (PR #2285)
        • add support for skipping steps in Extension PythonPackages (PR #2290)
        • set $TF_GPU_COUNT and $TF_TESTS_PER_GPU for TensorFlow tests (PR #2292)
      • new software
    • easyconfigs (v4.4.0 milestone)
      • bug fixes
        • added missing space in configopts in ParaView 5.8.0 easyconfigs using 2020a toolchain (PR #10989)
          • (cf. discussion in last conf call)
      • enhancements
      • new software
        • gobff toolchains (PR #11761)
          • exceptions need to be added to tests
      • software updates
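
The versioned CMake symlinks mentioned above (PR #2259), and the open question of a generic framework helper for them, can be illustrated with a minimal sketch. The function name create_versioned_symlinks and its signature are hypothetical; this is not the code from the PR or an existing framework function.

```python
import os


def create_versioned_symlinks(bindir, commands, suffix):
    """Create '<command><suffix>' symlinks next to each existing command in bindir.

    Hypothetical helper for illustration only; returns the list of created links.
    """
    created = []
    for cmd in commands:
        target = os.path.join(bindir, cmd)
        link = os.path.join(bindir, cmd + suffix)
        # only link commands that actually exist, and never clobber an existing entry
        if os.path.exists(target) and not os.path.lexists(link):
            # relative symlink, so the link stays valid if the install directory moves
            os.symlink(cmd, link)
            created.append(link)
    return created
```

Called as create_versioned_symlinks(bindir, ['cmake', 'ctest', 'cpack'], '3'), this would provide cmake3, ctest3 and cpack3 in the installation's own bin directory, so that a build looking for cmake3 finds it there before falling back to the system.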

Support for installing/using a toolchain based on Intel oneAPI

  • separate components
  • enhance IntelBase
  • diff between old & new compiler commands?
    • new compilers (icx, icpx, ifx) are also LLVM-based (icx and icpx build on Clang)
  • diff in linking to MKL?
    • link advisor suggests no
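
For the question on old versus new compiler commands, the rename itself is small. The sketch below only maps the classic command names to their oneAPI counterparts; the dictionary and function names are hypothetical, and differences in compiler options are not covered.

```python
# Classic Intel compiler commands and their oneAPI counterparts.
CLASSIC_TO_ONEAPI = {
    'icc': 'icx',    # C
    'icpc': 'icpx',  # C++
    'ifort': 'ifx',  # Fortran
}


def oneapi_command(classic_cmd):
    """Return the oneAPI counterpart of a classic Intel compiler command.

    Commands that are not in the mapping are returned unchanged.
    """
    return CLASSIC_TO_ONEAPI.get(classic_cmd, classic_cmd)
```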

Compilers and libraries for AMD toolchain (AOCC & co)

  • Sebastian's open PR for AOCC (https://github.com/easybuilders/easybuild-easyconfigs/pull/11868)
    • what do we do with the EULA?
      • environment variable to accept EULA?
      • export EASYBUILD_ACCEPT_EULA=AOCC,oneAPI (eb --accept-eula=AOCC)
    • do we need a custom easyblock for AOCC?
      • for example to specify version of AMDlibm? (currently we use latest)
    • Miguel: we also need a toolchain option to control which libm is being linked (-lamdm -lm)
      • with GCC you also need to make code changes when you want to link to libamdm.so
      • framework support for using AOCC compiler
    • AOCC depends on GCC, so should sit on top of GCCcore
      • pre-compiled but it looks in system paths by default
      • which GCC can be controlled via an environment variable
    • Sebastian: article comparing Clang and AOCC showed little performance benefits
  • What about support for ROCm for AMD GPUs?
    • see upcoming talk at FOSDEM HPC devroom
    • this could be quite a bit of work, looks like a complex ecosystem
    • who has time for this?
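
The EULA proposal discussed above (a comma-separated list of software names in $EASYBUILD_ACCEPT_EULA) could be checked along these lines. This is a sketch of the idea as noted in the call, not an existing EasyBuild option; the function names are hypothetical.

```python
import os


def accepted_eulas(env=None, var='EASYBUILD_ACCEPT_EULA'):
    """Return the set of software names for which the EULA was accepted.

    The variable is assumed to hold a comma-separated list, e.g. 'AOCC,oneAPI'.
    """
    env = os.environ if env is None else env
    raw = env.get(var, '')
    # ignore surrounding whitespace and empty entries (e.g. a trailing comma)
    return {name.strip() for name in raw.split(',') if name.strip()}


def eula_accepted(software_name, env=None):
    """Check whether the EULA for the given software name was accepted."""
    return software_name in accepted_eulas(env)
```

An easyblock for AOCC could then refuse to install unless eula_accepted('AOCC') returns True.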

Pros and cons for merging foss and fosscuda toolchains

  • pain point is OpenMPI+UCX with/without CUDA support
    • OpenMPI with CUDA support can be used on non-GPU systems
    • compatibility of CUDA with recent GCC versions is holding things back a bit
  • benefit would be fewer easyconfigs to maintain, and software would be easier to combine
  • Alex: the current situation is odd, with both CUDA-aware toolchains like gcccuda and easyconfigs that include CUDA as a dep (and mention it in versionsuffix)
  • for foss/2021a we could start without CUDA support
    • when there's a release of CUDA that's compatible with the GCC in foss/2021a, we could start looking into stuff that requires CUDA
    • that implies being able to swap things like UCX/OpenMPI with a CUDA-capable alternative via dependencies
  • %(cudaver)s template should be set when having CUDA in toolchain, or when including CUDAcore as a dependency
    • currently only set when CUDA is a direct dep
    • Mikael may look into this
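
The desired behaviour for the %(cudaver)s template can be sketched as follows: look for CUDA or CUDAcore among the toolchain components first, then among the dependencies. The function names and the (name, version) tuple representation are assumptions for illustration, not the framework's actual data structures.

```python
CUDA_NAMES = ('CUDA', 'CUDAcore')


def find_cuda_version(toolchain_components, dependencies):
    """Return the CUDA version found in the toolchain or the dependencies, else None.

    Both arguments are lists of (name, version) tuples; toolchain components
    are checked before dependencies.
    """
    for name, version in list(toolchain_components) + list(dependencies):
        if name in CUDA_NAMES:
            return version
    return None


def template_values(toolchain_components, dependencies):
    """Build the template dictionary; 'cudaver' is only set when CUDA is found."""
    values = {}
    version = find_cuda_version(toolchain_components, dependencies)
    if version is not None:
        values['cudaver'] = version
    return values
```

With such a dictionary, a versionsuffix like '-CUDA-%(cudaver)s' resolves the same way regardless of whether CUDA comes in via the toolchain or via a dependency.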

Q&A

  • Damian: updating from CentOS 7 to 8
    • haven't seen too many issues
    • installing compatibility libraries for openssl and co helps
    • Simon: ABAQUS & ANSYS don't support RHEL8 yet
    • UiO has already been using ANSYS on CentOS 8 for a while
  • Victor: more automation w.r.t. generating easyconfigs for new toolchains