Merge pull request #436 from carpentries-incubator/cluster.hpc-carpentry.org

Generate HPCC Cluster library, fix some random lesson deficiencies
ocaisa authored Sep 21, 2023
2 parents 05f26f8 + fe69a7b commit b369a9a
Showing 31 changed files with 533 additions and 3 deletions.
6 changes: 5 additions & 1 deletion _episodes/11-connecting.md
@@ -300,7 +300,9 @@ See the [PuTTY documentation][putty-agent].
### Transfer Your Public Key

{% if site.remote.portal %}
Visit {{ site.remote.portal }} to upload your SSH public key.
Visit [{{ site.remote.portal }}]({{ site.remote.portal }}) to upload your SSH
public key. (Remember, it's the one ending in `.pub`!)

{% else %}
Use the **s**ecure **c**o**p**y tool to send your public key to the cluster.

@@ -406,6 +408,7 @@ the other files, or files like them: `.bashrc` is a shell configuration file,
which you can edit with your preferences; and `.ssh` is a directory storing SSH
keys and a record of authorized connections.

{% unless site.remote.portal %}
### Install Your SSH Key

> ## There May Be a Better Way
@@ -449,6 +452,7 @@ password for your SSH key.
{{ site.local.prompt }} ssh {{ site.remote.user }}@{{ site.remote.login }}
```
{: .language-bash}
{% endunless %}

{% include links.md %}

@@ -0,0 +1,64 @@
# ---------------------------------------------------------------
# HPC Carpentries in the Cloud: Slurm + Software Stack from EESSI
# ---------------------------------------------------------------
#
# The HPC Carpentry Cluster in the Cloud is provided as a public
# service by volunteers. It is provisioned with Magic Castle
# <https://github.com/ComputeCanada/magic_castle> using the EESSI
# <https://eessi.github.io/docs/> software stack. If you need an
# account, please visit <cluster.hpc-carpentry.org>.
#
# Compute responsibly.
---

snippets: "/snippets_library/HPCC_MagicCastle_slurm"

local:
  prompt: "[you@laptop:~]$"
  bash_shebang: "#!/usr/bin/env bash"

remote:
  name: "HPC Carpentry's Cloud Cluster"
  login: "cluster.hpc-carpentry.org"
  portal: "https://mokey.cluster.hpc-carpentry.org"
  host: "login1"
  node: "smnode1"
  location: "cluster.hpc-carpentry.org"
  homedir: "/home"
  user: "yourUsername"
  module_python3: "Python"
  prompt: "[yourUsername@login1 ~]$"
  bash_shebang: "#!/bin/bash"

sched:
  name: "Slurm"
  submit:
    name: "sbatch"
    options: ""
  queue:
    debug: "smnode"
    testing: "cpubase_bycore_b1"
  status: "squeue"
  flag:
    user: "-u yourUsername"
    interactive: ""
    histdetail: "-l -j"
    name: "-J"
    time: "-t"
    queue: "-p"
  del: "scancel"
  interactive: "srun"
  info: "sinfo"
  comment: "#SBATCH"
  hist: "sacct -u yourUsername"

episode_order:
  - 10-hpc-intro
  - 11-connecting
  - 12-cluster
  - 13-scheduler
  - 14-modules
  - 15-transferring-files
  - 16-parallel
  - 17-resources
  - 18-responsibility
@@ -0,0 +1,7 @@
```
PARTITION          AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpubase_bycore_b1*    up   infinite      4   idle node[1-2],smnode[1-2]
node                  up   infinite      2   idle node[1-2]
smnode                up   infinite      2   idle smnode[1-2]
```
{: .output}
@@ -0,0 +1,11 @@
> ## Explore a Worker Node
>
> Finally, let's look at the resources available on the worker nodes where your
> jobs will actually run. Try running this command to see the name, CPUs and
> memory available on the worker nodes:
>
> ```
> {{ site.remote.prompt }} sinfo -o "%n %c %m" | column -t
> ```
> {: .language-bash}
{: .challenge}
@@ -0,0 +1,21 @@
```
~~~ /cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/amd/zen2/modules/all ~~~
  Bazel/3.6.0-GCCcore-x.y.z              NSS/3.51-GCCcore-x.y.z
  Bison/3.5.3-GCCcore-x.y.z              Ninja/1.10.0-GCCcore-x.y.z
  Boost/1.72.0-gompi-2020a               OSU-Micro-Benchmarks/5.6.3-gompi-2020a
  CGAL/4.14.3-gompi-2020a-Python-3.x.y   OpenBLAS/0.3.9-GCC-x.y.z
  CMake/3.16.4-GCCcore-x.y.z             OpenFOAM/v2006-foss-2020a

[removed most of the output here for clarity]

  Where:
   L:        Module is loaded
   Aliases:  Aliases exist: foo/1.2.3 (1.2) means that "module load foo/1.2"
             will load foo/1.2.3
   D:        Default Module

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching
any of the "keys".
```
{: .output}
@@ -0,0 +1,33 @@
If the `python3` command were unavailable, we would see output like this:

```
/usr/bin/which: no python3 in (/cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64/usr/bin:/opt/software/slurm/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/home/{{site.remote.user}}/.local/bin:/home/{{site.remote.user}}/bin)
```
{: .output}

Note that this wall of text is really a list, with values separated
by the `:` character. The output is telling us that the `which` command
searched the following directories for `python3`, without success:

```
/cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64/usr/bin
/opt/software/slurm/bin
/usr/local/bin
/usr/bin
/usr/local/sbin
/usr/sbin
/opt/puppetlabs/bin
/home/{{site.remote.user}}/.local/bin
/home/{{site.remote.user}}/bin
```
{: .output}
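
To produce a one-directory-per-line listing like this yourself, you can swap
each `:` for a newline. The following is a minimal sketch: the `split_path`
helper name and the shortened sample value are assumptions made for
illustration, and on the cluster you would pass `"$PATH"` instead.

```bash
# split_path is a hypothetical helper: print each directory of a
# colon-separated search path on its own line.
split_path() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Shortened sample value; on the cluster you would pass "$PATH" instead.
sample_path="/opt/software/slurm/bin:/usr/local/bin:/usr/bin"
split_path "$sample_path"
```
{: .language-bash}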

In our case, however, a `python3` is already available, so we see

```
/cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64/usr/bin/python3
```
{: .output}

We need a different Python than the system-provided one, though, so let us
load a module to access it.
@@ -0,0 +1,5 @@
```
{{ site.remote.prompt }} module load {{ site.remote.module_python3 }}
{{ site.remote.prompt }} which python3
```
{: .language-bash}
@@ -0,0 +1,4 @@
```
/cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/amd/zen2/software/Python/3.x.y-GCCcore-x.y.z/bin/python3
```
{: .output}
@@ -0,0 +1,4 @@
```
{{ site.remote.prompt }} ls /cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/amd/zen2/software/Python/3.x.y-GCCcore-x.y.z/bin
```
{: .language-bash}
@@ -0,0 +1,16 @@
```
2to3              nosetests-3.8  python                 rst2s5.py
2to3-3.8          pasteurize     python3                rst2xetex.py
chardetect        pbr            python3.8              rst2xml.py
cygdb             pip            python3.8-config       rstpep2html.py
cython            pip3           python3-config         runxlrd.py
cythonize         pip3.8         rst2html4.py           sphinx-apidoc
easy_install      pybabel        rst2html5.py           sphinx-autogen
easy_install-3.8  __pycache__    rst2html.py            sphinx-build
futurize          pydoc3         rst2latex.py           sphinx-quickstart
idle3             pydoc3.8       rst2man.py             tabulate
idle3.8           pygmentize     rst2odt_prepstyles.py  virtualenv
netaddr           pytest         rst2odt.py             wheel
nosetests         py.test        rst2pseudoxml.py
```
{: .output}
@@ -0,0 +1,4 @@
```
/cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/amd/zen2/software/Python/3.x.y-GCCcore-x.y.z/bin:/cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/amd/zen2/software/SQLite/3.31.1-GCCcore-x.y.z/bin:/cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/amd/zen2/software/Tcl/8.6.10-GCCcore-x.y.z/bin:/cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/amd/zen2/software/GCCcore/x.y.z/bin:/cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64/usr/bin:/opt/software/slurm/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/home/{{site.remote.user}}/.local/bin:/home/{{site.remote.user}}/bin
```
{: .output}
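
The module's directories now sit at the front of `PATH`, and the shell runs
the first match it finds when walking that list from left to right. The sketch
below illustrates this precedence on any machine. The throwaway directories
and the `mytool` command are made up for the demonstration and are not part of
the cluster's software stack.

```bash
# Create two throwaway directories, each holding a different "mytool".
workdir=$(mktemp -d)
mkdir -p "$workdir/system/bin" "$workdir/module/bin"
printf '#!/bin/sh\necho system version\n' > "$workdir/system/bin/mytool"
printf '#!/bin/sh\necho module version\n' > "$workdir/module/bin/mytool"
chmod +x "$workdir/system/bin/mytool" "$workdir/module/bin/mytool"

# At first, only the "system" directory is on the search path.
PATH="$workdir/system/bin:$PATH"
before=$(mytool)

# Prepend the "module" directory, mimicking what loading a module did above.
PATH="$workdir/module/bin:$PATH"
hash -r   # forget any remembered command locations
after=$(mytool)

echo "before: $before"
echo "after:  $after"
```
{: .language-bash}

Both copies of `mytool` still exist after the second step. Only the search
order changed.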
@@ -0,0 +1,87 @@
To demonstrate, let's use `module list`. `module list` shows all loaded
software modules.

```
{{ site.remote.prompt }} module list
```
{: .language-bash}

```
Currently Loaded Modules:
  1) GCCcore/x.y.z                 4) GMP/6.2.0-GCCcore-x.y.z
  2) Tcl/8.6.10-GCCcore-x.y.z      5) libffi/3.3-GCCcore-x.y.z
  3) SQLite/3.31.1-GCCcore-x.y.z   6) Python/3.x.y-GCCcore-x.y.z
```
{: .output}

```
{{ site.remote.prompt }} module load GROMACS
{{ site.remote.prompt }} module list
```
{: .language-bash}

```
Currently Loaded Modules:
   1) GCCcore/x.y.z                     14) libfabric/1.11.0-GCCcore-x.y.z
   2) Tcl/8.6.10-GCCcore-x.y.z          15) PMIx/3.1.5-GCCcore-x.y.z
   3) SQLite/3.31.1-GCCcore-x.y.z       16) OpenMPI/4.0.3-GCC-x.y.z
   4) GMP/6.2.0-GCCcore-x.y.z           17) OpenBLAS/0.3.9-GCC-x.y.z
   5) libffi/3.3-GCCcore-x.y.z          18) gompi/2020a
   6) Python/3.x.y-GCCcore-x.y.z        19) FFTW/3.3.8-gompi-2020a
   7) GCC/x.y.z                         20) ScaLAPACK/2.1.0-gompi-2020a
   8) numactl/2.0.13-GCCcore-x.y.z      21) foss/2020a
   9) libxml2/2.9.10-GCCcore-x.y.z      22) pybind11/2.4.3-GCCcore-x.y.z-Pytho...
  10) libpciaccess/0.16-GCCcore-x.y.z   23) SciPy-bundle/2020.03-foss-2020a-Py...
  11) hwloc/2.2.0-GCCcore-x.y.z         24) networkx/2.4-foss-2020a-Python-3.8...
  12) libevent/2.1.11-GCCcore-x.y.z     25) GROMACS/2020.1-foss-2020a-Python-3...
  13) UCX/1.8.0-GCCcore-x.y.z
```
{: .output}

So in this case, loading the `GROMACS` module (a molecular dynamics software
package) also loaded its dependencies, including `OpenMPI/4.0.3-GCC-x.y.z` and
`SciPy-bundle/2020.03-foss-2020a-Python-3.x.y`. Let's try unloading the
`GROMACS` package.

```
{{ site.remote.prompt }} module unload GROMACS
{{ site.remote.prompt }} module list
```
{: .language-bash}

```
Currently Loaded Modules:
   1) GCCcore/x.y.z                     13) UCX/1.8.0-GCCcore-x.y.z
   2) Tcl/8.6.10-GCCcore-x.y.z          14) libfabric/1.11.0-GCCcore-x.y.z
   3) SQLite/3.31.1-GCCcore-x.y.z       15) PMIx/3.1.5-GCCcore-x.y.z
   4) GMP/6.2.0-GCCcore-x.y.z           16) OpenMPI/4.0.3-GCC-x.y.z
   5) libffi/3.3-GCCcore-x.y.z          17) OpenBLAS/0.3.9-GCC-x.y.z
   6) Python/3.x.y-GCCcore-x.y.z        18) gompi/2020a
   7) GCC/x.y.z                         19) FFTW/3.3.8-gompi-2020a
   8) numactl/2.0.13-GCCcore-x.y.z      20) ScaLAPACK/2.1.0-gompi-2020a
   9) libxml2/2.9.10-GCCcore-x.y.z      21) foss/2020a
  10) libpciaccess/0.16-GCCcore-x.y.z   22) pybind11/2.4.3-GCCcore-x.y.z-Pytho...
  11) hwloc/2.2.0-GCCcore-x.y.z         23) SciPy-bundle/2020.03-foss-2020a-Py...
  12) libevent/2.1.11-GCCcore-x.y.z     24) networkx/2.4-foss-2020a-Python-3.x.y
```
{: .output}

So `module unload` removes the named module and, depending on how a site is
configured, may also unload its dependencies (in our case it does not). To
unload everything at once, we can run `module purge`.

```
{{ site.remote.prompt }} module purge
{{ site.remote.prompt }} module list
```
{: .language-bash}

```
No modules loaded
```
{: .output}

Note that `module purge` is informative: it also tells us if a default set of
"sticky" packages cannot be unloaded, and how to unload them if we really want
to.
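
One way to combine these commands is to start from a clean slate and then
load only what a task needs, so that the result does not depend on whatever
was loaded earlier in the session. The sketch below is one possible pattern,
not something this lesson's scripts require. The `fresh_environment` function
name is hypothetical, and the guard is there so the snippet also runs on
machines that have no `module` command.

```bash
# fresh_environment is a hypothetical helper: unload everything, then load
# only the modules named as arguments and show the result.
fresh_environment() {
    if ! command -v module > /dev/null 2>&1; then
        echo "no module command available here"
        return 1
    fi
    module purge
    module load "$@"
    module list
}

fresh_environment Python || echo "skipping: not on a system with modules"
```
{: .language-bash}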
@@ -0,0 +1,5 @@
<!--- There is a long additional section on some specific features of Lmod in
snippets_library/ComputeCanada_Graham_slurm/modules/wrong-gcc-version.snip,
in particular covering the concept of a module hierarchy. While this is not
relevant to the EESSI configuration right now, it may become so in the future
if we change the default module naming scheme to hierarchical. -->
@@ -0,0 +1,16 @@
```
{{ site.remote.bash_shebang }}
{{ site.sched.comment }} {{ site.sched.flag.name }} parallel-job
{{ site.sched.comment }} {{ site.sched.flag.queue }} {{ site.sched.queue.testing }}
{{ site.sched.comment }} -N 1
{{ site.sched.comment }} -n 8

# Load the computing environment we need
# (mpi4py and numpy are in SciPy-bundle)
module load {{ site.remote.module_python3 }}
module load SciPy-bundle

# Execute the task
mpiexec amdahl
{: .language-bash}
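
With the values from this library's configuration (`#SBATCH` as the comment
prefix, `-J` for the job name, `-p` for the partition, `cpubase_bycore_b1` as
the testing queue, `Python` as the module, and `#!/bin/bash` as the remote
shebang), the template above should render to the script shown here. As a
sketch, we write it to a file and list its scheduler directives before
submitting anything. The file name `parallel-job.sh` is an assumption chosen
for illustration.

```bash
# Write the rendered job script to a file so it can be inspected first.
cat > parallel-job.sh << 'EOF'
#!/bin/bash
#SBATCH -J parallel-job
#SBATCH -p cpubase_bycore_b1
#SBATCH -N 1
#SBATCH -n 8

# Load the computing environment we need
# (mpi4py and numpy are in SciPy-bundle)
module load Python
module load SciPy-bundle

# Execute the task
mpiexec amdahl
EOF

# Show only the lines the scheduler will read as directives.
grep '^#SBATCH' parallel-job.sh
```
{: .language-bash}

Checking the directives this way is a cheap sanity test: a typo in the
partition name or task count is easier to spot here than in a failed job.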
@@ -0,0 +1,16 @@
```
{{ site.remote.bash_shebang }}
{{ site.sched.comment }} {{ site.sched.flag.name }} parallel-job
{{ site.sched.comment }} {{ site.sched.flag.queue }} {{ site.sched.queue.testing }}
{{ site.sched.comment }} -N 1
{{ site.sched.comment }} -n 4

# Load the computing environment we need
# (mpi4py and numpy are in SciPy-bundle)
module load {{ site.remote.module_python3 }}
module load SciPy-bundle

# Execute the task
mpiexec amdahl
{: .language-bash}
@@ -0,0 +1,14 @@
```
{{ site.remote.bash_shebang }}
{{ site.sched.comment }} {{ site.sched.flag.name }} solo-job
{{ site.sched.comment }} {{ site.sched.flag.queue }} {{ site.sched.queue.testing }}
{{ site.sched.comment }} -N 1
{{ site.sched.comment }} -n 1

# Load the computing environment we need
module load {{ site.remote.module_python3 }}

# Execute the task
amdahl
```
{: .language-bash}
@@ -0,0 +1,14 @@
```
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
7               file.sh cpubase_b+ def-spons+          1  COMPLETED      0:0
7.batch           batch            def-spons+          1  COMPLETED      0:0
7.extern         extern            def-spons+          1  COMPLETED      0:0
8               file.sh cpubase_b+ def-spons+          1  COMPLETED      0:0
8.batch           batch            def-spons+          1  COMPLETED      0:0
8.extern         extern            def-spons+          1  COMPLETED      0:0
9            example-j+ cpubase_b+ def-spons+          1  COMPLETED      0:0
9.batch           batch            def-spons+          1  COMPLETED      0:0
9.extern         extern            def-spons+          1  COMPLETED      0:0
```
{: .output}
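
Each job appears three times here: once for the job itself and once for each
of its `.batch` and `.extern` steps. If you only want one line per job,
`sacct` has an `-X` (`--allocations`) option for that. Alternatively, as a
sketch, saved output can be filtered with `awk`. The file name `history.txt`
is an assumption for illustration, and the sample rows are copied from the
output above.

```bash
# Save a few rows of job history (copied from the output above) to a file.
cat > history.txt << 'EOF'
7               file.sh cpubase_b+ def-spons+          1  COMPLETED      0:0
7.batch           batch            def-spons+          1  COMPLETED      0:0
7.extern         extern            def-spons+          1  COMPLETED      0:0
8               file.sh cpubase_b+ def-spons+          1  COMPLETED      0:0
8.batch           batch            def-spons+          1  COMPLETED      0:0
8.extern         extern            def-spons+          1  COMPLETED      0:0
EOF

# Keep only rows whose first field is a plain job number (no ".step" suffix),
# then print the job ID and the job name.
awk '$1 ~ /^[0-9]+$/ { print $1, $2 }' history.txt
```
{: .language-bash}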
@@ -0,0 +1,19 @@
```
top - 21:00:19 up  3:07,  1 user,  load average: 1.06, 1.05, 0.96
Tasks: 311 total,   1 running, 222 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.2 us,  3.2 sy,  0.0 ni, 89.0 id,  0.0 wa,  0.2 hi,  0.2 si,  0.0 st
KiB Mem : 16303428 total,  8454704 free,  3194668 used,  4654056 buff/cache
KiB Swap:  8220668 total,  8220668 free,        0 used. 11628168 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1693 jeff      20   0 4270580 346944 171372 S  29.8  2.1   9:31.89 gnome-shell
 3140 jeff      20   0 3142044 928972 389716 S  27.5  5.7  13:30.29 Web Content
 3057 jeff      20   0 3115900 521368 231288 S  18.9  3.2  10:27.71 firefox
 6007 jeff      20   0  813992 112336  75592 S   4.3  0.7   0:28.25 tilix
 1742 jeff      20   0  975080 164508 130624 S   2.0  1.0   3:29.83 Xwayland
    1 root      20   0  230484  11924   7544 S   0.3  0.1   0:06.08 systemd
   68 root      20   0       0      0      0 I   0.3  0.0   0:01.25 kworker/4:1
 2913 jeff      20   0  965620  47892  37432 S   0.3  0.3   0:11.76 code
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
```
{: .output}
@@ -0,0 +1,6 @@
```
              total        used        free      shared  buff/cache   available
Mem:           3.8G        1.5G        678M        327M        1.6G        1.6G
Swap:          3.9G        170M        3.7G
```
{: .output}
@@ -0,0 +1,4 @@
```
Submitted batch job 7
```
{: .output}
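
The number at the end of this message is the job ID, which other scheduler
commands accept to refer to this particular job. If a script needs it, the ID
can be pulled out of the message. The sketch below hard-codes the message
shown above rather than running the scheduler, and the variable names
`message` and `job_id` are chosen here for illustration.

```bash
# The message printed by the scheduler when the job was submitted.
message="Submitted batch job 7"

# Strip everything up to and including the last space to keep the job ID.
job_id="${message##* }"
echo "Job ID: $job_id"
```
{: .language-bash}

On a real cluster, `sbatch --parsable` should print the job ID without the
surrounding text, and the ID can then be passed to commands such as
`squeue -j` or `scancel`.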