Adding Farm documentation #3

3 changes: 3 additions & 0 deletions docs/farm/index.md
@@ -0,0 +1,3 @@


Farm is a Linux-based supercomputing cluster for the College of Agricultural and Environmental Sciences at UC Davis. Designed for both research and teaching, it is a significant campus resource primarily for CPU and RAM-based computing, with a wide selection of centrally-managed software available for research in genetics, proteomics, and related bioinformatics pipelines, weather and environmental modeling, fluid and particle simulations, geographic information system (GIS) software, and more.
18 changes: 18 additions & 0 deletions docs/farm/scheduling.md
@@ -0,0 +1,18 @@

Slurm is the job scheduler used for batch queue management on the clusters. It is an open-source resource manager (batch queue) and job scheduler designed for Linux clusters of all sizes.
The general idea with a batch queue is that you don't have to babysit your jobs. You submit a job, and it runs until it finishes or a problem occurs; you can configure Slurm to notify you by email when that happens. This allows very efficient use of the cluster. You can still babysit or debug your jobs if you wish by using an interactive session (e.g. qlogin).
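For illustration, a generic way to start an interactive session under Slurm is to request a pseudo-terminal with srun. The partition name, time limit, and resource amounts below are placeholder values, not Farm-specific recommendations:

```bash
# Request an interactive shell on a compute node for one hour,
# with 2 CPU cores and 4GB of RAM (example values only).
srun --partition=low --time=01:00:00 --cpus-per-task=2 --mem=4G --pty bash -il
```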

Our main concern is that all jobs go through the batch queuing system. Do not bypass the batch queue. We don't lock anything down, but that doesn't mean we can't or won't. If you need to retrieve files from a compute node, feel free to ssh directly to it and get them, but don't impact other jobs that have gone through the queue.

Here are some useful Slurm commands and their purposes:

- `sinfo` reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options.
- `smap` reports state information for jobs, partitions, and nodes managed by Slurm, but graphically displays the information to reflect network topology.
- `sbatch` is used to submit a job script for later execution (a sample script is shown at the end of this article). The script will typically contain one or more `srun` commands to launch parallel tasks.
- `squeue` reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
- `srun` is used to submit a job for execution or to initiate job steps in real time. `srun` has a wide variety of options to specify resource requirements, including minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (such as memory, disk space, or required features). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
- `scancel` is used to stop a job early, for example when you queue the wrong script or know it's going to fail because you forgot something.
See more in “Monitoring Jobs” in the Slurm Example Scripts article in the Help Documents.
More in-depth information is available at http://slurm.schedmd.com/documentation.html
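As a concrete sketch of what a batch submission looks like, a minimal sbatch job script might resemble the following. The partition, resource amounts, email address, module name, and commands are placeholders; substitute values appropriate for your group and software:

```bash
#!/bin/bash
#SBATCH --job-name=my-analysis        # a short name for the job
#SBATCH --partition=low               # placeholder partition; use one you have access to
#SBATCH --ntasks=1                    # number of tasks
#SBATCH --cpus-per-task=4             # CPU cores for the task
#SBATCH --mem=8G                      # memory for the whole job
#SBATCH --time=02:00:00               # wall-clock time limit
#SBATCH --mail-type=END,FAIL          # email when the job ends or fails
#SBATCH --mail-user=you@example.edu   # placeholder address

# Load the software the job needs (see the modules article).
module load my-tool/1.0               # placeholder module name and version

# Run the actual work.
srun my-tool --input data.txt --output results.txt
```

Submit and monitor it with the commands described above:

```bash
sbatch my-analysis.sh     # submit the script to the queue
squeue -u $USER           # check the job's state
scancel <jobid>           # stop it early if something is wrong
```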
14 changes: 14 additions & 0 deletions docs/farm/software/modules.md
@@ -0,0 +1,14 @@


Most packages that are installed are available as environment modules. You can find out about an installed piece of software or module using the following command:

module avail -l | grep -i <module-name>

`module avail` lists all of the installed software and modules on the cluster; piping it through `grep -i` filters that list for the name you are interested in, ignoring case.

Use:

- `module load <module/version>` to load a module
- `module unload <module/version>` when you are done.

Generally, use as few modules as possible at a time: once you're done using a particular piece of software, unload its module before you load another one, to avoid incompatibilities.
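For example, a typical session might look like the following; the module name and version here are placeholders, not a statement of what is installed on Farm:

```bash
module avail -l | grep -i samtools   # search for a module (placeholder name)
module load samtools/1.17            # load a specific version
module list                          # show which modules are currently loaded
samtools --version                   # ...use the software...
module unload samtools/1.17          # unload it when finished
module purge                         # or clear all loaded modules at once
```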

If you cannot find a piece of software on the cluster, you can request an installation for cluster-wide use. Contact [email protected] with the name of the cluster, your username, the name of the software, and a link to the software's website, documentation, or installation directions, if applicable.
14 changes: 14 additions & 0 deletions docs/farm/storage.md
@@ -0,0 +1,14 @@


All researchers in CA&ES are entitled to free access to:

- 8 nodes with 24 CPUs and 64GB RAM each (up to a maximum of 192 CPUs and 512GB RAM) in Farm II's low, medium, and high-priority batch queues,
- 3 nodes with 128 CPUs and 256GB RAM each in Farm III's low2, med2, and high2-priority batch queues, and
- the bml (bigmem, low priority/requeue) partition, which has 24 nodes with a combined 60 TB of RAM.

In addition, each new user is allocated a 20GB home directory.

If you want to use the CA&ES free tier, select “CA&ES free tier” from the list of sponsors here.

Additional usage and access may be purchased by contributing to Farm III, either through the node and/or storage rates or by purchasing equipment and contributing through the rack fee rate.

Under the “one-minute guarantee,” contributors always receive access to the resources they have purchased within one minute. Users can also request additional unused resources on a “fair share” basis in the medium or low partitions.
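For example, you can check how much of the 20GB home directory allocation you are using and target one of the free-tier partitions when submitting; the script name below is a placeholder:

```bash
du -sh $HOME                       # total space used by your home directory
sbatch --partition=low2 job.sh     # submit a job to a free-tier Farm III partition
squeue -u $USER --partition=low2   # watch it in the queue
```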