slurm-monitoring

Monitoring GPU, RAM and CPU usage for slurm partitions and users.

This is a Python app built with Flask. It gathers information using the standard Slurm commands (squeue, scontrol, etc.). To reduce the number of requests sent to the Slurm server, the implementation assumes some fixed parameters, such as the maximum resources available.
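As a rough illustration of the gathering step, the sketch below sums CPUs and GPUs per partition and user from squeue output. It is not the app's actual code: the format string, the function names (`count_gpus`, `summarize_usage`, `query_squeue`) and the GRES parsing are assumptions, and GRES counts are treated as per-job totals, which only holds for single-node jobs.

```python
import re
import subprocess
from collections import defaultdict

# Assumed output format: user | partition | CPUs | generic resources (GRES).
SQUEUE_FORMAT = "%u|%P|%C|%b"

# Matches "gpu:2", "gpu:a100:2" or "gres:gpu:2"; other GRES strings count as 0.
_GPU_RE = re.compile(r"gpu(?::[^:,()]+)?:(\d+)")


def count_gpus(gres):
    """Return the GPU count found in a GRES string, or 0 if none is found."""
    match = _GPU_RE.search(gres)
    return int(match.group(1)) if match else 0


def summarize_usage(squeue_text):
    """Sum CPUs and GPUs per (partition, user) from squeue output text."""
    usage = defaultdict(lambda: {"cpus": 0, "gpus": 0})
    for line in squeue_text.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        user, partition, cpus, gres = fields
        entry = usage[(partition, user)]
        entry["cpus"] += int(cpus)
        entry["gpus"] += count_gpus(gres)
    return dict(usage)


def query_squeue():
    """Run squeue once for all running jobs (requires a Slurm client)."""
    result = subprocess.run(
        ["squeue", "-h", "-t", "RUNNING", "-o", SQUEUE_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a machine with Slurm installed, `summarize_usage(query_squeue())` would give one entry per partition and user. Keeping the parser separate from the subprocess call makes it possible to test the parsing on saved output without a cluster.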

You can quickly adapt this implementation to your own server. You can add as many partitions as you want, and they will be displayed on a two-column page.
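One way to picture the adaptation is a table of partitions with their fixed maximum resources, plus a helper that groups partitions into rows of two for display. This is a sketch only: the `PARTITIONS` dictionary, its keys and values, and the helper names are hypothetical and not taken from the repository.

```python
# Hypothetical partition table; replace names and limits with your own.
PARTITIONS = {
    "gpu": {"max_cpus": 128, "max_gpus": 8},
    "cpu": {"max_cpus": 256, "max_gpus": 0},
    "debug": {"max_cpus": 16, "max_gpus": 1},
}


def two_column_rows(names):
    """Group partition names into rows of two for a two-column layout."""
    names = list(names)
    return [names[i:i + 2] for i in range(0, len(names), 2)]


def percent_used(used, maximum):
    """Return usage as a percentage of the fixed maximum (0 if maximum is 0)."""
    return 100.0 * used / maximum if maximum else 0.0
```

With the table above, `two_column_rows(PARTITIONS)` yields two rows, the second holding only the last partition. Because the maxima are hard-coded, no extra query to the Slurm server is needed to compute percentages.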

Resources that a user has locked but is not actually using can be monitored too. This happens when a job requests resources from a node containing GPUs but does not actually use them.
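The README does not say how idle GPUs are detected. One possible approach, sketched below, is to compare the GPUs allocated on a node with per-GPU utilization as reported by `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits`. The function names, the 5% threshold, and the assumption that allocated GPU indices are already known are all illustrative.

```python
def parse_gpu_utilization(nvidia_smi_text):
    """Parse 'index, utilization' CSV lines into {gpu_index: percent}."""
    utilization = {}
    for line in nvidia_smi_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            continue  # skip blank lines and values such as "[N/A]"
        utilization[int(parts[0])] = int(parts[1])
    return utilization


def idle_allocated_gpus(allocated, utilization, threshold=5):
    """Return allocated GPU indices whose utilization is below threshold.

    `allocated` is the set of GPU indices Slurm has assigned on this node;
    obtaining it is outside the scope of this sketch.
    """
    return sorted(
        index for index in allocated
        if utilization.get(index, 0) < threshold
    )
```

A single utilization sample can misreport a job that is between batches, so a real monitor would likely average several readings before flagging a GPU as locked but unused.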
