Monitoring GPU, RAM and CPU usage for slurm partitions and users

Ieremie/slurm-monitoring

slurm-monitoring

Monitoring GPU, RAM and CPU usage for slurm partitions and users.

This is a web app written in Python using Flask. It gathers information using the standard Slurm command-line tools (squeue, scontrol, etc.). To reduce the number of requests sent to the Slurm server, the implementation assumes some fixed parameters, such as the maximum resources available.
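As an illustration of the approach described above, here is a minimal sketch (not the repository's actual code) that runs squeue once and aggregates running-job CPU counts per partition and user. The partition names, the capacities in `MAX_CPUS`, and the function names are hypothetical; only the squeue options (`--noheader`, `--states`, `--format` with `%u`, `%P`, `%C`) are standard Slurm.

```python
import subprocess
from collections import defaultdict

# Hypothetical fixed capacities. Hard-coding them means the app does not
# have to query Slurm for node totals on every page load.
MAX_CPUS = {"gpu": 128, "batch": 512}

# One squeue call: user | partition | allocated CPUs, running jobs only.
SQUEUE_CMD = ["squeue", "--noheader", "--states=RUNNING", "--format=%u|%P|%C"]


def run_squeue():
    """Return raw squeue output (requires a host with Slurm installed)."""
    result = subprocess.run(SQUEUE_CMD, capture_output=True, text=True, check=True)
    return result.stdout


def cpu_usage_by_partition(squeue_output):
    """Sum allocated CPUs per partition and user from 'user|partition|cpus' lines."""
    used = defaultdict(lambda: defaultdict(int))
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line:
            continue
        user, partition, cpus = line.split("|")
        used[partition][user] += int(cpus)
    return {partition: dict(users) for partition, users in used.items()}


def utilisation(usage):
    """Fraction of each known partition's fixed CPU capacity currently in use."""
    return {
        partition: sum(users.values()) / MAX_CPUS[partition]
        for partition, users in usage.items()
        if partition in MAX_CPUS
    }
```

On a cluster, `utilisation(cpu_usage_by_partition(run_squeue()))` would give the per-partition load; the parsing functions can also be tried on a pasted sample of squeue output without Slurm installed.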

You can quickly adapt this implementation to your own server. You can add as many partitions as you want, and they will be displayed on the page in two columns.
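One way to picture this is a list of partition entries that is split into rows of two before rendering. The sketch below is an assumption about how such a configuration could look, not the repository's actual structure; the `Partition` fields, the example partitions, and `two_column_rows` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical per-partition settings with fixed maximum resources."""
    name: str
    max_cpus: int
    max_gpus: int
    max_ram_gb: int


# Adapting to another server would mean editing this list (example values only).
PARTITIONS = [
    Partition("gpu", max_cpus=128, max_gpus=8, max_ram_gb=1024),
    Partition("batch", max_cpus=512, max_gpus=0, max_ram_gb=2048),
    Partition("highmem", max_cpus=64, max_gpus=0, max_ram_gb=4096),
]


def two_column_rows(partitions):
    """Group partitions into rows of at most two, in order, for a 2-column page."""
    return [partitions[i:i + 2] for i in range(0, len(partitions), 2)]
```

With an odd number of partitions the last row simply holds one entry, so a template looping over the rows would leave the second column empty there.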

The app can also monitor resources that a user has locked but is not actually using. This happens when a job requests resources from a node containing GPUs but does not actually use them.
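The check described above could look roughly like the following sketch. It assumes that, for each running job, the number of allocated GPUs and an observed GPU utilisation figure are already available; how those numbers are collected is not specified here, and `JobSample`, `IDLE_THRESHOLD`, and both functions are hypothetical names rather than part of the repository.

```python
from dataclasses import dataclass


@dataclass
class JobSample:
    """Hypothetical snapshot of one running job."""
    job_id: str
    user: str
    gpus_allocated: int
    gpu_utilisation: float  # mean utilisation across allocated GPUs, in percent


# Assumed cut-off: below this percentage the GPUs are treated as unused.
IDLE_THRESHOLD = 5.0


def locked_but_idle(samples, threshold=IDLE_THRESHOLD):
    """Return jobs that hold at least one GPU but show utilisation below the threshold."""
    return [
        s for s in samples
        if s.gpus_allocated > 0 and s.gpu_utilisation < threshold
    ]


def idle_gpus_by_user(samples, threshold=IDLE_THRESHOLD):
    """Count GPUs held by idle jobs, grouped by user."""
    totals = {}
    for s in locked_but_idle(samples, threshold):
        totals[s.user] = totals.get(s.user, 0) + s.gpus_allocated
    return totals
```

A single snapshot can misclassify a job that is between GPU phases, so a real monitor would probably average several samples before flagging a user.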
