CPU Job Statistics

Slurm has to be configured to track job accounting data via the cgroup plug-in. This requires the following line in slurm.conf:

JobAcctGatherType=jobacct_gather/cgroup

The above is in addition to the other usual cgroup-related plug-ins/settings:

ProctrackType=proctrack/cgroup
TaskPlugin=affinity,cgroup
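
To confirm that a running cluster has actually picked up these settings, the live configuration can be queried (a quick sanity check, not an additional requirement):

scontrol show config | grep -E 'JobAcctGatherType|ProctrackType|TaskPlugin'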

Slurm will then create two top-level cgroup directories for each job, one under the cpu,cpuacct controller for CPU utilization and one under the memory controller for memory usage. Within each directory there will be subdirectories for the job steps: step_extern, step_batch, step_0, step_1, and so on. Within these step directories one finds task_0, task_1, and so on (an example layout is sketched after the table). These cgroups are scraped by a cgroup exporter. The table below lists all of the collected fields:

Name                        Description                               Type
cgroup_cpu_system_seconds   Cumulative CPU system seconds for jobid   gauge
cgroup_cpu_total_seconds    Cumulative CPU total seconds for jobid    gauge
cgroup_cpu_user_seconds     Cumulative CPU user seconds for jobid     gauge
cgroup_cpus                 Number of CPUs in the jobid               gauge
cgroup_memory_cache_bytes   Memory cache used in bytes                gauge
cgroup_memory_fail_count    Memory fail count                         gauge
cgroup_memory_rss_bytes     Memory RSS used in bytes                  gauge
cgroup_memory_total_bytes   Memory total given to jobid in bytes      gauge
cgroup_memory_used_bytes    Memory used in bytes                      gauge
cgroup_memsw_fail_count     Swap fail count                           gauge
cgroup_memsw_total_bytes    Swap total given to jobid in bytes        gauge
cgroup_memsw_used_bytes     Swap used in bytes                        gauge
cgroup_uid                  UID number of user running this job      gauge
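
For a concrete picture of the layout on a compute node, the cgroups for a hypothetical job 247463 owned by uid 334987 would sit roughly at the following cgroup v1 paths (the exact step directories depend on the job):

/sys/fs/cgroup/cpu,cpuacct/slurm/uid_334987/job_247463/step_batch/task_0/
/sys/fs/cgroup/cpu,cpuacct/slurm/uid_334987/job_247463/step_extern/task_0/
/sys/fs/cgroup/memory/slurm/uid_334987/job_247463/step_batch/task_0/
/sys/fs/cgroup/memory/slurm/uid_334987/job_247463/step_extern/task_0/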

The cgroup exporter used here is based on the exporter by Trey Dockendorf [1], with additional parsing of the jobid, step, task and UID number. This produces output that resembles the following (e.g., for system seconds):

cgroup_cpu_system_seconds{jobid="247463", step="batch", task="0"} 160.92

Note that the UID of the owning user is stored as a gauge in cgroup_uid:

cgroup_uid{jobid="247463"} 334987

This is because accounting is job-oriented, and carrying the UID of the user as a label on every metric would needlessly increase the cardinality of the data in Prometheus. All other metrics carry jobid, step and task labels.
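
Queries that need to restrict results to a particular user can still do so by joining on the jobid label. A sketch (reusing the uid 334987 from above), selecting the memory usage of that user's jobs:

cgroup_memory_used_bytes{step="", task=""} and on(jobid) (cgroup_uid == 334987)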

The totals for a job have an empty step and task, for example:

cgroup_cpu_user_seconds{jobid="247463", step="", task=""} 202435.71
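
Because these totals are cumulative, they lend themselves to efficiency estimates. A rough PromQL sketch (not part of the exporter itself) for the fraction of the job's allocated CPUs actually used over the last 10 minutes:

rate(cgroup_cpu_total_seconds{jobid="247463", step="", task=""}[10m]) / cgroup_cpus{jobid="247463", step="", task=""}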

These job-level totals with empty step and task exist because of the organization of the cgroup hierarchy. Consider the directory:

/sys/fs/cgroup/cpu,cpuacct/slurm/uid_334987

Within this directory, one finds (among others) the following entries:

job_247463/cpuacct.usage_user
job_247463/step_extern/cpuacct.usage_user
job_247463/step_extern/task_0/cpuacct.usage_user
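
Under cgroup v1, cpuacct.usage_user holds cumulative user CPU time in nanoseconds, so the seconds reported by the exporter correspond to the file contents divided by 10^9. A quick manual cross-check on the compute node could look like this (using the example job's path from above):

awk '{ printf "%.2f\n", $1 / 1e9 }' /sys/fs/cgroup/cpu,cpuacct/slurm/uid_334987/job_247463/cpuacct.usage_user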

These job-level totals are the data most often retrieved and parsed for overall job efficiency, which is why, by default, the cgroup_exporter does not parse step or task data. To collect all of it, add the --collect.fullslurm option. We run the cgroup_exporter with these options:

/usr/sbin/cgroup_exporter --config.paths /slurm --collect.fullslurm

The --config.paths /slurm value has to match the path used by Slurm under the top cgroup directory, which is usually something like /sys/fs/cgroup/memory/slurm.
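
Once the exporter is running, a quick way to confirm that the job cgroups are being picked up is to pull the metrics endpoint from the node itself (this assumes the exporter listens on its default port, 9306; adjust if it is configured to listen elsewhere):

curl -s http://localhost:9306/metrics | grep '^cgroup_cpu_total_seconds'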