Overview of Slurm

Slurm is a type of program known as a job scheduler (or resource scheduler), which automatically allocates computing resources such as CPU cores and memory to each user in environments shared by many users.

  • The general analysis section uses Grid Engine.
  • The personal genome analysis section can use either Grid Engine or Slurm.

Slurm is open-source job scheduler software, with commercial support provided by SchedMD (USA), one of its developers. It has an extensive track record on large cluster supercomputers both in Japan and abroad, including LLNL (Lawrence Livermore National Laboratory) in the USA, and is also available as a job scheduler for HPC on several public clouds.


Types of Jobs

In the personal genome analysis section's Slurm, the following four types of jobs are mainly used; example submission commands are sketched after the list. (Although the Slurm documentation does not explicitly categorize parallel jobs as a separate type, they are listed separately here to correspond with the explanation of AGE on the genetic research supercomputer.)

  • Interactive jobs
    • Used when interacting with the supercomputer.
  • Batch jobs
    • Used when running a small number of programs that use only one CPU core.
  • Parallel jobs
    • Used when running a small number of programs that use multiple CPU cores simultaneously.
  • Array jobs
    • Used when running many similar batch or parallel jobs.
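
As a rough illustration, the commands below show one way each job type might be submitted. This is a minimal sketch: the script names, the resource values, and the program being run are placeholders, not settings of this system.

    # Interactive job: request one CPU core and open a shell on a compute node
    # (bash is attached to your terminal via --pty).
    srun --cpus-per-task=1 --mem=4G --pty bash

    # Batch job: submit a script that uses a single CPU core.
    sbatch single_core_job.sh

    # Parallel job: submitted the same way, but the script itself requests
    # several cores, e.g. with "#SBATCH --cpus-per-task=4".
    sbatch multi_core_job.sh

    # Array job: run the same script as tasks 1 to 100; each task can read its
    # own index from the environment variable SLURM_ARRAY_TASK_ID.
    sbatch --array=1-100 array_job.sh

A batch script such as single_core_job.sh might contain, for example:

    #!/bin/bash
    #SBATCH --job-name=example    # job name shown by squeue
    #SBATCH --cpus-per-task=1     # number of CPU cores (increase for a parallel job)
    #SBATCH --mem=4G              # requested memory; keep within the per-node limit
    #SBATCH --time=1:00:00        # wall-clock limit; keep within the partition limit

    ./my_program                  # placeholder for the program to run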

(For more details on other types of jobs, please refer to the official manual.)

Other Commands

The primary commands used are as follows (brief usage examples are sketched below the list):

  • squeue
    • Check the current status of jobs.
  • scancel
    • Delete a job.
  • scontrol
    • Change the settings of a job.

For details, please refer to the section on other commands and the official manual.
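
For illustration, a few minimal usage sketches of these commands follow; the job ID 12345 and the new time limit are placeholders.

    # Show your own jobs and their states (PD = pending, R = running).
    squeue -u $USER

    # Delete (cancel) a job by its job ID.
    scancel 12345

    # Show a job's current settings, or change one of them (here the time
    # limit) while the job is still pending.
    scontrol show job 12345
    scontrol update JobId=12345 TimeLimit=2:00:00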

When a Job Does Not Start

  1. Check the job settings, mainly in the following respects (example check commands are sketched after this list):
    • Ensure that the amount of computing resources requested in the job script is correct: confirm that the request does not exceed the memory available per node or the number of physical CPU cores.
    • Verify that the requested run time does not exceed the time limit set for the partition.
  2. Check the congestion status of the supercomputer.
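
As a concrete sketch of these checks, the following commands can be used; the job ID 12345 and the partition name short are placeholders.

    # 1. Compare what the job requests with what the nodes and the partition allow.
    scontrol show job 12345          # requested CPU cores, memory, and time limit
    scontrol show partition short    # partition limits such as MaxTime
    sinfo -N -l                      # per-node CPU core count and memory

    # 2. Check how congested the supercomputer is; the "Reason" column of a
    #    pending job (e.g. Resources, Priority) shows why it has not started.
    squeue --states=PENDING -l
    squeue -u $USER -o "%.10i %.9P %.2t %.20R"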