Other Commands
How to Use GPUs
In the personal genome analysis section, jobs are allocated to GPU compute nodes on a per-node basis, and the environment is not configured to divide the GPUs within a node among jobs. Therefore, when you submit a job that uses GPUs, you do not need to specify GPU usage explicitly as an option.
Checking the Status of Job Execution (squeue)
Checking the Status of Job Submission
The squeue command can be used to check the status of job execution. For details on the options, please refer to the online manual.
Example of execution
xxxxx-pg@at022vm02:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
751 parabrick test.sh xxxxx-pg R 0:02 3 igt[010,015-016]
750 parabrick test.sh xxxxx-pg R 0:05 3 igt[010,015-016]
749 parabrick test.sh xxxxx-pg R 0:09 3 igt[010,015-016]
748 parabrick test.sh xxxxx-pg R 0:13 3 igt[010,015-016]
The default items displayed by squeue are as follows:
Item Name | Description |
---|---|
JOBID | Displays the job ID assigned to the job. |
PARTITION | Displays the name of the partition (queue) into which the job was submitted. |
NAME | Displays the job name (if not specified, the command string is displayed). |
USER | Displays the name of the user who submitted the job. |
ST | Displays the job status. The main job statuses are shown in the table below. |
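TIME | Displays the job execution time (format: days-hh:mm:ss). |
NODES | Displays the number of nodes used for job execution. |
NODELIST(REASON) | Displays the list of hostnames where the job is executed. For a job that is not yet running, the reason it is waiting is shown in parentheses instead. |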
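The columns described above can be picked out of the squeue output with standard text tools. The following is a minimal sketch, not an official tool: jobs_for_user is a hypothetical helper name, and it assumes the default column order shown in the example and job names that contain no spaces. Note that the default output shortens some fields (the partition appears as parabrick in the example), so comparisons against long names may need the shortened form.

```shell
# jobs_for_user USER: read default-format squeue output on stdin and print
# "JOBID ST NODELIST(REASON)" for each job owned by USER.
# (jobs_for_user is a hypothetical helper, not part of Slurm.)
jobs_for_user() {
  # NR > 1 skips the header line; column 4 is USER, 1 is JOBID,
  # 5 is ST, and 8 is NODELIST(REASON).
  awk -v user="$1" 'NR > 1 && $4 == user { print $1, $5, $8 }'
}

# Sample input copied from the example of execution. On the cluster you
# would pipe the real command instead:  squeue | jobs_for_user "$USER"
jobs_for_user xxxxx-pg <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
751 parabrick test.sh xxxxx-pg R 0:02 3 igt[010,015-016]
750 parabrick test.sh xxxxx-pg R 0:05 3 igt[010,015-016]
EOF
```

Splitting on whitespace keeps the sketch short, but it is only reliable while every field is a single word.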
Job Status Description (ST field)
Status Character | Description |
---|---|
CA (CANCELLED) | The job was explicitly cancelled by a user or system administrator. |
CD (COMPLETED) | The job has finished executing on all nodes. |
CF (CONFIGURING) | The job is waiting for resources to become usable after being allocated. |
CG (COMPLETING) | The job is in the process of completing. |
F (FAILED) | The job terminated with a non-zero exit code or another failure condition. |
NF (NODE_FAIL) | The job terminated due to a failure of one of the allocated nodes. |
PD (PENDING) | The job is waiting for resources to be allocated. |
PR (PREEMPTED) | The job terminated due to preemption. |
R (RUNNING) | The job is currently running. |
S (SUSPENDED) | The job has been allocated resources, but its execution is suspended. |
TO (TIMEOUT) | The job terminated because it reached its time limit. |
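When the ST column is read by a script, for example one that waits for a job to end, it can help to translate the codes in the table above. The sketch below is illustrative only: state_name and is_finished are hypothetical helper names, and the set of codes treated as finished is an assumption based on the descriptions in the table (cancelled, completed, or terminated).

```shell
# state_name CODE: print the long name for a squeue ST code from the table.
state_name() {
  case "$1" in
    CA) echo CANCELLED ;;
    CD) echo COMPLETED ;;
    CF) echo CONFIGURING ;;
    CG) echo COMPLETING ;;
    F)  echo FAILED ;;
    NF) echo NODE_FAIL ;;
    PD) echo PENDING ;;
    PR) echo PREEMPTED ;;
    R)  echo RUNNING ;;
    S)  echo SUSPENDED ;;
    TO) echo TIMEOUT ;;
    *)  echo UNKNOWN ;;   # codes not listed in the table
  esac
}

# is_finished CODE: succeed (exit status 0) when the code is one of the
# states the table describes as cancelled, completed, or terminated.
is_finished() {
  case "$1" in
    CA|CD|F|NF|PR|TO) return 0 ;;
    *) return 1 ;;
  esac
}

state_name PD                              # prints PENDING
is_finished R || echo "still active"       # prints still active
```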
Checking Detailed Information of a Job (scontrol show job)
To check more detailed information about a job, first confirm its job ID with the squeue command, then run scontrol show job with that job ID. For details on the options, please refer to the online manual.
Example of execution
xxxxx-pg@at022vm02:~/$ scontrol show job 747
JobId=747 JobName=test
UserId=xxxxx-pg(30257) GroupId=xxxxxx-pg(30063) MCS_label=N/A
Priority=10102 Nice=0 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:12 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2024-02-19T20:57:53 EligibleTime=2024-02-19T20:57:53
AccrueTime=2024-02-19T20:57:53
StartTime=2024-02-19T20:57:53 EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-02-19T20:57:53 Scheduler=Main
Partition=parabricks AllocNode:Sid=at022vm02:1768472
ReqNodeList=(null) ExcNodeList=(null)
NodeList=igt010
BatchHost=igt010
NumNodes=1 NumCPUs=4 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=4,mem=375G,node=1,billing=4
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=4 MinMemoryNode=375G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=./test.sh
WorkDir=/lustre8/home/xxxxx-pg/parabricks
StdErr=/lustre8/home/xxxxx-pg/parabricks/res.txt
StdIn=/dev/null
StdOut=/lustre8/home/xxxxx-pg/parabricks/res.txt
Power=
TresPerNode=gres:gpu:4
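The output above is a series of Key=Value pairs separated by whitespace, so a single value can be extracted with a short filter. This is a rough sketch under that assumption: job_field is a hypothetical helper name, and values that themselves contain spaces (for example a Command with arguments) would be cut at the first space.

```shell
# job_field KEY: read "scontrol show job" output on stdin and print the
# value of the first Key=Value pair whose key equals KEY.
# (job_field is a hypothetical helper, not part of Slurm.)
job_field() {
  awk -v key="$1" '{
    for (i = 1; i <= NF; i++) {
      p = index($i, "=")                 # position of the first "="
      if (p > 0 && substr($i, 1, p - 1) == key) {
        print substr($i, p + 1)          # everything after the first "="
        exit
      }
    }
  }'
}

# Sample lines copied from the example of execution. On the cluster you
# would pipe the real command instead:
#   scontrol show job 747 | job_field JobState
job_field JobState <<'EOF'
JobId=747 JobName=test
JobState=RUNNING Reason=None Dependency=(null)
TRES=cpu=4,mem=375G,node=1,billing=4
EOF
```

Only the first "=" in each pair is treated as the separator, so a value such as the TRES list, which contains further "=" characters, is returned whole.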