The NIG Supercomputer

Other Commands

This is an old document

This document was written for the former NIG supercomputer (2019) and is kept for reference purposes.

Please note that the commands described here do not work in the same way on the current NIG supercomputer (2025).

Using GPU

See the CUDA manual.

Checking the job execution status (qstat)

Checking the job submission status

Use the qstat command to check whether a job submitted with qsub was accepted and to see its current status. For example, after jobs have been submitted, qstat shows output like the following.

[username@at027 ~]$ qstat
job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID 
------------------------------------------------------------------------------------------------------------------------------------------------
     80312 0.50000 QLOGIN     username     r     02/27/2019 17:42:00 login.q@at027                                                     1
     80313 0.25000 jobname    username     r     02/27/2019 17:44:30 epyc.q@at040                                                      1
     80314 0.25000 jobname    username     r     02/27/2019 17:44:35 epyc.q@at040                                                      1
     80315 0.25000 jobname    username     r     02/27/2019 17:44:40 epyc.q@at040                                                      1
					

The meanings of the characters in the "state" column are as follows.

state   meaning
r       running
qw      queued, waiting
t       transferring to the execution host
E       error in the job
d       the job is in the process of being deleted
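
The state codes above can be tallied to get a quick summary of your jobs. The following is a minimal sketch, assuming the qstat column layout shown earlier (two header lines, state in the fifth column) and job names without spaces; count_job_states is a hypothetical helper name, not a Grid Engine command.

```shell
# Summarize jobs by state from `qstat` output read on standard input.
# Assumption: two header lines, and the state code is the fifth field.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Typical use on a login node (assumes qstat is on PATH):
#   qstat | count_job_states
```

Each output line is a state code followed by the number of jobs in that state.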

The "qstat -f" command shows the usage status of each queue instance, as in the following output.

[username@at027 ~]$ qstat -f
queuename                      qtype resv/used/tot. np_load  arch          states
---------------------------------------------------------------------------------
medium.q@m01                   BP    0/0/80         0.00     lx-amd64      
---------------------------------------------------------------------------------
medium.q@m02                   BP    0/0/80         0.00     lx-amd64      
---------------------------------------------------------------------------------
medium.q@m03                   BP    0/0/80         0.00     lx-amd64      
---------------------------------------------------------------------------------
medium.q@m04                   BP    0/0/80         0.00     lx-amd64      
(omitted)
---------------------------------------------------------------------------------
epyc.q@at033                   BP    0/0/64         0.00     lx-amd64      
---------------------------------------------------------------------------------
epyc.q@at034                   BP    0/0/64         0.00     lx-amd64      
---------------------------------------------------------------------------------
epyc.q@at035                   BP    0/0/64         0.00     lx-amd64      
(omitted)
---------------------------------------------------------------------------------
intel.q@it003                  BP    0/0/32         0.00     lx-amd64      
---------------------------------------------------------------------------------
intel.q@it004                  BP    0/0/32         0.00     lx-amd64      
---------------------------------------------------------------------------------
intel.q@it005                  BP    0/0/32         0.00     lx-amd64      
---------------------------------------------------------------------------------
intel.q@it006                  BP    0/0/32         0.00     lx-amd64      
---------------------------------------------------------------------------------
(remainder omitted)

From this output you can determine which node (queue instance) a job was submitted to.
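
The per-node lines of "qstat -f" can also be added up to see how many slots are free in each queue. The following is a minimal sketch, assuming the "resv/used/tot." column format shown above; free_slots_by_queue is a hypothetical helper name, and the calculation ignores the "states" column, so slots on disabled or failed nodes are still counted as free.

```shell
# Sum free slots per cluster queue from `qstat -f` output on standard input.
# Assumption: the third field has the form resv/used/tot, as in the listing above.
free_slots_by_queue() {
    awk '$1 ~ /@/ && $3 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
             split($1, q, "@")      # q[1] is the queue name, q[2] the host
             split($3, s, "/")      # s[1]=reserved, s[2]=used, s[3]=total
             free[q[1]] += s[3] - s[2] - s[1]
         }
         END { for (name in free) print name, free[name] }' | sort
}

# Typical use (assumes qstat is on PATH):
#   qstat -f | free_slots_by_queue
```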

"qstaus -j jobID" prints information about the job with a list.

[username@at027 ~]$ qstat -j 199666
==============================================================
job_number:                 199666
jclass:                     NONE
submission_time:            02/27/2019 17:42:00.867
owner:                      username
uid:                        9876
group:                      ddbj
gid:                        9876
supplementary group:        ddbj
sge_o_home:                 /home/username
sge_o_log_name:             username
sge_o_path:                 /cm/local/apps/gcc/7.2.0/bin:/home/geadmin/UGER/bin/lx-amd64:/cm/local/apps/environment-modules/4.0.0//bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/4.0.0/bin:/home/username/.local/bin:/home/username/bin
sge_o_shell:                /bin/bash
sge_o_workdir:              /lustre8/home/username
sge_o_host:                 gw1
account:                    sge
stderr_path_list:           NONE:NONE:/dev/null
hard resource_list:         d_rt=259200,mem_req=8G,s_rt=259200,s_stack=10240K,s_vmem=8G
soft resource_list:         epyc=TRUE,gpu=TRUE,intel=TRUE,login=TRUE
mail_list:                  username@gw1
notify:                     FALSE
job_name:                   QLOGIN
stdout_path_list:           NONE:NONE:/dev/null
priority:                   0
jobshare:                   0
restart:                    n
env_list:                   TERM=xterm
department:                 defaultdepartment
binding:                    NONE
mbind:                      NONE
submit_cmd:                 qlogin
category_id:                4
request_dispatch_info:      FALSE
start_time            1:    02/27/2019 17:42:00.884
job_state             1:    r
exec_host_list        1:    at027:1
granted_req.          1:    mem_req=8.000G
usage                 1:    wallclock=01:00:01, cpu=00:00:00, mem=0.00000 GBs, io=0.00000 GB, iow=0.000 s, ioops=0, vmem=N/A, maxvmem=N/A
scheduling info:            -


Checking the status of all cluster queues

To see the job submission status and load of each cluster queue, run "qstat -g c". This also gives an overview of the whole cluster.

[username@at027 ~]$ qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
epyc.q                            0.00      1      0   4159   4224      0     64 
gpu.q                             0.00      0      0     64    112      0     48 
intel.q                           0.00      0      0   1472   1472      0      0 
login.q                           0.00      4      0    380    384      0      0 
login_gpu.q                       0.00      0      0     48     48      0      0 
medium.q                          0.00      0      0    800    800      0      0 
short.q                           0.00      0      0    128    224      0     96 

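The meanings of the columns are as follows.

Column    Meaning
CQLOAD    average normalized load of the cluster queue
USED      number of slots currently in use
RES       number of reserved slots
AVAIL     number of available slots
TOTAL     total number of slots
aoACDS    number of slots in at least one of the states a, o, A, C, D or S
cdsuE     number of slots in at least one of the states c, d, s, u or E

You can find more details with man qstat.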
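
When AVAIL is lower than TOTAL minus USED, the difference usually shows up in the last column, cdsuE, which according to the qstat manual page counts slots in states such as disabled or error. The following is a minimal sketch that lists the queues with such slots, assuming the eight-column "qstat -g c" layout shown above; queues_with_unavailable_slots is a hypothetical helper name.

```shell
# List cluster queues that have slots counted in the cdsuE column.
# Assumption: two header lines, cdsuE is the eighth field of `qstat -g c`.
queues_with_unavailable_slots() {
    awk 'NR > 2 && NF >= 8 && $8 + 0 > 0 { print $1, $8 }'
}

# Typical use (assumes qstat is on PATH):
#   qstat -g c | queues_with_unavailable_slots
```

For the sample output above, this would report epyc.q, gpu.q and short.q.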

Deleting jobs (qdel)

The qdel command deletes a job without waiting for it to finish. Use "qdel jobID" to delete a single job. "qdel -u username" deletes all jobs that you have submitted.

Parameter           Description
-f job_id(s)        Forces the deletion of a job even if the responsible execution host does not respond.
job_id -t range     Deletes specific tasks of an array job. A range of tasks can also be specified.
-u user_list        Deletes all jobs of the specified users.
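
A common use of qdel is cleaning up jobs that ended in an error state. The following is a minimal sketch that reads qstat output and prints, without running it, a qdel command for every job whose state contains E. It assumes the default qstat column layout (job ID in the first field, state in the fifth); qdel_error_jobs_command is a hypothetical helper name.

```shell
# Print (but do not run) a qdel command for all jobs whose state contains "E".
# Assumption: two header lines, job ID in field 1, state in field 5.
qdel_error_jobs_command() {
    awk 'NR > 2 && $5 ~ /E/ { ids = ids " " $1 }
         END { if (ids != "") print "qdel" ids }'
}

# Typical use (assumes qstat is on PATH); review the printed command before running it:
#   qstat | qdel_error_jobs_command
```

Printing the command instead of executing it leaves a chance to check the job IDs first.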

Changing the job execution condition (qalter)

The qalter command can change the resource requests of a batch job that has already been submitted to a queue, such as the amount of memory it requests. To change the memory requirement of a job from 200GB to 100GB, first check the job's hard_resource_list with the qstat command. Because qalter overwrites the entire hard_resource_list, you must specify every resource in the list again, not only the memory-related entries (mem_req and s_vmem).

[username@at027 ~]$ qstat -j 25453855 |grep hard_resource_list
hard_resource_list:         mem_req=100G,s_vmem=100G

[username@at027 ~]$ qalter -l  d_rt=600,mem_req=100G,s_rt=600,s_stack=10240K,s_vmem=100G,short=TRUE 25453855
modified hard resource list of job 25453855
modified environment of job 25453855

[username@at027 ~]$ qstat -j 25453855 |grep hard_resource_list
hard_resource_list:         d_rt=600,mem_req=100G,s_rt=600,s_stack=10240K,s_vmem=100G,short=TRUE
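
Because the whole list has to be passed again, it can help to derive the new list from the current one instead of retyping it. The following is a minimal sketch that replaces only the mem_req and s_vmem values in a comma-separated resource list and leaves every other entry unchanged. with_memory is a hypothetical helper name, and the new value is assumed to be a plain size such as 100G.

```shell
# Rewrite mem_req and s_vmem in a resource list, keeping all other entries.
#   $1: current hard resource list, e.g. "d_rt=600,mem_req=200G,s_vmem=200G"
#   $2: new memory value, e.g. "100G" (assumed to contain no "/" or "&")
with_memory() {
    printf '%s\n' "$1" | sed -E "s/(^|,)(mem_req|s_vmem)=[^,]*/\1\2=$2/g"
}

# Typical use, with the current list copied from `qstat -j <jobID>`:
#   qalter -l "$(with_memory "$current_list" 100G)" <jobID>
```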

Checking the result of jobs

If you do not specify any options on the qsub command line, the standard output of a job is written to a file named <job name>.o<job ID> and the standard error output to a file named <job name>.e<job ID>, both in the home directory.

You can use the qreport command to check detailed information about a finished job, such as how many resources it used.

[username@at137 ~]$ qreport -j 110488
==============================================================
owner                okishinya                                                  
jobnumber            110488                                                     
taskid               undefined                                                  
slots                1                                                          
pe_taskid            NONE                                                       
granted_pe           NONE                                                       
exit_status          0                                                          
failed               0                                                          
qname                epyc.q                                                     
hostname             at116                                                      
jobname              TEST_intel                                                 
qsub_time            20190307-22:30:33                                          
start_time           20190307-22:31:31                                          
end_time             20190307-22:31:35                                          
ru_wallclock         4.249                                                      
cpu                  3.0                                                        
mem                  0.0                                                        
maxvmem              0.0G                                                       
r_mem                8.000G                                                     
r_q                  NONE                                                       
r_cpu                NONE
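
To compare what a job requested with what it actually used, individual fields can be pulled out of the qreport output. The following is a minimal sketch, assuming the two-column "name value" layout shown above; report_field is a hypothetical helper name.

```shell
# Print the value of one field from `qreport -j <jobID>` output on standard input.
#   $1: field name, e.g. maxvmem, r_mem, ru_wallclock
report_field() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Typical use (assumes qreport is on PATH):
#   qreport -j 110488 | report_field maxvmem
#   qreport -j 110488 | report_field r_mem
```

If maxvmem is far below r_mem, a smaller memory request may be enough for similar jobs.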

Setup of environment variables required for running Grid Engine

To run Grid Engine commands such as qsub, a set of environment variables must be defined. Normally, you do not need to set these environment variables yourself because they are set automatically when you log in to the gateway node of the general analysis division.

Explicit setup may be required in special cases, such as when you use qsub from inside a Singularity container. In that case, set the necessary environment variables by running the shell script provided by Grid Engine, as described in the User's Manual:

export SGE_ROOT=/home/geadmin/UGER
export SGE_CELL=uger
source $SGE_ROOT/$SGE_CELL/common/settings.sh
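
Inside a container the settings script is only reachable if the Grid Engine directory is visible there, so it can be useful to check for the file before sourcing it. The following is a minimal sketch for bash; setup_grid_engine is a hypothetical helper name, and the default root and cell are the values from the commands above.

```shell
# Set up the Grid Engine environment, failing clearly if settings.sh is missing.
#   $1: SGE_ROOT (default: /home/geadmin/UGER, as in this document)
#   $2: SGE_CELL (default: uger)
setup_grid_engine() {
    local root="${1:-/home/geadmin/UGER}"
    local cell="${2:-uger}"
    local settings="$root/$cell/common/settings.sh"
    if [ ! -r "$settings" ]; then
        echo "Grid Engine settings not found: $settings" >&2
        return 1
    fi
    export SGE_ROOT="$root"
    export SGE_CELL="$cell"
    # Load the remaining variables (PATH and so on) provided by Grid Engine.
    . "$settings"
}

# Typical use:
#   setup_grid_engine && qstat
```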