How to Use Parabricks on GPU Nodes
This document pertains to the former NIG Supercomputer (2019) and is retained for reference purposes only.
Please note that it does not reflect the behavior or configuration of the current NIG Supercomputer (2025).
Using Parabricks with Apptainer
The following procedure demonstrates how to run Parabricks v4.0 using an Apptainer image file. (For details on Apptainer itself, please refer to Using Apptainer (Singularity).)
You can use either a Parabricks image that you have prepared yourself or one of the images provided under /opt/pkg/nvidia/parabricks on the NIG Supercomputer.
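For example, you can check which prebuilt images are provided (the image used later in this guide is clara-parabricks_4.0.0-1.sif; other versions may also be present, and the directory contents may change):
$ ls /opt/pkg/nvidia/parabricks/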
Logging in to the Slurm GPU Queue Interactive Node and Submitting a Job
The front-end server at022vm02 is available for job submission.
Prerequisite:
- You are already logged in to gwa.ddbj.nig.ac.jp.
Log in to the Slurm GPU queue interactive node:
$ ssh at022vm02
Download the sample dataset (assuming you are in /home/$(id -un)/):
$ wget -O parabricks_sample.tar.gz "https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz"
Extract the downloaded file and confirm that the parabricks_sample directory is created:
$ tar -zxf parabricks_sample.tar.gz
$ ls
................ parabricks_sample ................ ................ ................
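You can also check the layout of the extracted dataset; it contains at least the Data and Ref directories referenced by the job script below:
$ ls parabricks_sample/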
Create a job script. In this example, save the script as test.sh with the following content:
- Job script: test.sh
#!/bin/bash
#
#SBATCH --partition=all # Use "all" partition
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH --mem=384000 # Memory in MB; reserves all 384 GB of GPU node memory
apptainer exec --nv --bind /home/$(id -un):/input_data /opt/pkg/nvidia/parabricks/clara-parabricks_4.0.0-1.sif \
pbrun fq2bam \
--ref /input_data/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq /input_data/parabricks_sample/Data/sample_1.fq.gz /input_data/parabricks_sample/Data/sample_2.fq.gz \
--out-bam /input_data/parabricks_sample/fq2bam_output.bam
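If you want to confirm that the GPUs are visible from inside the container before running pbrun, one minimal check is to add an extra line like the following to the job script (this assumes nvidia-smi is installed on the GPU node, which the --nv flag binds into the container):
apptainer exec --nv /opt/pkg/nvidia/parabricks/clara-parabricks_4.0.0-1.sif nvidia-smi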
The available partitions (equivalent to queues in traditional AGE) are igt009, igt016, and all:
$ sinfo -l
Mon Mar 13 10:44:04 2023
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE NODELIST
igt009 up infinite 1-infinite no NO all 1 reserved igt009
igt016 up infinite 1-infinite no NO all 1 reserved igt016
all* up infinite 1-infinite no NO all 2 reserved igt[009,016]
Submit the job:
$ sbatch test.sh
Verify that the job has been submitted:
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
56 all test pg-user PD 0:00 1 (ReqNodeNotAvail, May be reserved for other job)
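To list only your own jobs, or to see more detail on why a job is still pending, the standard Slurm commands below can be used (the job ID is the one reported by squeue):
$ squeue -u $(id -un)      # show only your jobs
$ scontrol show job 56     # detailed information for a specific job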
After the job completes, check the output log and result files:
$ cat res.txt
$ ls parabricks_sample/
Data Ref fq2bam_output.bam fq2bam_output.bam.bai fq2bam_output_chrs.txt
- res.txt: output log
- fq2bam_output.bam, fq2bam_output.bam.bai, fq2bam_output_chrs.txt: result files
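If samtools is available in your environment (it is not part of this guide; use whatever installation or container you normally rely on), a quick integrity check of the resulting BAM might look like this:
$ samtools quickcheck parabricks_sample/fq2bam_output.bam && echo "BAM OK"
$ samtools flagstat parabricks_sample/fq2bam_output.bam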
Using Parabricks with Rootless Docker
This section explains how to run Parabricks using Rootless Docker. The procedure consists of two steps:
- Log in to each worker (GPU) node and start Rootless Docker.
- Log in to the Slurm GPU queue interactive node and submit the job.
Step 1: Start Rootless Docker on Each GPU Node
Rootless Docker is a version of Docker that can be run by non-root users. This procedure prepares the environment to run the Parabricks container.
Reference: https://docs.nvidia.com/clara/parabricks/4.0.0/GettingStarted.html#gettingstarted
Prerequisites:
- You are already logged in to gwa.ddbj.nig.ac.jp.
- Target GPU nodes: igt009 and igt016.
Create the necessary directories and configuration files in your Lustre-based home directory:
$ mkdir -p /home/$(id -un)/.docker/run_igt009
$ mkdir -p /home/$(id -un)/.docker/run_igt016
$ mkdir -p /home/$(id -un)/.config/docker
$ cat <<EOF > /home/$(id -un)/.config/docker/daemon.json
{"data-root":"/data1/rootless-docker-$(id -un)"}
EOF
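Because the heredoc delimiter above is unquoted, $(id -un) is expanded when the file is written, so the resulting daemon.json should contain your actual account name (shown here as a placeholder):
$ cat /home/$(id -un)/.config/docker/daemon.json
{"data-root":"/data1/rootless-docker-<your account name>"}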
Log in to each GPU node (igt009 or igt016) and perform the setup:
$ ssh <GPU node name: igt009 or igt016>
Create the working directory for Rootless Docker on the GPU node's NVMe area (/data1):
$ mkdir /data1/rootless-docker-$(id -un)
$ chmod 750 /data1/rootless-docker-$(id -un)/
Start Rootless Docker:
$ dockerd-rootless.sh --experimental --storage-driver vfs &
Pull the Parabricks Docker image (version 4.0.0-1 in this example):
$ docker pull nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1
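To confirm that the daemon is running in rootless mode and that the image was pulled successfully, the following checks may be helpful:
$ docker info | grep -i rootless
$ docker images nvcr.io/nvidia/clara/clara-parabricks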
Log out of the GPU node:
$ exit
Step 2: Submit a Job from the Slurm GPU Queue Interactive Node
Use the front-end server at022vm02 for job submission.
Prerequisites:
- You are logged in to gwa.ddbj.nig.ac.jp.
- Rootless Docker has been started on the target GPU node, as described above.
Log in to the Slurm GPU interactive node:
$ ssh at022vm02
Download the sample dataset (assuming you are in /home/$(id -un)/):
$ wget -O parabricks_sample.tar.gz "https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz"
Extract the file and confirm that the directory exists:
$ tar -zxf parabricks_sample.tar.gz
$ ls
................ parabricks_sample ................ ................ ................
Create the job script test.sh as follows.
Note: source /etc/profile.d/rootless-docker.sh is required to define the environment variables for Rootless Docker.
- Job script: test.sh
#!/bin/bash
#
#SBATCH --partition=all # Use "all" partition
#SBATCH --job-name=test
#SBATCH --output=res.txt
source /etc/profile.d/rootless-docker.sh
docker run --gpus all --rm --volume /home/$(id -un):/input_data nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
pbrun fq2bam \
--ref /input_data/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq /input_data/parabricks_sample/Data/sample_1.fq.gz /input_data/parabricks_sample/Data/sample_2.fq.gz \
--out-bam /input_data/parabricks_sample/fq2bam_output.bam
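Note that the Rootless Docker daemon and the pulled image exist only on the GPU node(s) where you performed Step 1. If you prepared only one node, it may be safer to submit to that node's partition instead of all, for example with a script header line like the following (a hypothetical variation of the script above):
#SBATCH --partition=igt009   # run only on igt009, where Rootless Docker was started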
Check available partitions:
$ sinfo -l
Mon Mar 13 10:44:04 2023
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE NODELIST
igt009 up infinite 1-infinite no NO all 1 reserved igt009
igt016 up infinite 1-infinite no NO all 1 reserved igt016
all* up infinite 1-infinite no NO all 2 reserved igt[009,016]
Submit the job:
$ sbatch test.sh
Check job status:
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
56 all test pg-user PD 0:00 1 (ReqNodeNotAvail, May be reserved for other job)
Once the job completes, check the output and result files:
$ cat res.txt
$ ls parabricks_sample/
Data Ref fq2bam_output.bam fq2bam_output.bam.bai fq2bam_output_chrs.txt
- res.txt: output log
- fq2bam_output.bam, fq2bam_output.bam.bai, fq2bam_output_chrs.txt: result files
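When you have finished, you may want to stop the Rootless Docker daemon on each GPU node you used and remove its working directory to free the node's local NVMe space. A minimal sketch, assuming the daemon was started with dockerd-rootless.sh as shown above:
$ ssh <GPU node name: igt009 or igt016>
$ pkill -u "$(id -un)" -f dockerd-rootless   # stop your rootless Docker daemon (or bring the background job to the foreground and press Ctrl-C)
$ rm -rf /data1/rootless-docker-$(id -un)    # remove the data-root created in Step 1
$ exit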