Tutorial 2 - Using the restricted release reference panel
Procedures for using this system
Using this system to run the workflow involves the following steps.
- Prepare restricted release data
- Generate a configuration file for the Imputation Workflow
- Execute the Imputation Workflow
1. Prepare restricted release data
This tutorial assumes that the restricted release data has already been placed in the personal genome analysis division of the NIG supercomputer.
Notes on file names for restricted release data
File names containing "+" may cause problems in later processing, so replace "+" with "_" or a similar character.
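If many files are affected, the renaming can be scripted. The following is a minimal bash sketch, assuming you run it on the directory that holds the restricted release data. The function name rename_plus_files is hypothetical and is not part of this system. It replaces every "+" in file and directory names with "_" and skips any name whose "_" version already exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every "+" in file and directory names under a
# directory with "_". Names whose "_" version already exists are skipped.
rename_plus_files() {
  local dir="${1:-.}"
  local path base target
  # -depth lists a directory's contents before the directory itself, so
  # renaming a parent never invalidates the paths of its children.
  find "$dir" -depth -name '*+*' -print0 |
    while IFS= read -r -d '' path; do
      base="$(basename "$path")"
      target="$(dirname "$path")/${base//+/_}"
      if [ -e "$target" ]; then
        echo "skip: $target already exists" >&2
      else
        mv -- "$path" "$target"
        echo "renamed: $path -> $target"
      fi
    done
}
```

For example, running rename_plus_files . in the data directory renames a file such as a+b.vcf.gz to a_b.vcf.gz.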
Extract and prepare restricted release data
Follow the steps below to extract the file, create the config file, and check that the config file has been created correctly.
Extract the file
tar zxvf <the file name of the restricted release data.tar.gz>
Create the config file
cd <Directory in which restricted release data is extracted>
./generate-default-config-file.sh "$(pwd)"
Check that the config file has been created correctly
ls -l default.config.yaml
In the next step, you will enter the path of this file in the 'Reference panel config file' field of the UI.
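As an optional check, the following bash sketch confirms that default.config.yaml exists and is not empty, and prints its absolute path so that it can be pasted into the Reference panel config file field. The helper name show_panel_config is hypothetical; pass it the directory in which the restricted release data was extracted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify default.config.yaml in the given directory and
# print its absolute path for the "Reference panel config file" field.
show_panel_config() {
  local dir="${1:-.}"
  local cfg="$dir/default.config.yaml"
  # -s is true only if the file exists and is not empty
  if [ ! -s "$cfg" ]; then
    echo "error: $cfg is missing or empty; rerun the config generator" >&2
    return 1
  fi
  # Resolve the directory to an absolute path in a subshell
  (cd "$dir" && printf '%s/default.config.yaml\n' "$(pwd)")
}
```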
2. Generate a configuration file for the Imputation Workflow
Using Guacamole on the NIG supercomputer, open the following address in a web browser.
http://localhost:5000
When you access it, the following screen is displayed.
Configure the following items.
- Target VCF file
- Reference panel preset config or other
- Output genotype probability
- Number of threads
For Target VCF file, specify the full path of the VCF file (*.vcf.gz) to be imputed.
In this tutorial, use the VCF file that you uploaded.
For example, the full path looks like /home/username/<restricted release data directory>/data.vcf.gz.
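Before entering the path, you may want to confirm it. This bash sketch uses a hypothetical helper, vcf_full_path, which checks that the file name ends in .vcf.gz, that the file exists, and that it passes gzip -t (bgzip-compressed VCF files are also valid gzip files), then prints its absolute path. It does not validate the VCF contents themselves.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a target VCF and print its absolute path.
vcf_full_path() {
  local f="$1"
  case "$f" in
    *.vcf.gz) ;;
    *) echo "error: $f does not end in .vcf.gz" >&2; return 1 ;;
  esac
  [ -f "$f" ] || { echo "error: $f not found" >&2; return 1; }
  # gzip -t tests the integrity of the compressed stream only
  gzip -t "$f" || { echo "error: $f is not a valid gzip file" >&2; return 1; }
  printf '%s/%s\n' "$(cd "$(dirname "$f")" && pwd)" "$(basename "$f")"
}
```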
Next, set 'Reference panel preset config or other'. By default, you can choose from the following four presets.
- GRCh37.1KGP
- GRCh37.1KGP-EAS
- GRCh38.1KGP
- GRCh38.1KGP-EAS
For more information on each of them, see 'Types of Reference Panels available'.
To use a reference panel other than those above, select 'other' and specify its config file in 'Reference panel config file'.
In this tutorial, select 'other' and specify the default.config.yaml file created in step 1 as the Reference panel config file.
For 'Output genotype probability', select one of the following two values. The default is false.
- false
- true
For 'Number of threads', specify the number of threads the job uses when the workflow runs.
The default is 16.
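If you are unsure what value to use, one option is to cap the thread count at the number of CPU cores available. The sketch below assumes that the workflow runs on the same node as the shell where you run it, which may not hold in your environment. suggest_threads is a hypothetical helper; nproc is the GNU coreutils command that reports the number of available processing units.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a thread count no larger than the cores
# visible to this shell, capped at the UI default of 16.
suggest_threads() {
  local cap="${1:-16}"
  local cores
  cores="$(nproc)"
  if [ "$cores" -lt "$cap" ]; then
    echo "$cores"
  else
    echo "$cap"
  fi
}
```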
After specifying the parameters, press the Set up job button. The generated parameters are displayed at the bottom of the screen. You will use them in sapporo-web in the next step.
3. Execute the Imputation Workflow
Using Guacamole, open the following address.
http://localhost:1121
When accessed, the following screen is displayed.
Select 'Sapporo Service on localhost', which is available by default.
Clicking it displays the following screen.
Scroll down a little to the workflows available on the backend, and click 'beagle' in the Workflows list.
In Compose Run, select cwltool 3.1 from the Workflow Engine list.
In Workflow Parameters, enter the parameters generated by imputationserver-web-ui.
To do so, delete the {} that is already in the field and paste the generated parameters.
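Because the field initially contains {}, the parameters appear to be JSON. If you edit the generated parameters by hand before pasting them, you can save them to a local file and check that they still parse as JSON. This sketch assumes python3 is available and uses its standard json.tool module; check_params_json and a file name such as params.json are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a saved parameter file is valid JSON.
check_params_json() {
  local file="$1"
  # json.tool exits with a non-zero status if the input is not valid JSON
  if python3 -m json.tool "$file" > /dev/null; then
    echo "OK: $file is valid JSON"
  else
    echo "error: $file is not valid JSON" >&2
    return 1
  fi
}
```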
Press the Execute button at the bottom to run the workflow. The job status changes to Running.
If the workflow starts successfully, cwltool runs it.
When the workflow finishes successfully, the status changes to COMPLETE.
You can download the result files from your browser. Click Outputs in the Run log to list them.
When you click the file you want to download, a dialog appears. By default, the file is downloaded under ~/downloads.
Get results
After running the Imputation Workflow, the files you downloaded with the web browser are stored in the Personal Genome Analysis environment.
To copy them to your own computer, open a terminal on your computer and run the following command.
The file is downloaded to the directory in which you run the command.
scp [your account name]@gwa.ddbj.nig.ac.jp:~/download/[file name you want to download] .
[your account name] is the account you use to log in to the Personal Genome Analysis environment.
For [file name you want to download], specify the name of the file you want to download.
You can also download the file directly from the results directory of sapporo-service.
First, find the Run ID.
It is displayed to the right of the Run ID label.
You can copy the Run ID (runid) by clicking the icon to its right.
All result files are in sapporo-service/run/<first two characters of runid>/<runid>/outputs/ under the directory in which Sapporo is installed.
For example, if runid is 1b19d002-8d4c-4f52-973c-66a165cd135f, the first two characters are 1b.
To copy the results with the scp command, run the following on your computer.
A directory called outputs is created in the current directory, and the analysis results are copied into it from the Personal Genome Analysis division.
scp -i [your private key file] -r [your account name]@gwa.ddbj.nig.ac.jp:~/sapporo-install/sapporo-service/run/1b/1b19d002-8d4c-4f52-973c-66a165cd135f/outputs outputs
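The directory rule for run results (first two characters of the run ID, then the full run ID, then outputs) can be sketched as a small bash helper. This assumes Sapporo is installed in ~/sapporo-install on the remote host, as in the scp command for this tutorial; sapporo_outputs_path is hypothetical. The path it prints is relative to the home directory, which scp resolves on the remote side in the same way as ~/.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a Sapporo run ID to its outputs directory,
# relative to the remote home directory.
sapporo_outputs_path() {
  local run_id="$1"
  local install_dir="${2:-sapporo-install}"
  if [ "${#run_id}" -lt 2 ]; then
    echo "error: run ID is too short: '$run_id'" >&2
    return 1
  fi
  # Runs are grouped by the first two characters of the run ID
  printf '%s/sapporo-service/run/%s/%s/outputs\n' \
    "$install_dir" "${run_id:0:2}" "$run_id"
}
```

For example, sapporo_outputs_path 1b19d002-8d4c-4f52-973c-66a165cd135f prints sapporo-install/sapporo-service/run/1b/1b19d002-8d4c-4f52-973c-66a165cd135f/outputs, which can be placed after the colon in the scp source argument.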