April 24, 2017

Available compilers

The following three compilers are available on the NIG Supercomputer:

Package name                                  Version           Supported languages                   Installation location   Documentation URL (external site)
Intel Composer XE Linux                       12.1.0 (XE2011)   C, C++, Fortran 77/90/95/2003/2008    /opt/intel              URL
PGI Accelerator CDK Cluster Development Kit   11.10             C, C++, FORTRAN 77/90/95/2003, HPF    /opt/pgi                URL
GNU Compiler (OSS)                            4.6.2             C, C++, Fortran 77/90/95/2003/2008    /usr/local/pkg/gcc      URL

How to use the Intel compiler

Version 12.1.0 (build 20110811) of the 64-bit Intel compiler is available in the default user environment. The compilation commands for each language are as follows:

Language    Compilation command
C           icc
C++         icpc
FORTRAN77   ifort
FORTRAN90   ifort

The main options are listed below. For details, refer to the documentation on the developer's documentation site.

Compilation option   Meaning
-fast                Maximizes the overall speed of the program.
-O1                  Optimizes with code size in mind; optimizations that tend to increase object size are omitted.
-O2                  Enables optimization (default). Performs many optimizations that improve vectorization and execution speed.
-O3                  In addition to the -O2 optimizations, performs aggressive loop and memory-access optimizations such as scalar replacement, loop unrolling, code replication to eliminate branches, loop blocking for efficient cache use, and data prefetching.
-openmp              Generates multithreaded code from the OpenMP directives in the source. It may be necessary to increase the stack size.
-x<target>           Generates specialized code for Intel processors that support the instruction set specified by <target>. The executable cannot run on non-Intel processors or on Intel processors that support only a lower instruction set. Valid targets, from highest to lowest: AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2.
-xHost               Generates code for the highest instruction set available on the processor of the host doing the compilation.
-ipo[n]              Enables interprocedural optimization, including inline expansion, across multiple source files. The optional argument n sets the maximum number of object files generated during compilation; the default, 0, lets the compiler decide. Compilation time and code size may increase dramatically depending on the program.
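As an illustration of -openmp, here is a minimal C sketch that parallelizes a loop with an OpenMP reduction. The function names are hypothetical. The `#ifdef _OPENMP` guard lets the same source also build serially when -openmp is not given. With the Intel compiler it would be built with something like `icc -openmp` plus your source file. If large thread-private data overflows the stack, the per-thread stack can be raised with the standard OMP_STACKSIZE environment variable, and the master thread's stack with `ulimit -s`.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of i*i for i = 1..n. When built with -openmp (Intel) or -fopenmp (GCC),
   the loop runs in parallel: reduction(+:total) gives each thread a private
   partial sum and adds the partial sums together at the end of the loop. */
double sum_of_squares(long n)
{
    double total = 0.0;
    long i; /* loop variable of an omp for loop is private automatically */
#pragma omp parallel for reduction(+:total)
    for (i = 1; i <= n; i++) {
        total += (double)i * (double)i;
    }
    return total;
}

/* Number of threads OpenMP would use; 1 for a serial (non-OpenMP) build. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Every partial sum here is an integer small enough to be exact in double precision. The result of `sum_of_squares(1000)` is therefore 333833500 regardless of the number of threads, which makes it a convenient first check of an OpenMP build.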

For programs that have already been debugged with the Intel compiler and confirmed to produce correct results, the recommended option is -fast. -fast expands to the following optimization options:

  -ipo -O3 -no-prec-div -static -xHost

Depending on the optimization, compilation may fail, results may change, or the program may terminate abnormally. Compile without optimization first, check the behavior and results, and then raise the optimization level step by step.
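One way to carry out this check is to save the output of the unoptimized build as a reference and compare each more aggressively optimized build against it. The comparison should use a relative tolerance rather than bit-for-bit equality, because options such as -no-prec-div (part of -fast) can change the last bits of floating-point results. The helper below is a hypothetical sketch, not an installed tool. The tolerance values are assumptions to be chosen for your own application.

```c
#include <math.h>   /* isnan */
#include <stddef.h> /* size_t */

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Returns 1 if every test[i] agrees with ref[i] within rel_tol, measured
   relative to the larger magnitude of the two values, and 0 otherwise.
   Differences no larger than abs_floor are always accepted, which handles
   values at or near zero. NaN matches only NaN. */
int results_match(const double *ref, const double *test, size_t n,
                  double rel_tol, double abs_floor)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int ref_nan = isnan(ref[i]) != 0;
        int test_nan = isnan(test[i]) != 0;
        double a, b, scale, diff;
        if (ref_nan || test_nan) {
            if (ref_nan != test_nan) {
                return 0; /* only one side is NaN */
            }
            continue;     /* both NaN: treat as equal */
        }
        a = abs_d(ref[i]);
        b = abs_d(test[i]);
        scale = a > b ? a : b;
        diff = abs_d(ref[i] - test[i]);
        if (diff > abs_floor && diff > rel_tol * scale) {
            return 0;
        }
    }
    return 1;
}
```

If a higher optimization level fails this comparison while the lower level passes, stay at the lower level for that program. Alternatively, examine which of the expanded options causes the difference.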

Note also that the Fat and Medium compute nodes support a different level of instruction-set extensions from the Thin compute nodes (see Hardware configuration). The -xHost option above detects the instruction-set level of the machine it runs on automatically. As a result, code optimized and compiled on a Thin node may fail to run on a Fat/Medium compute node with the following message:

Fatal Error: This program was not built to run in your system.
Please verify that both the operating system and the processor support Intel(R) AVX.

Therefore, if you want to build optimized code for the Fat/Medium compute nodes on the login node (a Thin compute node), do not use the -xHost option. Instead, specify the instruction set explicitly with -xsse4.2. This takes a little extra effort, but it avoids the error above.
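To check a node before running AVX-optimized code there, a small C check such as the following can help. The function names are hypothetical. `__AVX__` is the macro that GCC, and to our knowledge the Intel compiler, predefine when AVX code generation is enabled, so `built_for_avx()` shows whether a given -x option enabled AVX. `__builtin_cpu_supports` is a GCC builtin that exists only from GCC 4.8 onward, which excludes the installed GCC 4.6.2. The sketch therefore returns -1 ("cannot tell") when the builtin is unavailable. Compile the check itself without any -x option so that it can run on every node type.

```c
/* Enable the runtime CPU query only on x86 with real GCC 4.8 or later;
   clang, Intel, and PGI are excluded here because older versions lack
   __builtin_cpu_supports even though they may define __GNUC__. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER) \
    && !defined(__PGI) && (defined(__x86_64__) || defined(__i386__)) \
    && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
#define HAVE_CPU_SUPPORTS 1
#endif

/* 1 if this file was compiled with AVX code generation enabled, else 0. */
int built_for_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the CPU we are running on reports AVX, 0 if it does not,
   -1 if this compiler provides no way to ask. */
int cpu_has_avx(void)
{
#ifdef HAVE_CPU_SUPPORTS
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return -1;
#endif
}
```

On a node where `cpu_has_avx()` returns 0, binaries built with -xHost on a Thin node (or with -xAVX) are expected to fail with the message shown above.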

How to use the PGI compiler

Version 11.10 of the 64-bit PGI compilers is available. The compilation commands for each language are as follows:

Language                     Compilation command
C                            pgcc
C++                          pgcpp
FORTRAN77                    pgf77
FORTRAN90, 95, 2003          pgf90, pgf95, pgfortran
GPGPU-supporting compilers   pgfortran, pgcc

For the basic PGI compiler options, refer to the list of options for the PGI compiler on the distributor's site.

Manuals in PDF and HTML format are installed under the following directory. Open them as follows (an X Window environment is required):

  evince /opt/pgi/linux86-64/current/doc/pgi11ug.pdf
  firefox /opt/pgi/linux86-64/current/doc/index.htm

How to use GCC

The compiler command names for each language under the GNU compiler environment are as follows:

Language    Compilation command
C           gcc
C++         g++
FORTRAN77   g77
FORTRAN90   gfortran
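The same source may be built with the Intel, PGI, or GNU compilers on this system, so it can be useful to record which compiler produced a binary. The sketch below checks the vendors' predefined macros (`__INTEL_COMPILER`, `__PGI`, `__clang__`, `__GNUC__`). The Intel compiler, and depending on the version the PGI compiler, also define `__GNUC__` for compatibility, so the GNU test must come last. The function name is hypothetical.

```c
/* Returns a short name for the compiler that built this translation unit.
   Vendor-specific macros are tested before __GNUC__ because several
   compilers define __GNUC__ for GCC compatibility. */
const char *compiler_name(void)
{
#if defined(__INTEL_COMPILER)
    return "Intel";
#elif defined(__PGI)
    return "PGI";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}
```

Printing `compiler_name()` at program start, or writing it to a log, makes it easy to tell later which toolchain produced a given set of results.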

Programming tools and scientific computing library

The following parallel programming environments are available on the NIG Cluster:

Package name                                  Bundled products                                                              Installation location   Developer's documentation site (URL)
Intel Cluster Studio                          Intel MKL, Intel IPP, Intel TBB, Intel MPI, Intel Trace Analyzer/Collector   /opt/intel              URL
PGI Accelerator CDK Cluster Development Kit   PGDBG, PGPROF, ACML                                                           /opt/pgi                URL

The following example links dynamically against the parallel (multithreaded) version of MKL. In this system's default environment, the paths to the libraries and include files are already set.

  ifort -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

For details about each tool, see the distributor's documentation site.

In addition, the GNU Scientific Library (GSL) is installed as open-source software for GNU environments (at /usr/local/pkg/XXX).

MPI environment

Two MPI environments are currently available: OpenMPI with the Intel compiler (64-bit) and OpenMPI with gcc (64-bit). OpenMPI + Intel compiler is the default in the user environment. As of April 2012, the installed OpenMPI version is 1.4.4. Use the following commands to compile MPI programs in this environment:

Language    Compilation command   Path
C           mpicc                 /usr/local/pkg/openmpi/current_icc/bin
C++         mpic++                /usr/local/pkg/openmpi/current_icc/bin
FORTRAN77   mpif77                /usr/local/pkg/openmpi/current_icc/bin
FORTRAN90   mpif90                /usr/local/pkg/openmpi/current_icc/bin

To compile MPI programs with gcc (the open-source toolchain), use the following commands:

Language    Compilation command   Path
C           mpicc                 /usr/local/pkg/openmpi/current_gcc/bin
C++         mpic++                /usr/local/pkg/openmpi/current_gcc/bin
FORTRAN77   mpif77                /usr/local/pkg/openmpi/current_gcc/bin
FORTRAN90   mpif90                /usr/local/pkg/openmpi/current_gcc/bin

The Intel compiler optimization options can also be passed to these commands. However, the current OpenMPI library is provided only as a shared library. If the optimization option -fast is specified when linking the final executable, the MPI library therefore fails to link, with an error such as:

  ld: cannot find -lmpi_cxx

This happens because -fast expands to the following options, which include -static (static linking):

  -ipo -O3 -no-prec-div -static -xHost

To avoid the error, do not specify -fast at the link stage. Instead, when linking against the MPI library to build the executable, specify the options that -fast expands to, minus -static:

  -ipo -O3 -no-prec-div -xHost

Profiler environment

PGPROF, included in PGI CDK, is available as a profiler. It is intended to be run on a login node. Because pgprof is a Java application, the precautions described under Java environment apply, so set up your environment accordingly before starting pgprof. For details about the PGPROF functions, refer to this URL. To use PGPROF with its GUI (X Window System), also refer to How to use the X Client software on login nodes.

GPGPU programming environment

GPGPUs are available on some of the Thin compute nodes (month_gpu.q and parts of debug.q and login.q). The GPU model is the Tesla M2090, with one unit installed in each GPU-equipped node. The CUDA driver and the CUDA toolkit are available on the GPU-equipped nodes.

Component      Version
CUDA driver    4020 (4.2)
CUDA toolkit   4.1, V0.2.1221

For GPGPU program development, some of the login nodes are GPU-equipped. To be assigned one, run qlogin with the "-l gpu" option on the login node:

[username@gw ~]$ qlogin -l gpu
Your job 7345711 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 7345711 has been successfully scheduled.
Establishing /home/geadmin/UGER/utilbin/lx-amd64/qlogin_wrapper session to host t140i ...
Warning: Permanently added '[t140i]:58350,[172.19.0.140]:58350' (RSA) to the list of known hosts.
username@t140i's password:
Last login: Mon Jun 25 11:32:46 2012 from t351i

After logging in, run the pgaccelinfo command to check the GPU environment:

[username@t140 ~]$ pgaccelinfo
CUDA Driver Version:           4020(4.2)
NVRM version: NVIDIA UNIX x86_64 Kernel Module  295.36  Sun Apr  1 21:30:56 PDT 2012
Device Number:                 0
Device Name:                   Tesla M2090
Device Revision Number:        2.0
Global Memory Size:            5636554752
Number of Multiprocessors:     16
Number of Cores:               512
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           32768
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       65535 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    1301 MHz
Initialization time:           33353 microseconds
Current free memory:           5556396032
Upload time (4MB):             1488 microseconds ( 756 ms pinned)
Download time:                 22393 microseconds ( 672 ms pinned)
Upload bandwidth:              2818 MB/sec (5548 MB/sec pinned)
Download bandwidth:             187 MB/sec (6241 MB/sec pinned)
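The bandwidth lines can be reproduced from the timing lines. The figures above are consistent with dividing the 4 MB transfer, taken as 4 × 1024 × 1024 = 4,194,304 bytes, by the transfer time in microseconds. This gives bytes per microsecond, which is the same as MB/sec when 1 MB = 10^6 bytes. For example, 4,194,304 / 1488 ≈ 2818. The same check suggests that the "ms pinned" times are in fact microseconds, since 4,194,304 / 756 ≈ 5548. The helper below is a hypothetical sketch of this arithmetic. It uses integer division to match the truncated values that are printed.

```c
/* Bandwidth in MB/sec (1 MB = 10^6 bytes) for a transfer of `bytes` bytes
   that took `usec` microseconds: bytes per microsecond equals MB/sec.
   Integer division truncates, matching the values pgaccelinfo prints.
   Returns 0 for a zero duration instead of dividing by zero. */
unsigned long bandwidth_mb_per_sec(unsigned long bytes, unsigned long usec)
{
    return usec == 0 ? 0 : bytes / usec;
}
```

Applied to the four timings above (1488, 756, 22393, and 672 microseconds for 4,194,304 bytes), it returns 2818, 5548, 187, and 6241. These match the upload and download bandwidth lines, pageable and pinned.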

Next, check that nvcc is available and which version it is:

[username@t140 ~]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_Jan_12_14:41:45_PST_2012
Cuda compilation tools, release 4.1, V0.2.1221

For example, compile with nvcc as follows:

  [username@t140 ~]$ nvcc -o sample1 sample1.cu

Once the executable has been built, do a test run. Limit work on the login node to test runs; run full-scale computations on compute nodes.

  [username@t140 ~]$ ./sample1

If it runs correctly, prepare a job script and submit it to the month_gpu queue. For example, for a job script named sample.sh:

  [username@t140 ~]$ qsub -l month -l gpu sample.sh
  

For details, refer to the CUDA manuals, the PGI compiler manuals, and other documentation.

Access here for the NVIDIA document site.

Access here for GPU-related information site for the PGI distributor.

On GPU-equipped nodes, manuals in PDF and HTML format are also installed under the following directory. Please refer to them as well:

  firefox /usr/local/cuda/doc/index.tml
  evince /usr/local/cuda/doc/(target pdf file name)

Java environment

Multiple versions of the JDK are installed on the NIG Supercomputer, under the following directory:

  /usr/local/pkg/java

The default PATH points to one of these versions, currently the latest release of the JDK 1.8 series. To use a different version, set PATH in your own environment.

Note the following precaution when using Java. At startup, the JVM sizes its heap based on the amount of physical memory installed in the node. If that heap would exceed the 4 GB virtual memory limit set by UGE, the JVM cannot start, as shown below. To use Java, specify the maximum heap size with the "-Xmx" option:

[username@t266 ~]$ java -version
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
[username@t266 ~]$ java -Xmx512m -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

If you need a larger heap, specify the amount of memory required with the "-l s_vmem" and "-l mem_req" options when running qlogin on the login node:

[username@gw ~]$ qlogin -l s_vmem=8G -l mem_req=8G
(Omitted)
[username@t266 ~]$ java  -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
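To see the virtual memory limit a session or job is actually running under, the limit can be read with the POSIX `getrlimit()` call. The sketch below assumes that UGE applies s_vmem as the soft RLIMIT_AS (address space) limit of the job's processes; please confirm this against the UGE documentation. Suggesting half of that limit for -Xmx, to leave room for the JVM's non-heap memory, is likewise an assumption for illustration, not a site recommendation. Both function names are hypothetical.

```c
#include <sys/resource.h> /* getrlimit, RLIMIT_AS, RLIM_INFINITY */

/* Soft virtual memory (address space) limit of this process in bytes,
   or 0 if the limit is unlimited or cannot be read. */
unsigned long long vmem_soft_limit_bytes(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY) {
        return 0;
    }
    return (unsigned long long)rl.rlim_cur;
}

/* Hypothetical rule of thumb: half of the limit, in MB, as a -Xmx value.
   Returns 0 when there is no finite limit to size against. */
unsigned long long suggested_xmx_mb(unsigned long long limit_bytes)
{
    return limit_bytes / 2 / (1024ULL * 1024ULL);
}
```

Under this rule, the 4 GB default limit gives -Xmx2048m. The -Xmx512m used in the first example above is well within it.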

Script language environment

The following script languages are available. Please note that the versions will be updated as necessary.

Language   Version     Installation path
ruby       1.9.3p125   /usr/local/bin/ruby
python     2.7.2       /usr/local/bin/python
perl       5.14.2      /usr/local/pkg/perl


Script language module

Basic modules are provided for each script language as needed. However, users' requirements are diverse, and some modules are difficult to verify and maintain, so we cannot satisfy every request. Installing modules in your own home directory is permitted on this system. Where possible, please install the modules you need yourself.