Three types of compilers can be used on the NIG Supercomputer: the Intel and PGI compilers listed below, and GCC (see How to use GCC).
|Package name||Version||Supported languages||Installation location||Documentation (external site)|
|Intel Composer XE Linux||12.1.0 (XE 2011)||C, C++, Fortran 77/90/95/2003/2008||/opt/intel||URL|
|PGI Accelerator CDK Cluster Development Kit||11.10||C, C++, FORTRAN 77/90/95/2003, HPF||/opt/pgi||URL|
How to use the Intel compiler
The 64-bit environment, version 12.1.0 (build 20110811), is available in the user's default environment. The compiler command for each language is as follows:
|Language||Command|
|C||icc|
|C++||icpc|
|Fortran||ifort|
The main options are listed below. For details, please refer to the documents on the developer's documentation site.
|Option||Description|
|-fast||Maximizes the overall speed of the program.|
|-O1||Optimizes while limiting code size. Optimizations that tend to increase object size are omitted.|
|-O2||Enables optimization (default). Performs many optimizations, including vectorization, to improve execution speed.|
|-O3||In addition to the -O2 optimizations, performs aggressive loop and memory-access optimizations such as scalar replacement, loop unrolling, code replication to eliminate branches, loop blocking for efficient cache use, and data prefetching.|
|-openmp||Generates multithreaded code from the OpenMP directives in the source. The stack size may need to be increased.|
|-xtarget||Generates specialized code for Intel processors that support the instruction set specified by target. The executable cannot run on non-Intel processors or on Intel processors that support only a lower instruction set. Values of target, from highest to lowest instruction set level: AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2|
|-xHost||Generates code for the highest instruction set available on the processor of the host performing the compilation.|
|-ipo[n]||Performs interprocedural optimization, including inline expansion, across multiple source files. The optional argument n specifies the maximum number of object files generated during compilation. The default is 0, which lets the compiler decide. Compilation time and code size may increase dramatically in some cases.|
For programs that have already been debugged and confirmed to produce correct results with the Intel compiler, the recommended option is -fast. -fast expands to the following optimization options:
-ipo -O3 -no-prec-div -static -xHost
Depending on the optimization applied, compilation may fail, results may change, or the program may terminate abnormally. Compile without optimization first, check the program's behavior and results, and then raise the optimization level step by step.
Note also that the Fat and Medium compute nodes support a different level of extended instruction set from the Thin compute nodes (see Hardware configuration). Because -xHost automatically selects the instruction set of the node on which compilation runs, code optimized and compiled on a Thin node may use AVX instructions and then fail to run on a Fat/Medium compute node with the following message:
Fatal Error: This program was not built to run in your system. Please verify that both the operating system and the processor support Intel(R) AVX.
Therefore, to compile optimized code for the Fat/Medium compute nodes on the login node (a Thin compute node), do not use -xHost. Specify the instruction set explicitly with -xSSE4.2 instead, even though this takes a little extra effort.
How to use the PGI compiler
The 64-bit environment, version 11.10, can be used. Among the PGI compiler commands, pgfortran (Fortran) and pgcc (C) support GPGPU code generation. For basic PGI compiler options, please refer to the following page on the distributor's site:
Access here for the list of options for the PGI compiler.
Manuals in PDF and HTML format are installed on the system under the following directory and can be viewed as follows (an X Window environment is required):
evince /opt/pgi/linux86-64/current/doc/pgi11ug.pdf
firefox /opt/pgi/linux86-64/current/doc/index.htm
How to use GCC
The compiler command for each language in the GNU compiler environment is as follows:
|Language||Command|
|C||gcc|
|C++||g++|
|Fortran||gfortran|
Programming tools and scientific computing libraries
The following parallel programming environments are available on the NIG Cluster:
|Package name||Bundled products||Installation location||Documentation (developer's site)|
|Intel Cluster Studio||Intel MKL, Intel IPP, Intel TBB, Intel MPI, Intel Trace Analyzer/Collector||/opt/intel||URL|
|PGI Accelerator CDK Cluster Development Kit||PGDBG, PGPROF, ACML||/opt/pgi||URL|
The following example command line links dynamically against the parallel (threaded) version of MKL. In the default environment of this system, the library and include paths are already set.
ifort [source files] -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
For details about each tool, please see the distributor's documentation site.
For the GNU environment, the GNU Scientific Library (GSL), an open-source library, is also installed (at /usr/local/pkg/XXX).
For MPI, two combinations are currently available: OpenMPI + Intel compiler (64-bit) and OpenMPI + gcc (64-bit). OpenMPI + Intel compiler is the default in the user environment. As of April 2012, the OpenMPI version installed on the system is 1.4.4. Use the following commands to compile MPI programs in this environment:
To compile MPI programs with gcc instead, use the following commands:
These commands also accept the Intel compiler optimization options. However, because the current OpenMPI library is provided only as a shared library, specifying -fast at the final link step that produces the executable causes the MPI library to fail to link, with the following error:
ld: cannot find -lmpi_cxx
This is because -fast expands to the following options, which include -static (static linking):
-ipo -O3 -no-prec-div -static -xHost
To avoid this error, do not specify -fast when linking against the MPI library to build the executable. Instead, specify the options that -fast expands to, minus -static:
-ipo -O3 -no-prec-div -xHost
PGPROF, included in PGI CDK, is available as a profiler. The profiler is expected to be run on a login node, but because pgprof is a Java application, the precautions described for the Java environment apply. Set up your environment accordingly before starting pgprof. Please also refer to this URL for details about PGPROF functions. To use PGPROF through a GUI (X Window System), also see How to use the X Client software on login nodes.
GPGPU programming environment
GPGPUs are available on some of the Thin compute nodes (month_gpu.q, part of debug.q, and part of login.q). The GPGPU is the Tesla M2090, with one unit installed in each GPU-equipped node. The CUDA driver and the CUDA toolkit are available on the GPU-equipped nodes.
|Software||Version|
|CUDA driver||4.2|
|CUDA toolkit||4.1 (V0.2.1221)|
For GPGPU program development, GPGPU-equipped nodes are assigned as part of the login nodes. Run qlogin on the login node with the -l gpu option:
[username@gw ~]$ qlogin -l gpu
Your job 7345711 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 7345711 has been successfully scheduled.
Establishing /home/geadmin/UGER/utilbin/lx-amd64/qlogin_wrapper session to host t140i ...
Warning: Permanently added '[t140i]:58350,[172.19.0.140]:58350' (RSA) to the list of known hosts.
username@t140i's password:
Last login: Mon Jun 25 11:32:46 2012 from t351i
After logging in, run the pgaccelinfo command to check the GPU environment.
[username@t140 ~]$ pgaccelinfo
CUDA Driver Version: 4020(4.2)
NVRM version: NVIDIA UNIX x86_64 Kernel Module 295.36 Sun Apr 1 21:30:56 PDT 2012
Device Number: 0
Device Name: Tesla M2090
Device Revision Number: 2.0
Global Memory Size: 5636554752
Number of Multiprocessors: 16
Number of Cores: 512
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1301 MHz
Initialization time: 33353 microseconds
Current free memory: 5556396032
Upload time (4MB): 1488 microseconds ( 756 ms pinned)
Download time: 22393 microseconds ( 672 ms pinned)
Upload bandwidth: 2818 MB/sec (5548 MB/sec pinned)
Download bandwidth: 187 MB/sec (6241 MB/sec pinned)
Next, check that nvcc is available and which version it is:
[username@t140 ~]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_Jan_12_14:41:45_PST_2012
Cuda compilation tools, release 4.1, V0.2.1221
For example, compile with nvcc as follows:
[username@t140 ~]$ nvcc -o sample1 sample1.cu
Once the executable has been built, run a test. Keep runs on the login node to testing only, and run full-scale computations on the compute nodes.
[username@t140 ~]$ ./sample1
If it works, prepare a job script and submit it to the month_gpu queue. For example, if the job script is named sample.sh:
[username@t140 ~]$ qsub -l month -l gpu sample.sh
For details, please refer to the CUDA manual, the manual for the PGI compiler, and so forth.
Access here for the NVIDIA document site.
Access here for GPU-related information site for the PGI distributor.
On GPU-equipped nodes, manuals in PDF and HTML format are also installed under the following directory. Please refer to these as well:
firefox /usr/local/cuda/doc/index.html
evince /usr/local/cuda/doc/(target pdf file name)
Multiple versions of the JDK are installed on the NIG Supercomputer. The installation directory is
The default path points to one of these versions, currently the latest release of the JDK 1.8 series. To use a different version, set the path in your own environment.
Precautions for using Java are as follows. At startup, Java determines the size of its heap from the amount of physical memory installed on the node. If that heap would exceed the 4 GB virtual memory limit set by UGE, Java cannot start. To use Java, specify the maximum heap size with the -Xmx option:
[username@t266 ~]$ java -version
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
[username@t266 ~]$ java -Xmx512m -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
If you need a large heap, request the amount of memory with the -l s_vmem and -l mem_req options when running qlogin on the login node:
[username@gw ~]$ qlogin -l s_vmem=8G -l mem_req=8G
(Omitted)
[username@t266 ~]$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
Script language environment
The following script languages are available. Note that versions are updated as needed.
Script language modules
Basic modules are provided for each script language as needed. However, user requirements are diverse, and some modules are difficult to verify and maintain, so we cannot satisfy every request. Users are free to install modules in their own home directories, so please do so yourself where you can.