NUSIT HPC


HPC-AI Systems

Volta HPC-AI GPU Cluster is our latest GPU cluster for GPU-accelerated HPC-AI workloads. The cluster contains 9 nodes with 2x Intel(R) Xeon(R) Gold 6148 CPU and 4x Nvidia Tesla V100-SXM2-32GB GPUs in each node.

Node configuration (Dell C4410):

Component   Model                            Count
CPU         Intel(R) Xeon(R) Gold 6148 CPU   2
GPU         Nvidia Tesla V100-SXM2-32GB      4
RAM         -                                376GB
Storage     Local scratch space              5.6T

The Volta HPC-AI GPU Cluster is supported by the following network storage:

Storage               Total Space   Per User
NAS (NFS)             -             500GB
Parallel Filesystem   109T          1TB

GPU Articles

New GPU system at NUS HPC
Using GPU in Matlab Parallel Computing Toolbox
For problems or queries, please contact us at ccehpc@nus.edu.sg.

Please note the following:

To access gold, you can either SSH directly to gold-c01 (172.25.192.196) or go through the HPC portal: after logging in to the portal, click the gold-c01 cluster to start an xterm.

Only codes that use the GPU should be run on the gold cluster. The batch queue for gold cluster is gpu.

Job description   Queue Name   Memory Limit   Runtime Limit
GPU-enabled       gpu          46 GB          Unlimited

GPU code written in CUDA may be compiled with the nvcc compiler (currently version 4.2):

prompt>  nvcc -o Helloworld.exe Helloworld.cu
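
The contents of Helloworld.cu are not reproduced in this guide; a minimal sketch of such a file might look like the following. It assumes a device of compute capability 2.0 or later, which is required for device-side printf:

```
#include <cstdio>

// Trivial kernel: each GPU thread prints its own index.
__global__ void hello()
{
    printf("Hello from thread %d\n", threadIdx.x);
}

int main()
{
    // Launch one block of 4 threads on the GPU.
    hello<<<1, 4>>>();

    // Wait for the kernel to finish so its output is flushed.
    cudaDeviceSynchronize();
    return 0;
}
```

Compiling and running it with the nvcc command shown above should print one line per thread.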

To submit a GPU-enabled code, prepare the following job submission script and submit it with the qsub command. Enter “hpc pbs help” to list the job submission steps.

#!/bin/bash

#PBS -P Project_Name_of_Job
#PBS -q gpu
#PBS -l select=1:ncpus=6:ngpus=1
#PBS -j oe
#PBS -N Job_Name
###  -N Job_Name: set filename for standard output/error message.

cd $PBS_O_WORKDIR;   ## This line is needed, do not modify.

##--- Put your exec/application commands below ---
./Helloworld.exe

Please refer to sections below for information on submission for specific GPU-enabled applications and further help on GPU programming environment.

The high performance parallel file system (/hpctmp2) is accessible from the gold cluster. Users are advised to submit and run their batch jobs from this file system. More details on the high performance work space may be found here.

The deviceQuery command checks for available GPU devices. On gold it reports two GPU devices (0 and 1), which may be specified in your commands:

prompt> deviceQuery

CUDA Device Query (Runtime API) version (CUDART static linking)
Found 2 CUDA Capable device(s)
Device 0: “Tesla M2090”
CUDA Driver Version / Runtime Version 4.2 / 4.2

Device 1: “Tesla M2090”
CUDA Driver Version / Runtime Version 4.2 / 4.2

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.2, NumDevs = 2, Device = Tesla M2090, Device = Tesla M2090
[deviceQuery] test results…
PASSED
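
The same information can be queried, and a device selected, from within your own CUDA code. A short sketch using the runtime API (device numbers are as reported by deviceQuery):

```
#include <cstdio>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);      // e.g. 2 on the gold GPU nodes
    printf("Found %d CUDA device(s)\n", count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s\n", i, prop.name);
    }

    // Direct all subsequent CUDA calls in this process to device 1.
    cudaSetDevice(1);
    return 0;
}
```

Calling cudaSetDevice before any allocations or kernel launches ensures the whole run stays on the chosen device.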

Enabling applications for GPUs is often a joint task undertaken by GPU and application developers. NVIDIA, a major GPU manufacturer, has taken the lead in working with developers for many HPC applications such as Gaussian, MATLAB, R and a whole suite of applications in Computational Biology such as Amber, Gromacs, LAMMPS, NAMD, VMD, BLAST, and HMMER. GPU-enabled versions of NAMD, ABAQUS and MATLAB have been installed on gold.

MATLAB

The Parallel Computing Toolbox User’s Guide (section 10) explains how to use MATLAB GPU primitives and execute GPU-enabled MATLAB code on the GPU. It also shows how existing CUDA code can be integrated with MATLAB by converting it into PTX (parallel thread execution) files, and then creating and running the CUDA kernels from within MATLAB.

For NAMD, use this script to access both GPU devices 0 and 1:

#!/bin/sh
#PBS -P Project_Name
#PBS -j oe
#PBS -q gpu
#PBS -N Job_Name

cd $PBS_O_WORKDIR;   ## This line is needed, do not delete or change.
/app1/namd/NAMD_CVS-2012-01-16_Linux-x86_64-CUDA/charmrun ++local +p12 /app1/namd/NAMD_CVS-2012-01-16_Linux-x86_64-CUDA/namd2 +idlepoll +devices 0,1 tiny.namd >tiny.out

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA for harnessing the power of the GPU. Application developers can build GPU-accelerated applications using the CUDA Toolkit, a comprehensive development environment for C/C++. It includes the nvcc compiler and runtime API for NVIDIA GPUs, GPU-accelerated libraries, and tools for performance analysis.
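
As an illustration of this programming model, a minimal vector addition (a sketch, not one of the installed samples) shows the typical pattern of allocating device memory, copying data across, launching a kernel, and copying the result back:

```
#include <cstdio>

// Each thread adds one pair of elements; the guard handles
// the last block, which may have more threads than elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float a[n], b[n], c[n];
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }

    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);

    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);

    // 256 threads per block, enough blocks to cover n elements.
    vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);

    cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %f\n", c[10]);   // expect 30.0 (10 + 20)

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```
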

CUDA-accelerated libraries provide a way of using CUDA without programming. These libraries are well parallelised and optimised, and applicable in a wide range of scientific and engineering domains. In the list below, click on a library name to find out more details and sample codes.

Library     Description
CUFFT       Fourier transforms
CUBLAS      Dense linear algebra
CUSPARSE    Sparse linear algebra
CUDA Math   Standard mathematical functions
CURAND      Pseudo-random and quasi-random numbers
NPP         Performance primitives for image and signal processing
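
As a sketch of how one of these libraries is called, the following computes a single-precision SAXPY (y = alpha*x + y) with CUBLAS, using the legacy interface that ships with CUDA 4.x; link with -lcublas (e.g. nvcc -o saxpy saxpy.cu -lcublas):

```
#include <cstdio>
#include <cublas.h>

int main()
{
    const int n = 8;
    float x[n], y[n];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    cublasInit();                               // initialise CUBLAS

    // Allocate device vectors and copy the host data across.
    float *dX = 0, *dY = 0;
    cublasAlloc(n, sizeof(float), (void **)&dX);
    cublasAlloc(n, sizeof(float), (void **)&dY);
    cublasSetVector(n, sizeof(float), x, 1, dX, 1);
    cublasSetVector(n, sizeof(float), y, 1, dY, 1);

    // y = 3*x + y, computed on the GPU.
    cublasSaxpy(n, 3.0f, dX, 1, dY, 1);

    cublasGetVector(n, sizeof(float), dY, 1, y, 1);
    printf("y[0] = %f\n", y[0]);                // expect 5.0 (3*1 + 2)

    cublasFree(dX);
    cublasFree(dY);
    cublasShutdown();
    return 0;
}
```

No kernel code is written here: the library supplies the GPU implementation, which is the point of using these libraries.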

Also in the CUDA ecosystem are performance analysis tools for optimising the performance of applications. One such tool is the Visual Profiler, a cross-platform profiling tool that gives developers context- and kernel-level analysis for optimising CUDA C/C++ and OpenCL applications. It supports all CUDA-capable NVIDIA GPUs shipped since 2006 on Linux, Mac OS X, and Windows.

The GPU Computing SDK includes 100+ code samples, utilities, whitepapers, and additional documentation to help you get started developing, porting, and optimizing your applications for the CUDA architecture.

The GPU Computing Document has guides on CUDA API, CUDA C programming, debugging using cuda-gdb and others.

Once you have written your CUDA code, compile using the nvcc compiler (currently version 4.2):

prompt>  nvcc -o helloWorld helloWorld.cu