Gold GPU cluster

Gold, a new GPU cluster, complements the HPC resources at the Computer Centre. The cluster has 16 nodes. Each node has two Intel Xeon X5650 hexa-core processors at 2.66 GHz (dual socket), 48 GB of memory and two NVIDIA Tesla M2090 GPUs:

Nodes  Model        Processors                                         Memory
16     HP SL390 G7  Two Intel Xeon X5650 @ 2.66 GHz, 6 cores each      48 GB per node
                    Two NVIDIA Tesla M2090 @ 650 MHz, 512 cores each   6 GB per GPU

Please note the following:

1. Logging in

To access gold, either SSH directly to gold-c01, or use the HPC portal: after logging in to the HPC portal, click the gold-c01 cluster to start an xterm.

2. Batch jobs and queues

Only codes that use the GPU should be run on the gold cluster. The batch queue for the gold cluster is gpu:

Job description  Queue name  Memory limit  Runtime limit
GPU-enabled      gpu         46 GB         Unlimited

GPU code written in CUDA can be compiled with the nvcc compiler (currently version 4.2):

prompt>  nvcc -o Helloworld.exe Helloworld.cu
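The contents of Helloworld.cu are not given here; a minimal hypothetical version might look like the sketch below. Each GPU thread writes its global index into an array, and the host copies the array back and prints it. The function names fill_index and run_fill are illustrative only. The sketch avoids device-side printf, which requires compute capability 2.0; if you use it, add -arch=sm_20 when compiling for the M2090.

```cuda
// Helloworld.cu: hypothetical minimal CUDA program (illustrative contents).
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes its global index into out[i].
__global__ void fill_index(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard threads past the end of the array
        out[i] = i;
}

// Runs the kernel on n elements and copies the result into host[].
// Returns 0 on success, -1 if any CUDA call fails.
int run_fill(int *host, int n)
{
    int *dev = NULL;
    if (cudaMalloc((void **)&dev, n * sizeof(int)) != cudaSuccess)
        return -1;

    int threads = 128;
    int blocks = (n + threads - 1) / threads;   // round up to cover all n
    fill_index<<<blocks, threads>>>(dev, n);

    cudaError_t err = cudaGetLastError();       // catches launch errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, n * sizeof(int),
                         cudaMemcpyDeviceToHost); // waits for the kernel to finish
    cudaFree(dev);
    return err == cudaSuccess ? 0 : -1;
}

int main(void)
{
    const int n = 8;
    int host[n];
    if (run_fill(host, n) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("Hello World from the GPU:");
    for (int i = 0; i < n; ++i)
        printf(" %d", host[i]);
    printf("\n");
    return 0;
}
```

On a working GPU node, running ./Helloworld.exe should print the indices 0 to 7 after the greeting.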

To submit a GPU-enabled code, prepare a job submission script like the one below and submit it with the qsub command. Enter “hpc pbs help” to list the job submission steps.


#PBS -P Project_Name_of_Job
#PBS -q gpu
#PBS -l select=1:ncpus=6:ngpus=1
#PBS -j oe
#PBS -N Job_Name
###  -N Job_Name: set filename for standard output/error message.

cd $PBS_O_WORKDIR;   ## This line is needed, do not modify.

##--- Put your exec/application commands below ---

Please refer to the sections below for information on submitting specific GPU-enabled applications and for further help on the GPU programming environment.

The high performance parallel file system (/hpctmp2) is accessible from the gold cluster. Users are advised to submit and run their batch jobs from this file system. More details on the high performance work space may be found here.

3. GPU devices

The deviceQuery command lists the available GPU devices. On gold it reports two GPU devices, numbered 0 and 1, which you can specify in your commands:

prompt> deviceQuery

CUDA Device Query (Runtime API) version (CUDART static linking)
Found 2 CUDA Capable device(s)
Device 0: "Tesla M2090"
CUDA Driver Version / Runtime Version 4.2 / 4.2

Device 1: "Tesla M2090"
CUDA Driver Version / Runtime Version 4.2 / 4.2

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.2, NumDevs = 2, Device = Tesla M2090, Device = Tesla M2090
[deviceQuery] test results…
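To query the devices from your own code, the CUDA runtime API provides cudaGetDeviceCount and cudaGetDeviceProperties, and cudaSetDevice selects which GPU (0 or 1) later CUDA calls in the same host thread will use. The following is a minimal sketch (hypothetical file devices.cu, built with nvcc -o devices devices.cu) that prints information similar to deviceQuery; count_devices is an illustrative helper name.

```cuda
// devices.cu: hypothetical sketch listing CUDA devices, similar to deviceQuery.
#include <cstdio>
#include <cuda_runtime.h>

// Returns the number of CUDA devices, or -1 if the runtime reports an error.
int count_devices(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;
    return count;
}

int main(void)
{
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);      // encoded as 1000*major + 10*minor, e.g. 4020
    cudaRuntimeGetVersion(&runtime);
    printf("CUDA Driver Version / Runtime Version %d.%d / %d.%d\n",
           driver / 1000, (driver % 100) / 10,
           runtime / 1000, (runtime % 100) / 10);

    int n = count_devices();
    if (n < 0) {
        fprintf(stderr, "cudaGetDeviceCount failed\n");
        return 1;
    }
    printf("Found %d CUDA capable device(s)\n", n);

    for (int dev = 0; dev < n; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: \"%s\", compute capability %d.%d, %lu MB global memory\n",
               dev, prop.name, prop.major, prop.minor,
               (unsigned long)(prop.totalGlobalMem / (1024 * 1024)));
    }

    // Example: direct subsequent CUDA calls in this host thread to device 1.
    if (n > 1)
        cudaSetDevice(1);
    return 0;
}
```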

4. GPU-enabled applications

Enabling applications for GPUs is often a joint effort between GPU vendors and application developers. NVIDIA, a major GPU manufacturer, has taken the lead in working with the developers of many HPC applications, such as Gaussian, MATLAB and R, and of a range of computational biology applications, such as Amber, Gromacs, LAMMPS, NAMD, VMD, BLAST and HMMER. GPU-enabled versions of NAMD, ABAQUS and MATLAB have been installed on gold.


The MATLAB Parallel Computing Toolbox User’s Guide (section 10) explains how to use MATLAB GPU primitives and run GPU-enabled MATLAB code on the GPU. It also shows how existing CUDA code can be integrated with MATLAB by compiling it into PTX (Parallel Thread Execution) files, and then creating and running the CUDA kernels from within MATLAB.
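As a hedged illustration of the PTX route, the kernel below (hypothetical file addVectors.cu) could be compiled to PTX with nvcc -ptx. Double precision needs compute capability 1.3 or later, so the sketch assumes -arch=sm_20 for the M2090. The MATLAB lines in the comments are only an outline based on parallel.gpu.CUDAKernel; please check the user's guide for the exact usage in your MATLAB version. The host helper add_vectors_gpu is a hypothetical addition for testing the kernel outside MATLAB and is not used by MATLAB.

```cuda
// addVectors.cu: hypothetical kernel for use from MATLAB via PTX.
// Build the PTX file (assumed command):
//     nvcc -ptx -arch=sm_20 addVectors.cu        -> addVectors.ptx
// Outline of the MATLAB side (check the user's guide for your version):
//     k = parallel.gpu.CUDAKernel('addVectors.ptx', 'addVectors.cu');
//     k.ThreadBlockSize = 256;
//     k.GridSize = ceil(N / 256);
//     c = gather(feval(k, zeros(N, 1), a, b, int32(N)));
#include <cuda_runtime.h>

// c[i] = a[i] + b[i] for i < n; one thread per element.
__global__ void addVectors(double *c, const double *a, const double *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Host-side helper for testing the kernel outside MATLAB.
// Returns 0 on success, -1 on any CUDA error.
int add_vectors_gpu(double *c, const double *a, const double *b, int n)
{
    size_t bytes = n * sizeof(double);
    double *dc = NULL, *da = NULL, *db = NULL;

    cudaError_t err = cudaMalloc((void **)&dc, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&da, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&db, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        addVectors<<<(n + 255) / 256, 256>>>(dc, da, db, n);
        err = cudaGetLastError();               // catches launch errors
    }
    if (err == cudaSuccess) err = cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dc);                               // cudaFree(NULL) is a no-op
    cudaFree(da);
    cudaFree(db);
    return err == cudaSuccess ? 0 : -1;
}
```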

For NAMD, use this script to access both GPU devices 0 and 1:

#PBS -P Project_Name
#PBS -j oe
#PBS -q gpu
#PBS -N Job_Name

cd $PBS_O_WORKDIR;   ## This line is needed, do not modify.
/app1/namd/NAMD_CVS-2012-01-16_Linux-x86_64-CUDA/charmrun ++local +p12 /app1/namd/NAMD_CVS-2012-01-16_Linux-x86_64-CUDA/namd2 +idlepoll +devices 0,1 tiny.namd >tiny.out

5. GPU programming environment

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on GPUs. Application developers can build GPU-accelerated applications using the CUDA Toolkit, a comprehensive development environment for C/C++ that includes the CUDA API, the nvcc compiler for NVIDIA GPUs, libraries, and performance analysis tools.

CUDA-accelerated libraries let you use the GPU without writing CUDA kernels yourself. These libraries are well parallelised and optimised, and are applicable in a wide range of scientific and engineering domains. The main libraries are listed below.

Library    Description
CUFFT      Fast Fourier transforms
CUBLAS     Dense linear algebra
CUSPARSE   Sparse linear algebra
CUDA Math  Standard mathematical functions
CURAND     Pseudo-random and quasi-random number generation
NPP        Performance primitives for image and signal processing
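As an example of the library approach, the sketch below uses the CUBLAS v2 interface (cublas_v2.h, introduced in CUDA 4.0) to compute y = alpha*x + y (SAXPY) without writing a kernel. The file name saxpy_cublas.cu, the helper saxpy_gpu and the build line are assumptions; the program must be linked with -lcublas.

```cuda
// saxpy_cublas.cu: hypothetical sketch of y = alpha*x + y with CUBLAS (v2 API).
// Build (assumed command): nvcc -o saxpy_cublas saxpy_cublas.cu -lcublas
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Computes y = alpha*x + y on the GPU for n floats. Returns 0 on success.
int saxpy_gpu(int n, float alpha, const float *x, float *y)
{
    float *dx = NULL, *dy = NULL;
    cublasHandle_t handle;
    int status = -1;

    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS)
        return -1;

    if (cudaMalloc((void **)&dx, n * sizeof(float)) == cudaSuccess &&
        cudaMalloc((void **)&dy, n * sizeof(float)) == cudaSuccess &&
        // Copy host vectors to device memory (stride 1).
        cublasSetVector(n, sizeof(float), x, 1, dx, 1) == CUBLAS_STATUS_SUCCESS &&
        cublasSetVector(n, sizeof(float), y, 1, dy, 1) == CUBLAS_STATUS_SUCCESS &&
        // alpha is passed by pointer; it lives in host memory (default pointer mode).
        cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1) == CUBLAS_STATUS_SUCCESS &&
        // Copy the result back to the host.
        cublasGetVector(n, sizeof(float), dy, 1, y, 1) == CUBLAS_STATUS_SUCCESS)
        status = 0;

    cudaFree(dx);               // no-op if the pointer is still NULL
    cudaFree(dy);
    cublasDestroy(handle);
    return status;
}

int main(void)
{
    const int n = 4;
    float x[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float y[n] = {10.0f, 20.0f, 30.0f, 40.0f};
    if (saxpy_gpu(n, 2.0f, x, y) != 0) {
        fprintf(stderr, "CUBLAS saxpy failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i)
        printf("%g ", y[i]);    // expected: 12 24 36 48
    printf("\n");
    return 0;
}
```

The same pattern (create a handle, move data to the device, call the library routine, copy the result back) applies to the other libraries in the table, each with its own API.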

The CUDA ecosystem also includes tools for analysing and optimising application performance. One such tool is the Visual Profiler, a cross-platform profiling tool that gives developers context-level and kernel-level analysis for optimising CUDA C/C++ and OpenCL applications. It supports all CUDA-capable NVIDIA GPUs shipped since 2006 on Linux, Mac OS X and Windows.
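The Visual Profiler is the tool for detailed analysis. For a quick measurement inside your own code, CUDA events can time a kernel on the GPU, as in this sketch. The scale kernel, the helper time_scale_kernel and the problem size are arbitrary illustrations.

```cuda
// timing.cu: hypothetical sketch of timing a kernel with CUDA events.
#include <cstdio>
#include <cuda_runtime.h>

// Multiplies each element of data by factor.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Returns the kernel's elapsed time in milliseconds, or -1.0f on error.
float time_scale_kernel(int n)
{
    float *d = NULL;
    if (cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess)
        return -1.0f;
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);                  // record on the default stream
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);                 // wait until the kernel has finished

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess)      // check for launch errors
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ms;
}

int main(void)
{
    float ms = time_scale_kernel(1 << 20);
    if (ms < 0.0f) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("scale kernel: %.3f ms\n", ms);
    return 0;
}
```

Events measure time on the GPU itself, so the result excludes host-side overhead; the profiler gives a much fuller picture, including memory transfers and occupancy.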

The GPU Computing SDK includes more than 100 code samples, utilities, whitepapers and additional documentation to help you get started developing, porting and optimising your applications for the CUDA architecture.

The GPU Computing Documentation includes guides on the CUDA API, CUDA C programming, debugging with cuda-gdb, and other topics.

Once you have written your CUDA code, compile it with the nvcc compiler (currently version 4.2):

prompt>  nvcc -o helloWorld helloWorld.cu

6. GPU articles

For problems or queries, please contact us at ccehpc@nus.edu.sg.