Tackling HPC issue for parallel computing in MATLAB
by Vamshidhar Gangu, Research Computing, NUS Information Technology
Introduction
Here we address a problem that can occur when using the Parallel Computing Toolbox (PCT) on NUSIT-HPC. It arises when multiple PCT jobs are submitted through PBS. Typically the first job runs, while the subsequent jobs hang or crash with an error indicating that a second matlabpool cannot be opened.
Root Cause of the problem
When using the PCT, MATLAB creates a separate matlabpool for each job. When you submit multiple PCT jobs, these pools can interfere with one another, which can lead to errors and early termination of your scripts.
The PCT requires a temporary “Job Storage Location” where it stores information about the matlabpool that is in use. This is simply a directory on the file system to which MATLAB writes the files it uses to coordinate the parallelization of the matlabpool. By default, this information is stored in “/home/svu/YOURUSERNAME/.matlab/”. When multiple PCT jobs are submitted to the job scheduler (PBS), they all attempt to use this default location for storing job information, which creates a race condition in which one job modifies files created by another. This situation must be avoided.
Solution
The solution is to give each PCT job its own unique Job Storage Location. To do this, create a temporary directory in your job submission script before launching MATLAB, and inside your MATLAB script create the matlabpool so that it explicitly uses this unique temporary directory. An example job submission script is shown in the box below. As good housekeeping practice, this temporary directory can be purged after the MATLAB script has run.
#!/bin/bash
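The listing above is cut off after its first line. As a rough illustration of the approach described in the Solution section, and not the original script, a job submission script could look something like the sketch below. The job name, the `matlab_pct_` directory prefix, and the `MATLAB_JOB_STORAGE` environment variable are assumptions made for this example. `PBS_JOBID` is set by PBS inside a running job, and the sketch falls back to the shell's process ID when run outside PBS.

```shell
#!/bin/bash
#PBS -N pct_example   # hypothetical job name; add your site's queue/resource directives

# Build a tag unique to this job: PBS sets PBS_JOBID inside a job;
# fall back to the shell PID when the script is run outside PBS.
job_tag="${PBS_JOBID:-$$}"

# Create a fresh, uniquely named directory for this job's PCT files.
# MATLAB_JOB_STORAGE is a name chosen for this sketch, not a MATLAB setting.
MATLAB_JOB_STORAGE="$(mktemp -d "${TMPDIR:-/tmp}/matlab_pct_${job_tag}.XXXXXX")"
export MATLAB_JOB_STORAGE

# Housekeeping: purge the directory when the script exits, even on failure.
trap 'rm -rf "$MATLAB_JOB_STORAGE"' EXIT

# Launch MATLAB if it is on the PATH (load it first with your site's
# module command if needed); my_matlab_prog is the script named in this article.
if command -v matlab >/dev/null 2>&1; then
    matlab -nodisplay -nosplash -r "my_matlab_prog; exit"
else
    echo "matlab not found on PATH; skipping launch" >&2
fi
```

Under these assumptions, the MATLAB script can read the directory with `getenv('MATLAB_JOB_STORAGE')` and assign it to the `JobStorageLocation` property of the local cluster object before opening the pool. Using `trap ... EXIT` rather than a final `rm` line means the directory is removed even if MATLAB exits with an error.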
And the corresponding MATLAB script (my_matlab_prog.m) needs to include these lines:
% create a local cluster object
Please contact us via nTouch if you need help with your HPC issues.