HOW WELL ARE WE DOING HPC?
The current HPC clusters provided by the Computer Centre allow parallel jobs to use up to 48 CPU cores. Take a look at the statistics below to see how well you have been using those parallel computing resources.
Table 1 shows that about 40% of the jobs monitored achieved no speedup at all (represented by the blue bar). Around 50% achieved speedups of up to 8 times (orange bar), and more than 10% achieved more than 8 times speedup over the past four months. Of that 10%, about half fell in the 8-12 times speedup range. Overall, around 1% of all jobs achieved more than 32 times speedup. With 30,000 or more jobs completed per month, 1% is equivalent to around 300 jobs.
What can you do about it?
There are two possible explanations for why the majority of parallel jobs achieve speedups in the 1-12 times range. Firstly, all our cluster nodes have either 8 or 12 cores. As it is more efficient to run a job within a single node, some users may be confining their job execution to one node. Secondly, it is highly possible that most jobs simply do not scale well beyond around 12 cores. As we shared in the previous issue of HPC@NUS, asking for more cores than a job can use will likely lead to longer queuing and turnaround times.
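To see why a job may stop scaling beyond roughly 8-12 cores, Amdahl's law gives a quick back-of-the-envelope estimate. The sketch below is purely illustrative: the 10% serial fraction is an assumed value, not a measurement from our clusters.

```python
# Illustrative sketch of Amdahl's law:
#   speedup(n) = 1 / (serial_fraction + (1 - serial_fraction) / n)
# The 10% serial fraction is an assumed example value, not cluster data.

def amdahl_speedup(cores: int, serial_fraction: float = 0.10) -> float:
    """Theoretical speedup on `cores` CPUs for a job whose
    non-parallelisable portion takes `serial_fraction` of the runtime."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for n in (1, 4, 8, 12, 24, 48):
    print(f"{n:2d} cores -> {amdahl_speedup(n):4.1f}x speedup")
```

With a 10% serial portion, even 48 cores gives less than 9 times speedup in this example, which is consistent with most jobs landing in the 1-12 times range; requesting more cores in such cases mainly adds queuing time rather than speed.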
For the 1% of jobs that achieved more than 32 times speedup, it is possible that those applications are highly scalable but are constrained by the number of cores they can use. If you are one of those users, you may consider subscribing to the on-demand HPC services (Pay-Per-Use or Condominium service) or making use of National Supercomputing Centre (NSCC) resources (check out the article written by Junhong) to further improve your parallel speedup. The on-demand HPC services offer up to hundreds of cores for short-term ownership, while NSCC provides up to thousands of cores per job through fair-share usage.