SPEEDING UP PERFORMANCE WITH MULTICORE PROGRAMMING
New HPC users may want to know how to exploit multi-core systems through parallel programming. This requires software that supports multithreaded programming, so that users can easily turn their code into a parallel program that runs on all the cores of a multi-core system. The good news is that many commercial applications now support this feature, as do many open-source applications, such as R.
In the benchmark below, the “doMC” package for R was used to run a program in multiple threads (see http://cran.r-project.org/web/packages/doMC/vignettes/gettingstartedMC.pdf for the sample code from which the benchmark code was derived). The program was run on one of our newer systems (atlas7), which has 2 processors with 6 cores each (12 cores in total). The run results are shown in the graph below.
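The R benchmark code itself is not reproduced here. As a rough, swapped-in analogue in Python (not the doMC/foreach code used for the benchmark), the sketch below times the same batch of independent, CPU-bound tasks with different worker counts. The Monte Carlo task and the names `simulate_chunk` and `run`, as well as the task sizes, are hypothetical. doMC runs foreach iterations in forked R worker processes (via the parallel package) rather than OS threads, and pure-Python threads would not speed up CPU-bound work because of the GIL, so this sketch uses a fork-based process pool; the "fork" start method is only available on POSIX systems such as Linux.

```python
import multiprocessing as mp
import random
import time
from concurrent.futures import ProcessPoolExecutor


def simulate_chunk(seed: int, n: int) -> float:
    """Hypothetical CPU-bound task: Monte Carlo estimate of pi from n points.

    Stands in for one iteration of the benchmark loop, which is not shown here.
    """
    rng = random.Random(seed)  # independent, reproducible stream per task
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


def run(n_workers: int, n_tasks: int = 24, n: int = 50_000) -> tuple[float, list[float]]:
    """Run n_tasks independent tasks on n_workers processes.

    Returns (elapsed wall-clock seconds, list of per-task results).
    """
    # "fork" mirrors how doMC starts its R workers; it is only available on POSIX.
    ctx = mp.get_context("fork")
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers, mp_context=ctx) as pool:
        # Executor.map preserves task order, so results are comparable across runs.
        results = list(pool.map(simulate_chunk, range(n_tasks), [n] * n_tasks))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # Sweep worker counts below, at and above the core count, as in the benchmark.
    timings = {w: run(w)[0] for w in (1, 2, 4, 6, 12, 24)}
    for w, t in timings.items():
        print(f"{w:>2} workers: {t:6.2f} s  speed-up {timings[1] / t:4.2f}x")
```

The measured times include process start-up cost, and the exact numbers depend on the hardware; on a 12-core machine such as atlas7 one would expect the speed-up to level off once the worker count reaches the core count.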
Several observations can be made from the results shown:
- The elapsed time dropped dramatically, from 462.4 seconds on 1 thread to 100.5 seconds on 6 threads: a speed-up of about 4.6 times (462.4 / 100.5).
- The elapsed time fell more modestly, to around 72 seconds, with 12, 18 and 24 threads. The gains in speed-up become much smaller once the number of threads reaches the number of cores in the system (12 on atlas7). Although the Linux operating system can run more than one thread per core, choosing more threads than there are cores available gave no noticeable speed improvement.
- The total CPU time (user + system) increased as the number of threads increased, because each additional thread adds to the overhead of running a multithreaded program (see http://randu.org/tutorials/threads for more on multithreaded programming).
- For the example shown, 6 threads appears to be the optimal choice: 12 threads lowered the elapsed time further, to around 72 seconds, but doubling the thread count bought only about a 1.4-fold additional speed-up (100.5 / 72). For any parallel program, the speed-up does not grow linearly with the number of threads. This may be due to unparallelized portions of the code and other overheads; the threads tutorial linked in the previous item explains this (see its section on Amdahl’s Law and the Pareto Principle). The programmer therefore needs to find the number of threads that gives the best speed-up for a given program: adding threads does not guarantee a proportionate improvement.
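The observations above can be checked with a little arithmetic. The sketch below computes speed-up (T(1) / T(n)) and efficiency (speed-up / n) from the reported timings, then fits Amdahl's law, S(n) = 1 / ((1 - p) + p / n), to the 6-thread run. It assumes a single fixed parallel fraction p, which is a simplification, and it uses the approximate "around 72 seconds" figure for 12 threads. The function names are hypothetical; this is a derivation from the reported numbers, not part of the original benchmark.

```python
"""Speed-up, efficiency and an Amdahl's-law fit for the reported timings."""

# Elapsed wall-clock seconds by thread count, as reported in the text.
# The 12-thread figure is the approximate "around 72 seconds".
MEASURED = {1: 462.4, 6: 100.5, 12: 72.0}


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speed-up S = T(1) / T(n)."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n: int) -> float:
    """Parallel efficiency E = S / n (1.0 would be perfectly linear scaling)."""
    return speedup(t_serial, t_parallel) / n


def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n), p = parallelizable fraction."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(s: float, n: int) -> float:
    """Solve Amdahl's law for p, given one measured speed-up s on n threads."""
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    t1 = MEASURED[1]
    for n, t in MEASURED.items():
        print(f"{n:>2} threads: {t:6.1f} s  speed-up {speedup(t1, t):4.2f}x  "
              f"efficiency {efficiency(t1, t, n):4.0%}")
    # Fit p from the 6-thread run, then see what Amdahl's law predicts for 12.
    p = parallel_fraction(speedup(t1, MEASURED[6]), 6)
    print(f"fitted parallel fraction p = {p:.3f}")
    print(f"Amdahl prediction for 12 threads: {amdahl_speedup(p, 12):.2f}x")
    print(f"Amdahl upper bound (n -> infinity): {1 / (1 - p):.1f}x")
```

With these figures, efficiency falls from about 77% on 6 threads to about 54% on 12. Fitting p to the 6-thread run gives p of roughly 0.94, for which Amdahl's law predicts about 7.2x on 12 threads, versus the measured 6.4x or so; the shortfall is consistent with the extra threading overhead noted above.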
Conclusion
There are now many types of software that can use more than one processor core to speed up a program through multithreaded programming. When using multithreaded programming, programmers need to find the optimum number of threads for their particular problem to get the maximum speed-up.