R SPEEDUP WITH INTEL MKL
The release of R-3.2.0 in April 2015 brought several major improvements over the previous version. Some of these changes are:
- Performance improvements on pqR (a new version of the R interpreter) while maintaining backward compatibility.
- More efficient handling of big-memory data objects (e.g. cbind/rbind operations on matrices with more than 2 billion elements).
- Updates to R’s byte compiler that allow many scalar subsetting, assignment and arithmetic operations to be handled more efficiently.
- A more stringent package-checking system to ensure that packages comply with CRAN policies.
Besides these major performance updates, R-3.2 added some minor new features and bug fixes. The current release in the R-3.2 series, R-3.2.3 with codename “Wooden Christmas-Tree”, was released in early December 2015 with further bug fixes and performance improvements.
Inspired by the article from Intel, we decided to run a benchmark study of R’s performance when built with the two compilers available in NUS HPC, GCC-4.8.2 and Intel XE 2015. Both builds use the same optimization flags and are linked against the LAPACK library. The R build with Intel XE 2015, however, also uses the Intel Math Kernel Library (Intel MKL) that ships with the compiler. Intel MKL is a hand-optimized math library that increases application performance and reduces development time. It includes highly vectorized and threaded linear algebra, fast Fourier transform (FFT), vector math and statistics functions. Building R with Intel MKL gives R access to MKL’s optimized matrix routines. This improves the performance of matrix operations, which are at the heart of computational data analysis.
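Before benchmarking, it can help to confirm which linear algebra library a given R build actually uses. The snippet below is only a minimal sketch of such a check, not part of the study described here. La_version() is a base R function that reports the LAPACK version. La_library() exists only in newer R releases than the R-3.2.3 discussed in this article, so the call is guarded. The variable names are illustrative.

```r
# Report the LAPACK version this R session is using.
lapack_version <- La_version()
cat("LAPACK version:", lapack_version, "\n")

# La_library() is not available in older R releases, so check before calling.
# Where available, it returns the path of the LAPACK library in use
# (the string may be empty on some platforms).
if (exists("La_library", mode = "function")) {
  cat("LAPACK library:", La_library(), "\n")
} else {
  cat("La_library() is not available in this R version\n")
}
```

On an MKL-linked build, one would expect the reported library path to point at an MKL shared object, but the exact output depends on how R was configured.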
The benchmark is divided into three main parts: matrix operations, matrix functions and programming routines. The benchmark code was obtained from http://r.research.att.com/benchmarks/R-benchmark-25.R and run on an NUS HPC server running the CentOS 6.3 operating system. The benchmark results are shown below:
As expected, the benchmark results for programming routines are similar for both compilers. However, the results for matrix operations and matrix functions show a large performance leap. R built with Intel XE 2015 showed a speedup of between 9 and 12 times compared to R built with GCC-4.8.2. The most significant difference is in the matrix cross product, where the speedup is close to 25 times.
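The three benchmark categories can be illustrated with a much smaller timing script. This is a rough sketch and not the R-benchmark-25 code itself. The matrix size, the choice of determinant() as the matrix function, the loop-based Fibonacci routine and all variable names are assumptions made for illustration.

```r
set.seed(42)                       # reproducible random input
n <- 500                           # assumed size, kept small so the sketch runs quickly
A <- matrix(rnorm(n * n), n, n)

# 1. Matrix operation: cross product t(A) %*% A (handled by the BLAS)
t_op <- system.time(C <- crossprod(A))[["elapsed"]]

# 2. Matrix function: log-determinant (handled by LAPACK)
t_fun <- system.time(d <- determinant(A, logarithm = TRUE))[["elapsed"]]

# 3. Programming routine: interpreted loop, independent of BLAS/LAPACK
fib <- function(k) {
  a <- 0
  b <- 1
  for (i in seq_len(k)) {
    tmp <- a + b
    a <- b
    b <- tmp
  }
  a
}
t_prog <- system.time(for (j in 1:2000) fib(50))[["elapsed"]]

# Elapsed wall-clock time in seconds for each category
timings <- c(matrix_operation = t_op,
             matrix_function  = t_fun,
             programming      = t_prog)
print(timings)
```

Running the same script under two R builds and dividing the timings of one by the other gives a per-category speedup. Only the first two categories go through the linear algebra library, which is consistent with the programming routines showing little difference between compilers.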