HPC CHALLENGE STREAM BENCHMARK PERFORMANCE ON AWS CLOUD
Amazon Web Services (AWS) recently introduced a new compute-optimized instance type, the C5, in its Amazon EC2 catalogue. The C5 is powered by 3.0 GHz Intel® Xeon® Scalable (Skylake) processors and comes in six sizes, as shown in the table below:
Instance Name | vCPUs | RAM | EBS Bandwidth | Network Bandwidth |
---|---|---|---|---|
c5.large | 2 | 4 GiB | Up to 2.25 Gbps | Up to 10 Gbps |
c5.xlarge | 4 | 8 GiB | Up to 2.25 Gbps | Up to 10 Gbps |
c5.2xlarge | 8 | 16 GiB | Up to 2.25 Gbps | Up to 10 Gbps |
c5.4xlarge | 16 | 32 GiB | 2.25 Gbps | Up to 10 Gbps |
c5.9xlarge | 36 | 72 GiB | 4.5 Gbps | 10 Gbps |
c5.18xlarge | 72 | 144 GiB | 9 Gbps | 25 Gbps |
Table 1: C5 instance type sizes
We set out to test the performance of the new C5 (Skylake) instances by running the MATLAB HPC Challenge Stream benchmark (hpccStream.m), which is supplied with MATLAB. To run the benchmark in the cloud, the M-file first has to be compiled with the MATLAB Compiler toolbox. This produces a portable, standalone version of the program that runs without consuming any MATLAB licenses.
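For readers unfamiliar with Stream-style benchmarks, the following is a minimal sketch of the "triad" kernel that such benchmarks typically time. It is not the code in hpccStream.m, and the function names and the problem size are assumptions made purely for illustration. Pure Python mostly measures interpreter overhead rather than memory bandwidth, so the printed figure only shows how the number is derived, not what the hardware can do.

```python
import time
from array import array


def triad(b, c, q):
    """Return a where a[i] = b[i] + q * c[i] (the Stream 'triad' kernel)."""
    return array("d", (bi + q * ci for bi, ci in zip(b, c)))


def triad_bandwidth_gb_per_s(n, seconds):
    """Bandwidth by the usual Stream convention: the triad touches three
    arrays of n 8-byte doubles (two reads and one write)."""
    return 3 * 8 * n / seconds / 1e9


if __name__ == "__main__":
    n = 1_000_000  # hypothetical problem size, chosen only for illustration
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n

    start = time.perf_counter()
    a = triad(b, c, 3.0)
    elapsed = time.perf_counter() - start

    print(f"triad: {elapsed:.3f} s, "
          f"{triad_bandwidth_gb_per_s(n, elapsed):.3f} GB/s")
```

The results reported below are elapsed times rather than bandwidths, so lower values are better.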
The performance of the C5 is compared to our latest HPC cluster in NUS IT, atlas8, which is powered by 2.4 GHz Intel® Xeon® E5-2640 v4 (Broadwell) processors, with a total of 24 cores in a single system.
Elapsed time by number of CPUs (sec) | 6 CPUs | 12 CPUs | 16 CPUs | 24 CPUs | 36 CPUs |
---|---|---|---|---|---|
atlas8 | 7.153 | 5.119 | 4.740 | 4.679 | 5.916 (#) |
c5.9xlarge | 5.835 | 3.847 | 3.471 | 3.470 | 3.736 |
% improvement | 18.4% | 24.8% | 26.8% | 25.8% | 36.8% |

Notes: (#) On atlas8, the 36-CPU run was done within a 24-core system. Lower values indicate better performance.

Plot 1: Performance of C5 instance against atlas8
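The "% improvement" row can be reproduced from the two rows of elapsed times. The short sketch below assumes that improvement is defined as the reduction in elapsed time relative to atlas8, which matches the published figures. The variable and function names are illustrative only.

```python
# Elapsed times in seconds, copied from the table above.
cpus = [6, 12, 16, 24, 36]
atlas8 = [7.153, 5.119, 4.740, 4.679, 5.916]
c5_9xlarge = [5.835, 3.847, 3.471, 3.470, 3.736]


def improvement_pct(baseline, candidate):
    """Percentage reduction in elapsed time relative to the baseline."""
    return (baseline - candidate) / baseline * 100.0


improvements = [
    round(improvement_pct(a, c), 1) for a, c in zip(atlas8, c5_9xlarge)
]

for n, pct in zip(cpus, improvements):
    print(f"{n:>2} CPUs: {pct:.1f}%")
```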
Results
The plot above shows the comparison of benchmark results between atlas8 and C5. Some observations are:
- On atlas8, performance degrades beyond 24 CPUs because running more than 24 threads on a 24-core system oversubscribes the cores.
- The average speed improvement from Broadwell to Skylake in the Stream benchmark above is 24%. The 36-CPU data point is excluded from this average because of the oversubscription described in the first observation.
- Even though c5.9xlarge has 36 vCPUs in total, the speed-up peaks at around 25 CPUs. Beyond that point, adding more CPUs does not improve performance; on the contrary, performance deteriorates.
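The observations above can be checked against the tabulated timings. The sketch below averages the improvement over the runs that fit within atlas8's 24 cores and computes the speed-up of each c5.9xlarge run relative to its own 6-CPU run. It uses only the five CPU counts that appear in the table, so it can show where the measured times stop improving but cannot locate the peak more precisely than those sample points. All names are illustrative.

```python
from statistics import mean

# Elapsed times in seconds, copied from the results table.
cpus = [6, 12, 16, 24, 36]
atlas8 = [7.153, 5.119, 4.740, 4.679, 5.916]
c5_9xlarge = [5.835, 3.847, 3.471, 3.470, 3.736]

# Average improvement over the runs that fit within atlas8's 24 cores.
fair = [
    (a - c) / a * 100.0
    for n, a, c in zip(cpus, atlas8, c5_9xlarge)
    if n <= 24
]
average_improvement = mean(fair)

# Speed-up of each c5.9xlarge run relative to its own 6-CPU run.
speedup = {n: c5_9xlarge[0] / t for n, t in zip(cpus, c5_9xlarge)}
fastest = max(speedup, key=speedup.get)

print(f"average improvement (6 to 24 CPUs): {average_improvement:.1f}%")
for n, s in speedup.items():
    print(f"{n:>2} CPUs: {s:.2f}x relative to 6 CPUs")
print(f"fastest measured run: {fastest} CPUs")
```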
Overall, the newer Skylake system in C5 performs better. According to the AWS website, the C5 (Skylake) instances are up to 25% faster than the previous generation, C4, which is powered by Intel Xeon Haswell processors. Haswell is two generations older than Skylake, while our Broadwell is only one generation older. Our measured average improvement of 24% is therefore relative to a newer baseline than the one AWS used, and in that sense our benchmark results exceed even the performance reported by AWS.
Finally, the AWS C5 instances also come in sizes larger than our atlas8 systems, with up to 72 vCPUs per instance (in general, 2 vCPUs correspond to 1 physical core).
Conclusion
AWS, like most cloud service providers, can offer its customers the latest computational hardware. The pay-as-you-use model also makes it economical to use computational resources that are needed only occasionally, for example during a computationally intensive phase of a project. If managed properly, this can reduce the computational costs of a project by eliminating the waste of idle on-premise resources.
In NUS IT, our main computational clusters are still on-premise. However, with access to the larger variety of computational resources in the cloud and in NSCC, we will be able to cater to specific needs of researchers that cannot be met within our on-premise HPC clusters.
For more information, email us at NUSIT-HPC@nus.edu.sg.