PERFORMANCE OF THE NEW ATLAS8 CLUSTER
The latest addition to the HPC resources in NUS Information Technology is the new Atlas8 cluster, whose nodes have 2-socket Intel Xeon 2.27GHz E5-2650 v4 processors with 12 cores per CPU socket. A benchmarking run was carried out to compare the performance of this processor against that of our previous cluster, Atlas7, whose nodes have 2-socket Intel Xeon 2.67GHz X5650 processors with 6 cores per socket.
In this article, we share the results of this benchmarking exercise and some observations on how well parallel jobs scale on the two clusters.
Abaqus benchmark
In this comparison, we ran the ABAQUS benchmark titled “Contour Integral Evaluation: three-dimensional case”, specifically the “Semi-elliptical crack in a rectangular plate” problem. This problem is included in the ABAQUS distribution, and its input files can be retrieved using the “abaqus fetch” command. The following files were fetched:
abaqus fetch job=jktintegral3d.inp
abaqus fetch job=jktintegral3d_element.inp
abaqus fetch job=jktintegral3d_node.inp
abaqus fetch job=contourintegral_ellip_plate_xfem_c3d4.inp
abaqus fetch job=contourintegral_ellip_plate_xfem_c3d8.inp
and the analysis was done by running the command:
abaqus job=jktintegral3d.inp double
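For the multi-core runs, the number of cores is controlled through the “cpus” option of the same command; a typical parallel invocation (the core count below is only an example) would look like:

# Example of a 12-core parallel run; "interactive" keeps the job in the foreground
abaqus job=jktintegral3d.inp cpus=12 double interactive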
The results of the benchmark can be seen in Plot 1 below.
RESULTS AND CONCLUSIONS
The comparison between the performance of Atlas8 and that of Atlas7 is shown in Plot 1. The results show that Atlas8 consistently outperforms Atlas7 in the benchmark runs, with speed-ups ranging from 2 times (for 24-core jobs) to 5.6 times (for 12-core jobs) (see Tables 1 and 2 above).
The third column of Tables 1 and 2 shows the total CPU time used by the processors to run the jobs. Because Atlas8 has 24 cores within a single system, the parallel jobs were able to run more efficiently within one node, with noticeably less overhead. On Atlas7, where each node has only 12 cores, the overheads were higher, and the larger 22-core and 24-core jobs had to be run across two nodes; both the CPU times and the wall clock times improved once the load was spread across the two systems.
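As a rough sketch of how such a cross-node run can be set up (the node names and the environment-file approach below are illustrative assumptions, not necessarily the exact Atlas7 configuration), the participating hosts can be listed in the job's Abaqus environment file and the run launched in MPI mode:

# Hypothetical 24-core run spread over two 12-core nodes; node names are placeholders
cat > abaqus_v6.env << 'EOF'
mp_host_list = [['atlas7-node01', 12], ['atlas7-node02', 12]]
EOF
abaqus job=jktintegral3d.inp cpus=24 mp_mode=mpi double interactive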
In Table 1, we can see that for this particular benchmark test, the wall clock time improvements started to taper off from 12 cores onwards, and the analysis runs stayed at around 200 seconds even when more cores were used. In conclusion, the performance improvements from using more processors beyond 12 cores were marginal. For such cases, it may be more efficient to run two 12-core jobs instead of a single 24-core job.
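As a rough illustration of that alternative (the job names below are placeholders for two separate input files, and this assumes both jobs run on the same 24-core Atlas8 node), two independent 12-core analyses can be launched side by side:

# Two independent 12-core analyses sharing one 24-core node
abaqus job=analysis_a cpus=12 double interactive &
abaqus job=analysis_b cpus=12 double interactive &
wait   # wait for both analyses to finish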
Each computational analysis may differ in the amount of time needed to complete the simulation, but many analysis jobs share the characteristics seen above: the performance gain from using more processor cores decreases as the number of cores increases. It is therefore prudent for each HPC user to carry out a similar benchmarking exercise to find the optimal number of processors for their analysis, so that results are obtained faster and the project is completed in a shorter time.
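One simple way to carry out such a benchmarking exercise (the core counts below are only examples) is to repeat the same analysis with an increasing number of cores and compare the wall clock times reported at the end of each run:

# Run the same analysis with different core counts and compare the reported timings
for n in 1 2 4 8 12 16 24; do
    abaqus job=scaling_test_${n}cpu input=jktintegral3d cpus=$n double interactive
done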