BENCHMARKING OPEN SOURCE LAMMPS ON ATLAS6 CLUSTER
Benchmark results for the open source LAMMPS code on the atlas6 cluster show that running parallel jobs with 12 threads, that is, within one node, is the more efficient and effective choice for small to medium LAMMPS jobs.
LAMMPS is a molecular dynamics code with potentials for soft matter, solid-state materials and coarse-grained systems. It is widely used in materials science simulations.
Parallel executables of the 22 March 2013 version of LAMMPS were compiled using MVAPICH2 version 1.8 a2 on atlas6-c01. Benchmark tests were run on the atlas6 cluster (two six-core Intel X5650 @2.67GHz processors on each node, giving 12 cores per node) with the number of threads ranging from 1 to 48. The speedup of each multi-threaded job (Table 1) is the turnaround time of the job run with one thread divided by the turnaround time of that job. The plot of turnaround time versus the number of threads is shown in Figure 1.
Table 1: Benchmark results of jobs executed on atlas6 cluster
Figure 1: Turnaround Time versus Number of Threads
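The speedup figures described above can be reproduced from raw turnaround times with a few lines of Python. The following is a minimal sketch, not the script used for this benchmark: the function name speedup_table is hypothetical, the timings are made-up placeholders rather than the measured atlas6 results, and parallel efficiency is assumed to be speedup divided by the number of threads.

```python
def speedup_table(times_by_threads):
    """Return rows of (threads, time, speedup, efficiency).

    times_by_threads maps a thread count to a turnaround time in seconds
    and must contain an entry for 1 thread, which is used as the baseline.
    """
    baseline = times_by_threads[1]
    rows = []
    for n in sorted(times_by_threads):
        t = times_by_threads[n]
        speedup = baseline / t      # how many times faster than 1 thread
        efficiency = speedup / n    # fraction of ideal (linear) speedup
        rows.append((n, t, speedup, efficiency))
    return rows


# Placeholder timings in seconds. These are NOT the atlas6 measurements;
# substitute the turnaround times from your own runs.
example_times = {1: 1200.0, 12: 150.0, 24: 100.0, 48: 80.0}

rows = speedup_table(example_times)
for n, t, s, e in rows:
    print(f"{n:>3} threads  {t:8.1f} s  speedup {s:5.2f}  efficiency {e:6.1%}")
```

An efficiency that falls well below 100% as the thread count rises indicates that the extra threads are contributing less and less to the speedup.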
Conclusion
The speedup is not linear: as the number of threads increases, the parallel efficiency drops. Although parallel jobs can be run with 12 or fewer threads, users are advised to run parallel LAMMPS jobs with 12 threads, so that each job has one compute node dedicated to it. Larger jobs that run for days or weeks can use 24 or 36 threads, as long as resources permit.
While the LAMMPS benchmark results give some indication of how to run parallel LAMMPS jobs more efficiently, the behaviour could be different for other applications. Users are advised to benchmark their own applications with different numbers of threads to find the optimal number, which improves job efficiency and makes better use of the cluster's resources.
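One possible way to benchmark an application yourself is to time the same run at several thread counts. The sketch below is an illustration only: time_command, sweep and demo_cmd are hypothetical names, and demo_cmd launches a trivial Python process as a stand-in so the example runs anywhere. Replace its return value with the launch line used for your application on your cluster.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the run fails
    return time.perf_counter() - start


def sweep(counts, make_cmd):
    """Time make_cmd(n) for each thread count n and return {n: seconds}."""
    return {n: time_command(make_cmd(n)) for n in counts}


def demo_cmd(n):
    # Stand-in command (assumption): it ignores n and does no real work.
    # Replace with the actual command that starts your application with
    # n threads, as documented for your cluster.
    return [sys.executable, "-c", "pass"]


timings = sweep([1, 2, 4], demo_cmd)
for n in sorted(timings):
    print(f"{n:>3} threads  {timings[n]:.3f} s")
```

The measured times can then be compared with the one-thread run to see at which thread count the efficiency starts to fall off.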