CFD SIMULATIONS: RUNNING WITH MORE CORES MAY NOT BE FASTER
Large Computational Fluid Dynamics (CFD) simulations, ranging from dozens to hundreds of millions of cells, are now commonly carried out to gain insight into the fluid dynamics of a particular problem. For example, hydrodynamic modelling of the waves around a giant ship hull can be performed for drag analysis. Parallel computing with hundreds or even thousands of CPU cores is therefore required to run these simulations so that results can be obtained within a shorter turnaround time.
Ideally, the more computing cores are used to run a simulation in parallel, the shorter the time required to obtain the solution. In reality, however, the computing time may not decrease when a large number of cores is used, because communication and synchronization among many cores introduce additional overhead. Benchmarking the parallel performance is therefore recommended to find the optimal number of cores for the simulation. This not only helps to determine the number of cores that gives the shortest time-to-solution; it also ensures the best utilisation of computing resources.
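The trade-off described above can be illustrated with a toy cost model. The sketch below is purely illustrative: the model form (compute work divided evenly among cores, plus an overhead term that grows linearly with the core count), the function names and the parameter values are all assumptions chosen for demonstration. They are not fitted to any benchmark data, and times are in arbitrary units.

```python
def time_per_iteration(cores, compute_work=1000.0, overhead_per_core=0.01):
    """Toy model (assumed, not measured): time = compute_work / cores + overhead_per_core * cores.

    The first term shrinks as cores are added; the second term stands in for
    communication and synchronization cost, which grows with the core count.
    """
    return compute_work / cores + overhead_per_core * cores


def best_core_count(candidates, **model_params):
    """Return the candidate core count with the smallest modelled time per iteration."""
    return min(candidates, key=lambda c: time_per_iteration(c, **model_params))


if __name__ == "__main__":
    # Hypothetical core counts, chosen only to show the shape of the curve.
    candidates = [64, 128, 256, 512, 1024]
    for cores in candidates:
        print(f"{cores:5d} cores: {time_per_iteration(cores):7.3f} (arbitrary units)")
    print("Fastest in this toy model:", best_core_count(candidates), "cores")
```

In this toy model the time first falls and then rises again once the overhead term dominates, which is the qualitative behaviour a benchmark is meant to detect. Real overhead depends on the solver, the mesh partitioning and the interconnect, so measured timings should always take precedence over a model like this.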
Benchmark Test
Using the recently installed Atlas8 HPC cluster, which consists of 24-core compute nodes, a benchmark was carried out on 4, 8, 12, 16 and 18 nodes (96, 192, 288, 384 and 432 cores) to find the number of cores that offers optimal performance. The benchmark test runs a CFD simulation of a problem with 37 million cells, which is comparable to solving 37 million sets of algebraic equations.
The average computing time per iteration versus the number of compute nodes or cores is plotted in Fig. 2. The computing time per iteration decreases as the number of compute nodes increases from 4 to 8, 12 and 16. However, it increases when 18 compute nodes are used, which means the actual time-to-solution becomes longer. Unfortunately, the benchmark could not be run on more than 18 compute nodes due to resource constraints.
Recommendation
The following observations and recommendations can be drawn from this benchmark test:
• The simulation can be completed in the shortest time using 16 compute nodes (or 384 cores), which is 2.7 times faster than using 4 compute nodes. With 12 compute nodes, it is 2.4 times faster than with 4.
• The computation becomes slower when 18 compute nodes are used, and it is expected to remain slower with more nodes, although larger node counts were not tested.
• Perform similar benchmark tests to find the number of nodes and cores that achieves the shortest time-to-solution and the best productivity and utilisation of computing resources. Having more nodes or cores does not necessarily mean faster results.
• Assess and balance the cost of investing in more compute nodes/cores versus the benefits that can be derived. For example, evaluate whether it is worthwhile to invest in 4 more compute nodes, increasing the computing capacity from 12 to 16 nodes (33% more investment), for a saving of roughly 10% in computing time.
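The cost-benefit arithmetic in the points above can be checked with a short script. This is a minimal sketch that uses only the speedups quoted in this article (2.4 times on 12 nodes and 2.7 times on 16 nodes, both relative to 4 nodes). The function names are hypothetical, and parallel efficiency is defined here relative to the 4-node run, which is an assumption made because no single-node timing is reported.

```python
BASELINE_NODES = 4

# Speedups relative to the 4-node run, as quoted in the article.
reported_speedup = {4: 1.0, 12: 2.4, 16: 2.7}


def parallel_efficiency(nodes, speedup, baseline_nodes=BASELINE_NODES):
    """Achieved speedup divided by the ideal speedup (nodes / baseline_nodes)."""
    return speedup / (nodes / baseline_nodes)


def relative_node_time(nodes, speedup, baseline_nodes=BASELINE_NODES):
    """Total node-time consumed by a run, relative to the baseline run."""
    return (nodes / baseline_nodes) / speedup


def time_saving(speedup_from, speedup_to):
    """Fractional reduction in wall-clock time when moving between two speedups."""
    return 1.0 - speedup_from / speedup_to


if __name__ == "__main__":
    for nodes, speedup in reported_speedup.items():
        print(
            f"{nodes:2d} nodes: speedup {speedup:.1f}, "
            f"efficiency {parallel_efficiency(nodes, speedup):.1%}, "
            f"relative node-time {relative_node_time(nodes, speedup):.2f}"
        )
    extra_nodes = (16 - 12) / 12
    saving = time_saving(reported_speedup[12], reported_speedup[16])
    print(f"12 -> 16 nodes: {extra_nodes:.0%} more nodes, {saving:.0%} less wall-clock time")
```

With these figures, efficiency relative to the 4-node run drops from 80% on 12 nodes to about 68% on 16 nodes, and the move from 12 to 16 nodes saves about 11% of wall-clock time, consistent with the roughly 10% quoted above. Because the quoted speedups are rounded, the derived percentages are approximate.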