The Weak Scaling of DL_POLY 3

I.J.Bush and W.Smith, STFC Daresbury Laboratory

When the scaling of parallel codes is discussed, it is normally strong scaling that is meant: for a fixed system size, how does the time to solution vary with the number of processors? Weak scaling, on the other hand, is how the time to solution varies with processor count when the system size per processor is held fixed. So in a weak scaling study, when one doubles the number of processors one also doubles the system size.
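The distinction can be made concrete with a small sketch. The function names are hypothetical and are not part of DL_POLY 3; the figure of 32,000 particles per processor is used purely as an illustrative size.

```python
def strong_scaling_size(total_particles: int, processors: int) -> int:
    """Strong scaling: the total system size is fixed, whatever the processor count."""
    return total_particles


def weak_scaling_size(particles_per_processor: int, processors: int) -> int:
    """Weak scaling: the size per processor is fixed, so the total grows with the processor count."""
    return particles_per_processor * processors


# Compare the total system size under the two kinds of study.
for p in (1, 2, 4, 1024):
    print(p, strong_scaling_size(32_000, p), weak_scaling_size(32_000, p))
```

In the weak scaling column the total doubles each time the processor count doubles, while the strong scaling column stays constant.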

Weak scaling is most interesting for O(N) algorithms. In this case perfect weak scaling is a constant time to solution, independent of processor count. Deviations from this indicate one or both of the following:

  1. The algorithm is not truly O(N).
  2. The overhead due to parallelism is increasing.

The linked cell algorithm employed in DL_POLY 3 [1] for the short ranged forces should be strictly O(N) in time. Here we study the weak scaling of three model systems; the times reported are for HPCx, a large IBM P690+ cluster sited at Daresbury.

Graph 1 shows the weak scaling for Argon. The smallest system size is 32,000 atoms, the largest 32,768,000. The scaling is very good, with the time per time step increasing only from 0.6 s to 0.7 s on going from 1 processor to 1024. This simulation is a direct test of the linked cell algorithm, as it requires only short ranged forces, and so the results show that the algorithm is behaving as expected.

Graph 1: Weak Scaling for Argon

Graph 2 shows the results for Sodium Chloride. The system size ranges from 27,000 to 27,648,000. The weak scaling is poorer than for Argon, with the time per time step increasing from 2.8 s to 4.1 s. The reason is that in this case the algorithm is no longer strictly O(N), as evaluation of the Ewald terms is O(N log N). However, the deviation is still small, since log N is such a slowly increasing function.
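As a rough check on how slowly log N grows, the sketch below estimates the increase in time per step that would be expected if all of the work (not just the Ewald terms) scaled as O(N log N) and there were no parallel overhead. This is an illustrative assumption, not a model of DL_POLY 3; the function name is hypothetical.

```python
import math


def nlogn_time_ratio(n_small: int, n_large: int, processors: int) -> float:
    """Ratio of weak-scaled times if ALL the work were O(N log N).

    The work N log(N) is assumed to be shared evenly over the processors,
    and the small system is assumed to run on a single processor.
    """
    per_proc_large = n_large * math.log(n_large) / processors
    per_proc_small = n_small * math.log(n_small)
    return per_proc_large / per_proc_small


# System sizes and timings quoted for Sodium Chloride on 1 and 1024 processors.
predicted = nlogn_time_ratio(27_000, 27_648_000, 1024)
observed = 4.1 / 2.8
print(f"pure N log N prediction: {predicted:.2f}, observed: {observed:.2f}")
```

Under these assumptions the predicted ratio is about 1.68, while the observed ratio is about 1.46. The smaller observed increase is consistent with only part of the calculation scaling as N log N.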

Graph 2: Weak Scaling for Sodium Chloride

Finally, graph 3 shows the scaling for water. This is the poorest scaling example, with the time per time step increasing from 1.9 s on 1 processor, where the system size is 20,736 particles, to 3.9 s on 1024 (system size 21,233,664). As in the NaCl case, Ewald terms must be calculated, but over and above that constraint forces must also be evaluated. This is the cause of the larger deviation from perfect scaling: while these forces are short range and should scale as O(N), their calculation requires a large number of short messages to be sent, and latency effects become appreciable. However, even here the scaling is still acceptable, as a 3.9 s time step is short enough to allow simulations to be performed.
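Why many short messages are costly can be illustrated with the usual latency plus bandwidth cost model. The sketch below is generic: the latency and bandwidth values are hypothetical, chosen only for illustration, and are not measurements of HPCx or of DL_POLY 3.

```python
def transfer_time(n_messages: int, bytes_per_message: int,
                  latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Each message pays a fixed latency plus a term proportional to its size."""
    return n_messages * (latency_s + bytes_per_message / bandwidth_bytes_per_s)


# Hypothetical network parameters (assumptions, not HPCx figures).
LATENCY = 5e-6      # seconds per message
BANDWIDTH = 1e9     # bytes per second

total_bytes = 1_000_000

# The same amount of data sent as one large message or as 10,000 short ones.
one_large = transfer_time(1, total_bytes, LATENCY, BANDWIDTH)
many_small = transfer_time(10_000, total_bytes // 10_000, LATENCY, BANDWIDTH)
print(f"one large message: {one_large:.6f} s, many short messages: {many_small:.6f} s")
```

With these assumed values the short messages take roughly fifty times longer for the same data volume, because the fixed latency is paid once per message.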

Graph 3: Weak Scaling for Water

In summary, the weak scaling of DL_POLY 3 is good out to 1024 processors, though deviations are observed, due both to algorithmic departures from O(N) behaviour and to increasing parallel overhead.
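One common way to compare the three cases is the weak scaling efficiency, taken here as the time on 1 processor divided by the time on P processors, so that perfect weak scaling gives 1. The sketch below applies this definition to the timings quoted above; the function name is hypothetical and the efficiency figures are derived here rather than reported in the original study.

```python
def weak_scaling_efficiency(time_one_proc: float, time_many_procs: float) -> float:
    """Perfect weak scaling gives 1.0; longer times on more processors lower the value."""
    return time_one_proc / time_many_procs


# Seconds per time step on 1 and on 1024 processors, as quoted in the text.
timings = {
    "Argon": (0.6, 0.7),
    "Sodium Chloride": (2.8, 4.1),
    "Water": (1.9, 3.9),
}

efficiencies = {
    name: weak_scaling_efficiency(t1, t1024)
    for name, (t1, t1024) in timings.items()
}
for name, eff in efficiencies.items():
    print(f"{name}: {eff:.2f}")
```

On this measure the efficiencies at 1024 processors are roughly 0.86 for Argon, 0.68 for Sodium Chloride and 0.49 for water, matching the ordering described in the text.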

I would like to thank Ilian Todorov for his assistance.

Further information at the DL-POLY Home Page

For more information about the Advanced Research Computing Group please contact Dr Mike Ashworth.