The Weak Scaling of DL_POLY 3
I. J. Bush and W. Smith, STFC Daresbury Laboratory
When the scaling of parallel codes is discussed, it is normally strong
scaling that is meant: for a fixed system size, how does the time to solution
vary with the number of processors? Weak scaling, on the other hand, describes
how the time to solution varies with processor count when the system size per
processor is held fixed. So in a weak scaling study, doubling the number of
processors also doubles the system size.
Weak scaling is most interesting for O(N) algorithms. In this
case perfect weak scaling means a time to solution that is constant,
independent of processor count. Deviations from this indicate that either:
- the algorithm is not truly O(N), or
- the overhead due to parallelism is increasing,
or both.
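The two definitions can be stated compactly as efficiencies. In notation introduced here for illustration only, T(N, P) is the time per step for N particles on P processors, and n = N/P is the number of particles per processor:

```latex
E_{\mathrm{strong}}(P) = \frac{T(N,1)}{P\,T(N,P)}, \qquad
E_{\mathrm{weak}}(P) = \frac{T(n,1)}{T(nP,P)}
```

For an O(N) algorithm with no parallel overhead, T(nP, P) = c nP / P = c n = T(n, 1), so E_weak(P) = 1 for every P. Either of the two deviations listed above pushes E_weak below 1.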
The linked cell algorithm employed in DL_POLY 3 [1] for the short-ranged
forces should be strictly O(N) in time. Here we study the weak scaling of three
model systems, with the times reported for HPCx, a large IBM p690+ cluster
sited at Daresbury.
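To make the O(N) claim concrete, here is a minimal sketch of a generic linked-cell (cell list) pair search in a cubic periodic box. It is an illustration of the technique, not DL_POLY 3's implementation; the function names, the cubic box, and the requirement of at least 3 cells per side are assumptions of this sketch. Particles are binned into cells at least as wide as the cutoff, so each particle only needs to be compared with particles in its own and the 26 surrounding cells; at fixed density this bounds the work per particle, giving O(N) overall.

```python
import itertools
import math
import random

# Generic linked-cell (cell list) pair search in a cubic periodic box.
# Illustrative sketch only, not DL_POLY 3's implementation.


def minimum_image(d: float, box: float) -> float:
    """Map a coordinate difference to its nearest periodic image."""
    return d - box * round(d / box)


def squared_distance(a, b, box: float) -> float:
    """Squared minimum-image distance between two 3D points."""
    return sum(minimum_image(a[k] - b[k], box) ** 2 for k in range(3))


def count_pairs_linked_cell(positions, box: float, rc: float) -> int:
    """Count pairs closer than rc by binning particles into cells of width >= rc
    and checking only the 27 cells around each cell (O(N) at fixed density)."""
    m = int(box // rc)  # cells per side, so each cell is at least rc wide
    if m < 3:
        raise ValueError("this sketch needs at least 3 cells per side")
    width = box / m
    cells = {}
    for idx, p in enumerate(positions):
        key = tuple(math.floor(p[k] / width) % m for k in range(3))
        cells.setdefault(key, []).append(idx)

    rc2 = rc * rc
    count = 0
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            other = cells.get(((cx + dx) % m, (cy + dy) % m, (cz + dz) % m), [])
            for i in members:
                for j in other:
                    # j > i ensures each pair is counted exactly once
                    if j > i and squared_distance(positions[i], positions[j], box) < rc2:
                        count += 1
    return count


def count_pairs_brute_force(positions, box: float, rc: float) -> int:
    """O(N^2) reference: test every pair."""
    rc2 = rc * rc
    n = len(positions)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if squared_distance(positions[i], positions[j], box) < rc2
    )


if __name__ == "__main__":
    random.seed(0)
    box, rc = 10.0, 2.5
    pts = [tuple(random.uniform(0.0, box) for _ in range(3)) for _ in range(500)]
    print(count_pairs_linked_cell(pts, box, rc), count_pairs_brute_force(pts, box, rc))
```

Both functions find the same pairs; only the linked-cell version avoids testing every pair, which is what keeps its cost proportional to N.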
Graph 1 shows the weak scaling for argon. The smallest system has
32,000 atoms and the largest 32,768,000. The scaling is very good: the time per
step increases only from 0.6 s to 0.7 s on going from 1 processor to 1024. This
simulation is a direct test of the linked cell algorithm, as it requires only
short-ranged forces, so the results show that the algorithm is behaving as
expected.
Graph 1: Weak Scaling for Argon
Graph 2 shows the results for sodium chloride, with system sizes
ranging from 27,000 to 27,648,000 ions. The weak scaling is poorer than for
argon, the time per step increasing from 2.8 s to 4.1 s. The reason is that in
this case the algorithm is no longer strictly O(N), as evaluation of the Ewald
terms is O(N log N). However, the deviation is still small, since log N is such
a slowly increasing function.
Graph 2: Weak Scaling for Sodium Chloride
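The "slowly increasing" argument for sodium chloride can be quantified with an idealized model. Assume, as an upper-bound simplification, that all the work scales as N log N and ignore communication; in a weak scaling study n = N/P is fixed, so the work per processor grows only through the logarithm:

```latex
W_{\mathrm{proc}}(P) \propto \frac{N}{P}\log N = n\log(nP)
\quad\Longrightarrow\quad
\frac{W_{\mathrm{proc}}(P)}{W_{\mathrm{proc}}(1)} = \frac{\log(nP)}{\log n} = 1 + \frac{\log P}{\log n}

% Sodium chloride: n = 27{,}000,\; P = 1024
1 + \frac{\ln 1024}{\ln 27000} \approx 1 + \frac{6.93}{10.20} \approx 1.68
```

So even if every operation carried the extra log N factor, the time per step would grow by only about 1.68 times between 1 and 1024 processors. The measured ratio, 4.1/2.8 ≈ 1.46, lies below this, which is at least compatible with only part of the work (the Ewald terms) carrying the log N factor.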
Finally, graph 3 shows the scaling for water. This is the poorest
scaling example, the time per step increasing from 1.9 s on 1 processor, where
the system size is 20,736 particles, to 3.9 s on 1024 (system size 21,233,664).
As in the NaCl case, Ewald terms must be calculated, but over and above that,
constraint forces must also be calculated. This is the cause of the larger
deviation from perfect scaling: although these forces are short-ranged and
should scale as O(N), their calculation requires a large number of short
messages to be sent, and latency effects become appreciable. Even here,
however, the scaling is still acceptable, as 3.9 s per time step is still short
enough to allow simulations to be performed.
Graph 3: Weak Scaling for Water
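Why many short messages hurt, as described for the water case, can be illustrated with the standard latency-bandwidth (Hockney) cost model, in which every message pays a fixed start-up latency alpha plus its size divided by the bandwidth beta. This is a generic sketch: the function name and the values of alpha and beta are hypothetical, chosen for illustration, and are not HPCx measurements. It compares moving the same amount of data as many small messages versus one large one.

```python
# Latency-bandwidth (Hockney) cost model: t(m) = alpha + m / beta per message.
# The alpha and beta values below are hypothetical, not HPCx measurements.


def transfer_time(total_bytes: int, n_messages: int,
                  latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move total_bytes split over n_messages messages;
    every message pays the start-up latency once."""
    return n_messages * latency_s + total_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    alpha = 5e-6   # hypothetical latency: 5 microseconds per message
    beta = 1e9     # hypothetical bandwidth: 1 GB/s
    total = 64_000  # the same 64,000 bytes in both cases
    many = transfer_time(total, 1_000, alpha, beta)  # 1000 messages of 64 bytes
    one = transfer_time(total, 1, alpha, beta)       # a single 64,000-byte message
    print(f"1000 small messages: {many * 1e6:.1f} us")
    print(f"1 large message:     {one * 1e6:.1f} us")
```

With these illustrative numbers the small-message case is dominated almost entirely by latency, and moving the same data takes roughly 70 times longer than in a single message, even though the bandwidth cost is identical.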
In summary, the weak scaling of DL_POLY 3 is good out to 1024
processors, though deviations are observed due both to algorithmic departures
from O(N) behaviour and to increasing parallel overhead.
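As a rough quantitative summary, the figures quoted above can be turned into weak-scaling efficiencies, taken here as the 1-processor time per step divided by the 1024-processor time per step. The name weak_scaling_efficiency is introduced for illustration only; the timings and per-processor system sizes are those reported in the text.

```python
# Timings and system sizes as reported in the text:
# (system, particles per processor, time per step on 1 proc [s], on 1024 procs [s])
RESULTS = [
    ("Argon", 32_000, 0.6, 0.7),
    ("Sodium Chloride", 27_000, 2.8, 4.1),
    ("Water", 20_736, 1.9, 3.9),
]
P = 1024


def weak_scaling_efficiency(t1: float, tp: float) -> float:
    """Weak-scaling efficiency: 1.0 means perfectly constant time per step."""
    return t1 / tp


if __name__ == "__main__":
    for name, n, t1, tp in RESULTS:
        print(f"{name:16s} N({P}) = {n * P:>11,d}  "
              f"efficiency = {weak_scaling_efficiency(t1, tp):.2f}")
```

This gives efficiencies of about 0.86 for argon, 0.68 for sodium chloride and 0.49 for water, and reproduces the largest system sizes quoted in the text as 1024 times the smallest.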
The authors would like to thank Ilian Todorov for his assistance.
