
Chapter 3: Analysis and Recommendations
40555 Rev. 3.00 June 2006
Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems

Figure 15. Both Write-Only Threads Running on Node 0 (Different Cores) under Very High Background Load (High Subscription)

Under a very high background load, in the 0 hop-1 hop case the total memory access rate on node 1 is 4.78 GB/s, and several buffer queues on node 1 are saturated. For a detailed analysis, refer to Section A.5 on page 43.

Thus, a greater hop distance does not always mean a longer run time. Developers should nevertheless keep data local whenever possible. In the analogy used above, if the local queue has 20 customers and the remote queue has two, the customer would have been better off standing in the two-customer queue from the start and making it his local queue. In the synthetic case above, the fastest arrangement would be to have the first thread on node 0 write to its local memory and the second thread on node 1 write to its own local memory.
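The queue analogy can be made concrete with a deliberately simple cost model. The sketch below is a hypothetical toy, not an AMD-published formula: it assumes that each request waits behind everything already queued at the target node's memory controller and then pays a fixed, made-up latency per HyperTransport hop. The names `mem_target`, `est_completion_ns`, `SERVICE_NS` and `HOP_NS` are illustrative only.

```c
#include <assert.h>

/* Hypothetical toy model of the queue analogy. The constants are
 * illustrative assumptions, not measured Athlon 64 / Opteron figures. */
#define SERVICE_NS 10.0 /* assumed time to service one queued request */
#define HOP_NS     40.0 /* assumed extra latency per HyperTransport hop */

typedef struct {
    int queued; /* requests already waiting at this node's memory controller */
    int hops;   /* hop distance from the requesting core to this node */
} mem_target;

/* Estimated time until a new request to t completes: wait for every
 * queued request plus our own, then add the per-hop transit cost. */
static double est_completion_ns(mem_target t)
{
    assert(t.queued >= 0 && t.hops >= 0);
    return (t.queued + 1) * SERVICE_NS + t.hops * HOP_NS;
}
```

With these assumed constants, a local queue of 20 requests costs 210 ns, a remote queue of 2 requests one hop away costs 70 ns, and a local queue of 2 requests costs 30 ns. A short remote queue can beat a saturated local one, but a short local queue beats both, which is why keeping data local remains the recommendation.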

3.5 Locks

In general, it is good practice for both user-level and kernel-level code to keep locks aligned to their natural boundaries, for example a 4-byte lock at an address that is a multiple of 4. In some hardware implementations, locks that are not naturally aligned are handled by the mechanisms used for legacy memory-mapped I/O; such locks should be avoided whenever possible.
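As a minimal sketch in C11 (the names `simple_spinlock`, `spin_lock`, `spin_unlock` and `is_naturally_aligned` are hypothetical, not from any particular library), the lock word below is an `atomic_int`, so the compiler places it on its natural boundary by default. Misalignment usually creeps in through packed structures or hand-computed offsets into raw byte buffers, which is what the debug-time assertion is meant to catch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of size; size must be a power of two. */
static bool is_naturally_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p & (size - 1)) == 0;
}

/* Lock word: 0 = free, 1 = held. Declared as atomic_int so the compiler
 * aligns it naturally; do not embed it in packed structs. */
typedef struct {
    atomic_int held;
} simple_spinlock;

static void spin_lock(simple_spinlock *l)
{
    /* Catch misaligned lock words in debug builds. */
    assert(is_naturally_aligned(&l->held, sizeof l->held));
    /* Test-and-test-and-set: the atomic exchange (typically an implicitly
     * locked XCHG on x86) is retried only after a plain load sees the lock
     * free, so waiters spin on their cached copy of the line. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) != 0) {
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            ; /* spin until the holder releases the lock */
    }
}

static void spin_unlock(simple_spinlock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

The inner read-only loop is a design choice for ccNUMA systems: waiting cores read a shared cached copy of the lock line instead of repeatedly issuing locked operations that pull the line across nodes.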

A properly aligned lock is treated as a cache lock, which is fast. The significantly slower alternative is a bus lock, which should be avoided at all costs. Bus locks are very slow and force the processor to serialize many operations unrelated to the lock. Furthermore, a bus lock prevents the entire HyperTransport fabric from making forward progress until it completes. Cache locks, on the other hand, obtain their atomicity from the underlying cache coherence protocol of the ccNUMA system and are much faster.
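Whether a locked operation can be handled as a cache lock depends on its operand lying within a single cache line. The helper below is a hypothetical sketch that checks this, assuming the 64-byte cache line of AMD Athlon 64 and Opteron processors. An operand that straddles two lines (often called a split lock) is the case that x86 processors generally cannot service with a cache lock. The packed structure illustrates one way such a layout arises; `__attribute__((packed))` is a GCC/Clang extension, and `crosses_cache_line` and `bad_layout` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u /* AMD Athlon 64 / Opteron cache-line size */

/* True if the object occupying [p, p + size) spans two cache lines.
 * On typical x86 implementations, a locked operation on such an object
 * cannot be satisfied by a cache lock. */
static bool crosses_cache_line(const void *p, size_t size)
{
    assert(size > 0);
    uintptr_t first = (uintptr_t)p / CACHE_LINE_BYTES;
    uintptr_t last = ((uintptr_t)p + size - 1) / CACHE_LINE_BYTES;
    return first != last;
}

/* Hypothetical bad layout: packing removes the padding that would
 * normally put lock_word on a 4-byte boundary, leaving it at offset 62. */
struct __attribute__((packed)) bad_layout {
    char tag[62];
    uint32_t lock_word;
};
```

If a `struct bad_layout` started on a 64-byte boundary, its `lock_word` would occupy bytes 62 to 65 and straddle two lines. Removing the packing restores natural alignment, and a naturally aligned object of 1, 2, 4 or 8 bytes can never cross a 64-byte line, because 64 is a multiple of each of those sizes.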

[Figure 15 chart: "Very High: Total Time for both threads (write-write)"; y-axis 0 to 1.8. Bars (thread 0 / thread 1): 0.0.w.0 / 0.1.w.0 (0 / 0 hops), 147%; 0.0.w.0 / 0.1.w.1 (0 / 1 hops), 158%; 0.0.w.0 / 0.1.w.2 (0 / 1 hops), 158%; 0.0.w.0 / 0.1.w.3 (0 / 2 hops), 169%.]
