Figure 6, 26 analysis and recommendations chapter 3 – AMD ATHLON 64 User Manual

Page 26


Analysis and Recommendations

Chapter 3

40555

Rev. 3.00

June 2006

Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™
ccNUMA Multiprocessor Systems

Threads firing at each other (crossfire)

The first thread runs on node 0 and writes to memory on node 1 (1 hop). The second thread runs
on node 1 and writes to memory on node 0 (1 hop).

In each case, the two threads run on core 0 of whichever node they are running on. The system is
left idle except for the two threads. As shown in Figure 6 on page 26, the crossfire 1 hop-1 hop case is
the worst performer.
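The crossfire condition described above can be expressed as a small check. The following is a minimal sketch, not part of the original guide. It assumes that the thread labels used in Figure 6 (for example `0.0.w.1`) read as node, core, access type, memory node; that reading is inferred from the figure labels and the surrounding text. The names `Placement`, `parse_label`, and `is_crossfire` are hypothetical.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    cpu_node: int   # node the thread runs on
    core: int       # core on that node
    access: str     # "w" for write-only, "r" for read-only
    mem_node: int   # node whose memory the thread accesses


def parse_label(label: str) -> Placement:
    """Parse a label such as '0.0.w.1'.

    Assumed field order (inferred, not stated in the text):
    node.core.access.memory_node
    """
    cpu_node, core, access, mem_node = label.split(".")
    return Placement(int(cpu_node), int(core), access, int(mem_node))


def is_crossfire(a: Placement, b: Placement) -> bool:
    """True when each thread accesses memory on the node where the other runs."""
    return (
        a.cpu_node != b.cpu_node
        and a.mem_node == b.cpu_node
        and b.mem_node == a.cpu_node
    )


if __name__ == "__main__":
    # Node 0 writes to node 1 while node 1 writes to node 0: crossfire.
    print(is_crossfire(parse_label("0.0.w.1"), parse_label("1.0.w.0")))
    # Node 0 writes to node 1 while node 1 writes to node 3: no crossfire.
    print(is_crossfire(parse_label("0.0.w.1"), parse_label("1.0.w.3")))
```

Under this reading, only the first pair sends traffic in both directions over the same node 0 to node 1 link, which is the case the text identifies as the worst performer.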

Figure 6. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case on an Idle System

When the write-only threads fire at each other (crossfire), the bidirectional HyperTransport link
between node 0 and node 1 is saturated, carrying 3.5 GB/s in each direction. The theoretical
maximum bandwidth of the HyperTransport link is 4 GB/s in each direction. Thus, the utilization of
that link is roughly 87% (3.5 ÷ 4 = 0.875) in each direction.

On the other hand, when the write-only threads do not fire at each other (no crossfire), the utilization
of the bidirectional link from node 0 to node 1 is at 60% in each direction. In addition, the utilization
of the bidirectional link from node 1 to node 3 is at 54% in each direction. Since the load is now
spread over two bidirectional HyperTransport links instead of one, the performance is better.
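The utilization figures quoted above follow from simple arithmetic. The sketch below is illustrative only: it takes the 3.5 GB/s load, the 4 GB/s theoretical peak, and the 60% and 54% utilizations from the text. The link names and the helper names `utilization` and `busiest` are hypothetical.

```python
LINK_PEAK_GBPS = 4.0  # theoretical peak per direction, as stated in the text


def utilization(load_gbps: float, peak_gbps: float = LINK_PEAK_GBPS) -> float:
    """Fraction of one direction's theoretical bandwidth that is in use."""
    return load_gbps / peak_gbps


def busiest(links: dict) -> float:
    """Utilization of the most heavily loaded link in a configuration."""
    return max(links.values())


# Crossfire: all traffic shares the single node 0 <-> node 1 link.
crossfire = {"node0<->node1": utilization(3.5)}

# No crossfire: per-link utilizations as reported in the text.
no_crossfire = {"node0<->node1": 0.60, "node1<->node3": 0.54}

if __name__ == "__main__":
    print(f"crossfire peak link utilization:    {busiest(crossfire):.1%}")
    print(f"no-crossfire peak link utilization: {busiest(no_crossfire):.1%}")
```

Comparing the busiest link in each configuration shows why spreading the load helps: no single link in the no crossfire case comes as close to its theoretical peak as the shared link does in the crossfire case.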

The saturation of these coherent HyperTransport links is responsible for the poor performance for the
crossfire case compared to the no crossfire case. For detailed analysis, refer to Section A.2 on
page 40.

In this synthetic test, read-only threads do not result in poor performance. Throughput of such threads
is not high enough to exhaust the HyperTransport link resources. When both threads are read-only,
the crossfire case is equivalent in performance to the no crossfire case.

It is also useful to study whether this observation holds on a system that is not idle. The following
analysis explores the behavior of the two foreground threads under a variable background load.

[Figure 6 chart: Total Time for both threads (write-write). Bars, in the order shown:
  0.0.w.0 1.0.w.1 (0 Hop, 0 Hop): 113%
  0.0.w.1 1.0.w.3 (1 Hop, 1 Hop, no crossfire): 130%
  0.0.w.1 1.0.w.0 (1 Hop, 1 Hop, crossfire): 149%
Vertical axis ticks run from 0 to 2.2.]
