

Figure 5.  Write-Only Thread Running on Node 0, Accessing Data from 0, 1 and 2 Hops Away on an Idle System

In this test case, a write access generates coherent HyperTransport™ link traffic and memory traffic much like a read access, with a few key differences. A write access brings data into the cache much as a read does and then modifies it in the cache. In this particular synthetic test case, however, the thread performs successive write accesses to sequential cache-line elements of a 64-MB array, so in steady state every write access also causes a cache-line eviction, or write-back. This raises the memory and HyperTransport traffic generated by a write-only thread to almost twice that of a read-only thread. On our test bench, a thread performing local read-only accesses generates a memory bandwidth load of 1.64 GB/s, while a thread performing local write-only accesses generates a memory bandwidth load of 2.98 GB/s. As a result, writes not only take longer than reads at any given hop distance, they also slow down more quickly as the hop distance increases.
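
As a rough illustration of the access pattern just described, the following C sketch performs sequential stores to cache-line-sized elements of a 64-MB array. The sizes follow the text above; the code itself, including all names, is hypothetical and is not the actual test-bench source.

/* Minimal sketch of a write-only kernel over a 64-MB array, assuming
 * 64-byte cache lines; illustrative only, not the AMD test-bench code. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_BYTES (64u * 1024u * 1024u)     /* 64-MB array                */
#define LINE_BYTES  64u                       /* one element per cache line */
#define NUM_LINES   (ARRAY_BYTES / LINE_BYTES)

int main(void)
{
    uint8_t (*array)[LINE_BYTES] = malloc(ARRAY_BYTES);
    if (array == NULL)
        return 1;

    /* Each store brings a line into the cache and dirties it; in steady
     * state each newly written line forces the write-back of a previously
     * modified one, roughly doubling memory/HyperTransport traffic
     * relative to a read-only pass over the same array. */
    for (size_t i = 0; i < NUM_LINES; i++)
        memset(array[i], 0xA5, LINE_BYTES);

    free(array);
    return 0;
}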

3.2.1  Keeping Data Local by Virtue of First Touch

In order to keep data local, it is recommended that the following principles be observed.

As long as a thread initializes the data it needs (writes to it for the first time) and does not rely on any
other thread to perform the initialization, a ccNUMA-aware OS keeps data local on the node where
the thread runs. This policy of keeping data local by writing to it for the first time is known as the
local allocation policy by virtue of first touch. This is the default policy used by a ccNUMA-aware
OS.
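
As an illustration of this principle, the sketch below has each thread first-touch the portion of an array that it will later compute on, so a ccNUMA-aware OS allocates those pages on that thread's node. The use of OpenMP and a static loop schedule here are assumptions for the sketch, not something this guide prescribes.

/* Illustrative first-touch initialization; OpenMP is assumed here.
 * Compile with, e.g., gcc -O2 -fopenmp. */
#include <stdlib.h>

#define N (8 * 1024 * 1024)        /* 8M doubles = 64 MB */

int main(void)
{
    double *data = malloc(N * sizeof(double));
    if (data == NULL)
        return 1;

    /* First touch: with the same static schedule as the compute loop,
     * each thread writes the pages it will later use, so a ccNUMA-aware
     * OS allocates those pages on that thread's node. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        data[i] = 0.0;

    /* Compute loop: accesses now hit mostly node-local memory. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        data[i] = 2.0 * data[i] + 1.0;

    free(data);
    return 0;
}

If, instead, a single thread had initialized the entire array, all of its pages would have been placed on that thread's node, and every other thread would pay remote-access penalties on each access.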

A ccNUMA-aware OS ensures local allocation by taking a page fault at the time of the first touch to the data. When the page fault occurs, the OS maps the virtual pages associated with the data to zeroed-out physical pages on the faulting node. The data is then resident on the node where the first touch occurred, and any subsequent accesses to it are serviced from that node.
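
On Linux, one way to observe this placement is to query which node backs a page after first touching it; the sketch below uses move_pages() from libnuma for that query. This utility and the mmap()-based allocation are illustrative assumptions and are not part of this guide.

/* Hypothetical check of where a page lands after first touch.
 * Linux-specific; link with -lnuma for move_pages(). */
#include <numaif.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);

    /* Anonymous mapping: no physical page is assigned yet. */
    char *buf = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    buf[0] = 1;   /* first touch: the page fault maps a zeroed physical
                     page on the node of the CPU this thread runs on */

    void *pages[1]  = { buf };
    int   status[1] = { -1 };

    /* With nodes == NULL, move_pages() only reports which node currently
     * backs each page instead of migrating anything. */
    if (move_pages(0, 1, pages, NULL, status, 0) == 0)
        printf("page resides on node %d\n", status[0]);

    munmap(buf, page_size);
    return 0;
}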

[Figure 5, referenced above, plots the normalized time for a write per test case: 0.0.w.0 (0 hops) = 113%, 0.0.w.1 (1 hop) = 127%, 0.0.w.2 (1 hop) = 129%, 0.0.w.3 (2 hops) = 149%.]