Figure 6-7 – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization


Figure 6-7 shows how prefetch instructions and strip-mining can be
applied to increase performance in both of these scenarios.

For Pentium 4 processors, the left scenario shows a graphical
implementation of using prefetchnta to prefetch data into selected
ways of the second-level cache only (SM1 denotes strip mine one way
of second-level), minimizing second-level cache pollution. Use
prefetchnta if the data is only touched once during the entire
execution pass in order to minimize cache pollution in the higher level
caches. This makes the data available immediately when the read access
is issued, assuming the prefetch was issued far enough ahead.
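As a rough illustration of the one-touch case described above, the following C sketch reads an array exactly once and issues a non-temporal prefetch a fixed distance ahead of the current read. The `_mm_prefetch` intrinsic and `_MM_HINT_NTA` hint come from `xmmintrin.h`; the function name `sum_once`, the `PREFETCH_DISTANCE` value, and the 64-byte line size are assumptions chosen for illustration, not values taken from the manual. The right distance depends on the processor and the work done per iteration.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
/* Non-temporal prefetch hint (prefetchnta on IA-32 processors). */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
/* Fallback so the sketch still compiles on targets without SSE. */
#define PREFETCH_NTA(p) ((void)0)
#endif

/* Assumed tuning value: how many elements ahead of the read to prefetch. */
#define PREFETCH_DISTANCE 64
/* Assumed cache line size of 64 bytes, i.e. 16 floats per line. */
#define FLOATS_PER_LINE 16

/* Sums an array whose elements are each read exactly once. */
float sum_once(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* One prefetch per assumed cache line, staying inside the array. */
        if ((i % FLOATS_PER_LINE) == 0 && i + PREFETCH_DISTANCE < n)
            PREFETCH_NTA(&data[i + PREFETCH_DISTANCE]);
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint: the result is the same with or without it, and only the timing of the reads may change.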

Figure 6-7  Examples of Prefetch and Strip-mining for Temporally
Adjacent and Non-Adjacent Passes Loops

[Figure: two panels, "Temporally adjacent passes" and "Temporally
non-adjacent passes". Sequence labels: Prefetchnta Dataset A, Reuse
Dataset A, Prefetchnta Dataset B, Reuse Dataset B (SM1); Prefetcht0
Dataset A, Prefetcht0 Dataset B, Reuse Dataset A, Reuse Dataset B (SM2).]
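The strip-mining idea named in the figure can be sketched in C as follows: rather than running each pass over the whole dataset, the loop is split into strips so that the second pass reuses a strip while it is still likely to be in cache. This is a generic sketch rather than the manual's own code; the function name `scale_then_sum`, the two example passes, and the `STRIP_ELEMS` value are assumptions, and the sketch omits the prefetch instructions shown in the figure.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed strip size in elements; in practice it would be chosen so that
   one strip fits in the targeted cache level. */
#define STRIP_ELEMS 1024

/* Pass 1 scales the data in place; pass 2 accumulates the scaled values.
   Both passes run on one strip before moving on to the next strip. */
double scale_then_sum(float *data, size_t n, float factor)
{
    assert(data != NULL || n == 0);
    double total = 0.0;
    for (size_t start = 0; start < n; start += STRIP_ELEMS) {
        /* Clamp the final strip to the end of the array. */
        size_t end = (n - start < STRIP_ELEMS) ? n : start + STRIP_ELEMS;

        for (size_t i = start; i < end; i++)   /* pass 1 over this strip */
            data[i] *= factor;

        for (size_t i = start; i < end; i++)   /* pass 2 reuses the strip */
            total += data[i];
    }
    return total;
}
```

The result matches running the two passes over the full array in turn; only the order of memory accesses changes.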
