Software Prefetch Scheduling Distance – Intel ARCHITECTURE IA-32 User Manual


Optimizing Cache Usage


Balance single-pass versus multi-pass execution

Resolve memory bank conflict issues

Resolve cache management issues

The subsequent sections discuss all the above items.

Software Prefetch Scheduling Distance

Determining the ideal prefetch placement in the code depends on many
architectural parameters, including the amount of memory to be
prefetched, cache lookup latency, system memory latency, and an estimate
of computation cycles. The ideal distance for prefetching data is
processor- and platform-dependent. If the distance is too short, the
prefetch will not hide any portion of the latency of the fetch behind
computation. If the prefetch is too far ahead, the prefetched data may be
flushed out of the cache by the time it is actually required.

Since prefetch distance is not a well-defined metric, for this discussion,
we define a new term, prefetch scheduling distance (PSD), which is
represented by the number of iterations. For large loops, prefetch
scheduling distance can be set to 1, that is, schedule prefetch
instructions one iteration ahead. For small loop bodies, that is, loop
iterations with little computation, the prefetch scheduling distance must
be more than one iteration.
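
As a rough illustration of a prefetch scheduling distance expressed in
iterations, the following C sketch processes one cache line per loop
iteration and issues a prefetch psd iterations ahead. It is not taken from
the manual: the function name sum_with_prefetch, the 64-byte line size, and
the use of the GCC/Clang builtin __builtin_prefetch as a stand-in for the
prefetch instructions are all assumptions made for this example.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines, i.e. 16 floats per line. */
#define FLOATS_PER_LINE 16

/* Sums data[0..n-1], consuming one cache line per outer iteration and
 * prefetching the line that will be needed psd iterations later.
 * Hypothetical helper for illustration only. */
float sum_with_prefetch(const float *data, size_t n, size_t psd)
{
    float total = 0.0f;

    for (size_t i = 0; i < n; i += FLOATS_PER_LINE) {
        /* Index of the first element used psd iterations from now. */
        size_t ahead = i + psd * FLOATS_PER_LINE;
        if (ahead < n) {
            /* Read prefetch (0) with high temporal locality (3);
             * GCC/Clang builtin, assumed available. */
            __builtin_prefetch(&data[ahead], 0, 3);
        }

        /* The computation for the current iteration. */
        size_t end = (i + FLOATS_PER_LINE < n) ? i + FLOATS_PER_LINE : n;
        for (size_t j = i; j < end; ++j) {
            total += data[j];
        }
    }
    return total;
}
```

With a loop body this small, a psd of 1 would likely issue each prefetch too
late to hide much latency, which is why small loop bodies call for a larger
distance; a suitable value has to be measured on the target platform.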

A simplified equation to compute PSD is deduced from the
mathematical model. For the simplified equation, the complete mathematical
model, and the methodology of prefetch distance determination, refer to
Appendix E, “Mathematics of Prefetch Scheduling Distance”.
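
For orientation, the simplified equation is sketched below in the form it
is recalled from Appendix E. It is not reproduced on this page, so treat the
exact form and symbol names as an assumption and check them against the
appendix, which remains the authoritative statement.

```latex
\[
\mathrm{psd} =
\left\lceil
  \frac{N_{\mathrm{lookup}} + N_{\mathrm{xfer}} \cdot \left(N_{\mathrm{pref}} + N_{\mathrm{st}}\right)}
       {\mathrm{CPI} \cdot N_{\mathrm{inst}}}
\right\rceil
\]
% N_lookup : clocks of lookup latency (system-dependent)
% N_xfer   : clocks to transfer one cache line
% N_pref   : number of cache lines prefetched per iteration
% N_st     : number of cache lines stored per iteration
% CPI      : clocks per instruction
% N_inst   : instructions in one loop iteration
```

Read this way, the numerator approximates the memory latency to be hidden
and the denominator the computation time of one iteration, so a smaller
loop body yields a larger PSD.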

Example 6-3 illustrates the use of a prefetch within the loop body. The
prefetch scheduling distance is set to 3, esi is effectively the pointer to a
line, edx is the address of the data being referenced and xmm1-xmm4 are
the data used in computation. Example 6-4 uses two independent cache
