Prefetch and load instructions – Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization

The Prefetch Instructions – Pentium 4 Processor Implementation

Streaming SIMD Extensions include four flavors of prefetch
instructions: one non-temporal and three temporal. They correspond to
two types of operations, temporal and non-temporal.

The non-temporal instruction is:

prefetchnta    Fetch the data into the second-level cache, minimizing
               cache pollution.

The temporal instructions are:

prefetcht0     Fetch the data into all cache levels, that is, to the
               second-level cache for the Pentium 4 processor.

prefetcht1     Identical to prefetcht0.

prefetcht2     Identical to prefetcht0.
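
As an illustration of the two types of operation, the following is a
minimal sketch in C that issues the hints through the _mm_prefetch
intrinsic from <xmmintrin.h>. The function names, the PREFETCH_NTA and
PREFETCH_T0 macros, and the FLOATS_PER_LINE and PREFETCH_AHEAD values
are assumptions made for this example and do not come from the manual;
a suitable prefetch distance has to be found by measurement.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA, _MM_HINT_T0 */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#define PREFETCH_T0(p)  _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
/* Assumption: without SSE support the hints are compiled out. */
#define PREFETCH_NTA(p) ((void)(p))
#define PREFETCH_T0(p)  ((void)(p))
#endif

/* Hypothetical tuning values, not taken from the manual. */
enum { FLOATS_PER_LINE = 16, PREFETCH_AHEAD = 64 };

/* Data that is read only once: the non-temporal hint (prefetchnta)
 * is intended to limit cache pollution. */
float sum_once(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* One hint per assumed cache line, a fixed distance ahead. */
        if (i % FLOATS_PER_LINE == 0 && i + PREFETCH_AHEAD < n)
            PREFETCH_NTA(&a[i + PREFETCH_AHEAD]);
        sum += a[i];
    }
    return sum;
}

/* Data that will be reused soon: a temporal hint (prefetcht0). */
float sum_reused(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i % FLOATS_PER_LINE == 0 && i + PREFETCH_AHEAD < n)
            PREFETCH_T0(&a[i + PREFETCH_AHEAD]);
        sum += a[i];
    }
    return sum;
}
```

Both functions return the same result with or without the hints; the
hints can only change where the data sits in the cache hierarchy when
the loads reach it.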

Prefetch and Load Instructions

The Pentium 4 processor has a decoupled execution and memory
architecture that allows instructions to be executed independently of
memory accesses if there are no data and resource dependencies.
Programs or compilers can use dummy load instructions to imitate
prefetch functionality, but preloading is not completely equivalent to
using prefetch instructions. Prefetch instructions provide greater
performance than preloading.
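
The difference between the two approaches can be sketched in C as
follows. The helper names preload and prefetch_hint, the two sum
functions, and the AHEAD distance are hypothetical and chosen only for
this example. The sketch shows the form of each technique and makes no
claim about their relative speed on a given processor.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */
#endif

/* Hypothetical look-ahead distance in elements. */
enum { AHEAD = 32 };

/* Preload: a dummy load whose value is discarded. It is a real memory
 * access, so the address must be valid. */
static inline void preload(const void *p)
{
    (void)*(const volatile char *)p;
}

/* Prefetch: only a hint to the processor; no value is read into the
 * program. Compiled out when SSE is unavailable (an assumption of
 * this sketch). */
static inline void prefetch_hint(const void *p)
{
#if defined(__SSE__)
    _mm_prefetch((const char *)p, _MM_HINT_T0);
#else
    (void)p;
#endif
}

long sum_with_preload(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* The bounds check is required: the dummy load dereferences. */
        if (i + AHEAD < n)
            preload(&a[i + AHEAD]);
        sum += a[i];
    }
    return sum;
}

long sum_with_prefetch(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* The check keeps the pointer arithmetic inside the array. */
        if (i + AHEAD < n)
            prefetch_hint(&a[i + AHEAD]);
        sum += a[i];
    }
    return sum;
}
```

Both functions compute the same sum. They differ only in how the data
is requested ahead of use: by an ordinary load in the first case and
by a prefetch hint in the second.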

NOTE. At the time of prefetch, if the data is already found in a cache
level that is closer to the processor than the cache level specified by
the instruction, no data movement occurs.
