Prefetching, Hardware instruction fetching, Software and hardware cache line fetching – Intel ARCHITECTURE IA-32 User Manual


General Optimization Guidelines


Prefetching

The Pentium 4 processor has three prefetching mechanisms:

hardware instruction prefetcher

software prefetch for data

hardware prefetch for cache lines of data or instructions.

Hardware Instruction Fetching

The hardware instruction fetcher reads instructions, 32 bytes at a time,
into the 64-byte instruction streaming buffers.

Software and Hardware Cache Line Fetching

The Pentium 4 and Intel Xeon processors provide hardware prefetching,
in addition to software prefetching. The hardware prefetcher operates
transparently to fetch data and instruction streams from memory,
without requiring programmer intervention. The hardware prefetcher
can track 8 independent streams. Software prefetch using the
prefetchnta instruction fetches 128 bytes into one way of the
second-level cache.
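
As an illustration of software prefetching with prefetchnta, the following is a minimal sketch in C rather than code from this manual. It assumes a compiler that provides the _mm_prefetch intrinsic in <xmmintrin.h>, where the _MM_HINT_NTA hint corresponds to prefetchnta. The names sum_stream, PREFETCH_NTA and PREFETCH_AHEAD are hypothetical, and the prefetch distance is a placeholder that would need tuning by measurement on the target processor.

```c
#include <stddef.h>

/* Use the SSE prefetch intrinsic when available; otherwise compile to a no-op. */
#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(p) ((void)0)
#endif

/* Hypothetical prefetch distance in elements; not a value from the manual. */
enum { PREFETCH_AHEAD = 64 };

/* Sum an array once, front to back, hinting that data further ahead
 * in the stream will be needed soon and is not worth keeping cached. */
long sum_stream(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch addresses that lie inside the array. */
        if (i + PREFETCH_AHEAD < n)
            PREFETCH_NTA(&data[i + PREFETCH_AHEAD]);
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint, so the function returns the same result with or without it. Where the fetched line is placed, and how many bytes are fetched, depends on the processor.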

The Pentium M processor also provides a hardware prefetcher for data.
It can track 12 separate streams in the forward direction and 4 streams in
the backward direction. This processor’s prefetchnta instruction also
fetches 64 bytes into the first-level data cache without polluting the
second-level cache.

Intel Core Solo and Intel Core Duo processors provide more advanced
hardware prefetchers for data relative to those on the Pentium M
processors. The key differences are summarized in Table 1-2.

Although the hardware prefetcher operates transparently and requires no
intervention from the programmer, it operates most efficiently when
programmers tailor data access patterns to suit its characteristics,
because the hardware prefetcher favors small-stride cache miss
patterns. Optimizing data
