Figure 6-1 – Intel ARCHITECTURE IA-32 User Manual



IA-32 Intel® Architecture Optimization

6-22

Example of Latency Hiding with S/W Prefetch Instruction

Achieving the highest level of memory optimization using prefetch
instructions requires an understanding of the microarchitecture and
system architecture of a given machine. This section translates the key
architectural implications into several simple guidelines for
programmers to use.

As an example, Figure 6-2 and Figure 6-3 show two scenarios of a
simplified 3D geometry pipeline. A 3D geometry pipeline typically
fetches one vertex record at a time and then performs transformation
and lighting functions on it. Both figures show two separate pipelines:
an execution pipeline and a memory pipeline (front-side bus).
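The loop below is a minimal sketch in C of how such a pipeline can overlap its two halves with a software prefetch. It assumes a hypothetical 32-byte vertex layout (`Vertex`), a row-major 4x4 transform, and a simple clamped N.L lighting term; `Vertex`, `OutVertex`, `transform_and_light`, `PREFETCH` and `PREFETCH_DISTANCE` are illustrative names, not part of any Intel API. On x86 it uses the `_mm_prefetch` intrinsic with `_MM_HINT_NTA` (which compiles to PREFETCHNTA); elsewhere it falls back to the GCC/Clang `__builtin_prefetch`. The prefetch distance of 8 vertices is an assumed placeholder, not a recommended value.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
/* PREFETCHNTA: bring the line toward the core while limiting cache pollution. */
#define PREFETCH(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
/* Non-x86 GCC/Clang fallback: read access, no temporal locality. */
#define PREFETCH(p) __builtin_prefetch((p), 0, 0)
#endif

/* Hypothetical input record: position plus normal, padded to 32 bytes. */
typedef struct {
    float pos[4];    /* x, y, z, w */
    float normal[3]; /* unit surface normal */
    float pad;       /* pads the record to 32 bytes */
} Vertex;

/* Hypothetical output record. */
typedef struct {
    float pos[4];    /* transformed position */
    float intensity; /* clamped diffuse lighting term */
} OutVertex;

/* Assumed tuning value: how many vertices ahead to prefetch. */
#define PREFETCH_DISTANCE 8

/* m is a row-major 4x4 matrix stored as 16 floats. */
void transform_and_light(const Vertex *in, OutVertex *out, size_t n,
                         const float *m, const float *light_dir)
{
    for (size_t i = 0; i < n; ++i) {
        /* Request a future vertex now so its memory access proceeds in the
           memory pipeline while the current vertex is being processed.
           The bounds check avoids forming a pointer past the array. */
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH(&in[i + PREFETCH_DISTANCE]);

        const Vertex *v = &in[i];

        /* Transformation: out.pos = M * v.pos */
        for (int r = 0; r < 4; ++r) {
            out[i].pos[r] = m[r * 4 + 0] * v->pos[0] + m[r * 4 + 1] * v->pos[1]
                          + m[r * 4 + 2] * v->pos[2] + m[r * 4 + 3] * v->pos[3];
        }

        /* Lighting: Lambertian term N . L, clamped at zero. */
        float d = v->normal[0] * light_dir[0] + v->normal[1] * light_dir[1]
                + v->normal[2] * light_dir[2];
        out[i].intensity = d > 0.0f ? d : 0.0f;
    }
}
```

The idea behind the distance is that the compute time of the intervening iterations should cover the latency of one memory access; the right value depends on the machine and must be measured. With 32-byte records two vertices share a 64-byte line, so every other prefetch here targets a line already requested; unrolling by two and issuing one prefetch per line would avoid the redundant instructions.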

Since the Pentium 4 processor, like the Pentium II and
Pentium III processors, completely decouples the functionality of
execution and memory access, these two pipelines can function
concurrently. Figure 6-2 shows “bubbles” in both the execution and
memory pipelines. When loads are issued for accessing vertex data, the

Figure 6-1 Effective Latency Reduction as a Function of Access Stride

[Chart: Upper Bound of Pointer-Chasing Latency Reduction. X axis: Stride (Bytes), 64 to 240 in 16-byte steps. Y axis: Effective Latency Reduction, 0% to 120%. Series: Fam. 15 Model 3, 4; Fam. 15 Model 0, 1, 2; Fam. 6 Model 13; Fam. 6 Model 14; Fam. 15 Model 6.]
