Loop blocking – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization


In Example 3-19, the computation has been strip-mined to a size
strip_size. The value strip_size is chosen such that strip_size
elements of array v[Num] fit into the cache hierarchy. As a result, a
given element v[i] brought into the cache by Transform(v[i]) will
still be in the cache when Lighting(v[i]) is performed, which improves
performance over the non-strip-mined code.
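The strip-mined structure described above can be sketched in C as follows. This is a minimal illustration rather than the manual's Example 3-19 itself: the Vertex_rec fields, the bodies of Transform and Lighting, and the function name process_strip_mined are all hypothetical placeholders, and the stages take a pointer rather than a value so that the result is visible to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex record; the real Vertex_rec layout is not shown. */
typedef struct {
    float x, y, z;    /* position */
    float intensity;  /* result of the lighting stage */
} Vertex_rec;

/* Placeholder stages standing in for Transform and Lighting. */
static void Transform(Vertex_rec *v)
{
    v->x *= 2.0f;
    v->y *= 2.0f;
    v->z *= 2.0f;
}

static void Lighting(Vertex_rec *v)
{
    v->intensity = v->x + v->y + v->z;
}

/* Apply both stages to v[0..num), one strip of at most strip_size
 * elements at a time, so that a strip loaded by Transform can still be
 * cached when Lighting touches it. */
void process_strip_mined(Vertex_rec *v, size_t num, size_t strip_size)
{
    assert(strip_size > 0);

    size_t i = 0;
    while (i < num) {
        /* End of the current strip, clamped to the end of the array. */
        size_t end = (num - i < strip_size) ? num : i + strip_size;

        for (size_t j = i; j < end; j++)
            Transform(&v[j]);
        for (size_t j = i; j < end; j++)
            Lighting(&v[j]);

        i = end;
    }
}
```

A suitable strip_size depends on the size of Vertex_rec and on the cache level being targeted, so it would normally be tuned by measurement rather than fixed in the source.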

Loop Blocking

Loop blocking is another useful technique for memory performance
optimization. The main purpose of loop blocking is also to eliminate as
many cache misses as possible. This technique transforms the memory
domain of a given problem into smaller chunks rather than sequentially
traversing through the entire memory domain. Each chunk should be
small enough to fit all the data for a given computation into the cache,
thereby maximizing data reuse. In fact, one can treat loop blocking as
strip mining in two or more dimensions. Consider the code in
Example 3-18 and access pattern in Figure 3-3. The two-dimensional
array A is referenced in the j (column) direction and then referenced
in the i (row) direction (column-major order); whereas array B is
referenced in the opposite manner (row-major order). Assume the
memory layout is in column-major order; therefore, the access strides
of array A and B for the code in Example 3-20 would be 1 and MAX,
respectively.
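A blocked loop of the kind discussed here might look like the sketch below. The loop body A[i][j] += B[j][i] is an assumption chosen to reproduce the stride pattern described above (stride 1 for A and stride MAX for B in the innermost loop); the function name add_transpose_blocked, the flat-array layout, and the parameter names n and block_size are hypothetical, and A and B are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Blocked version of A[i][j] += B[j][i] for n-by-n matrices stored as
 * flat arrays. With this indexing, the innermost loop walks A with
 * stride 1 and B with stride n. Working on block_size-by-block_size
 * tiles keeps the lines of B touched by one tile small enough to be
 * reused before they are evicted. */
void add_transpose_blocked(float *A, const float *B,
                           size_t n, size_t block_size)
{
    assert(block_size > 0);

    for (size_t i = 0; i < n; ) {
        /* Tile bounds are clamped so n need not be a multiple of
         * block_size. */
        size_t i_end = (n - i < block_size) ? n : i + block_size;

        for (size_t j = 0; j < n; ) {
            size_t j_end = (n - j < block_size) ? n : j + block_size;

            /* Process one tile. */
            for (size_t ii = i; ii < i_end; ii++)
                for (size_t jj = j; jj < j_end; jj++)
                    A[ii * n + jj] += B[jj * n + ii];

            j = j_end;
        }
        i = i_end;
    }
}
```

As with strip mining, block_size is a tuning parameter: the tiles of A and B handled in one pass of the two inner loops should fit together in the targeted cache level.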
