Figure 3-3, Loop blocking access pattern -36, Figure 3-3. the two-dimensional – Intel ARCHITECTURE IA-32 User Manual

Page 216

Advertising

IA-32 Intel® Architecture Optimization

3-36

This situation can be avoided if the loop is blocked with respect to the
cache size. In Figure 3-3, a

block_size

is selected as the loop blocking

factor. Suppose that

block_size

is 8, then the blocked chunk of each

array will be eight cache lines (32 bytes each). In the first iteration of the
inner loop,

A[0, 0:7]

and

B[0, 0:7]

will be brought into the cache.

B[0, 0:7]

will be completely consumed by the first iteration of the

outer loop. Consequently,

B[0, 0:7]

will only experience one cache

miss after applying loop blocking optimization in lieu of eight misses
for the original algorithm. As illustrated in Figure 3-3, arrays

and

are

blocked into smaller rectangular chunks so that the total size of two
blocked

and

chunks is smaller than the cache size. This allows

maximum data reuse.

Figure 3-3

Loop Blocking Access Pattern

O M15158

A (i, j) access pattern

A(i, j) access pattern

after blocking

B(i, j) access pattern

after blocking

< cache size

Blocking

Advertising