
Optimizing Cache Usage


Facilitate compiler optimization:

— Minimize use of global variables and pointers
— Minimize use of complex control flow
— Use the const modifier; avoid the register modifier
— Choose data types carefully (see below) and avoid type casting (a short sketch of these guidelines follows below).
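
A minimal C sketch of these coding guidelines (the function and parameter names are illustrative, not taken from this manual): read-only inputs are passed as const parameters instead of being reached through global variables or pointers, and every operand stays in one type so no casts are needed.

#include <stddef.h>

/* 'src' and 'scale' are declared const: the compiler knows they cannot
   change inside the loop, so it is free to keep them in registers
   without any use of the 'register' keyword. */
void scale_array(float *dst, const float *src, const float scale, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * scale;   /* all operands are float: no casts */
}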

Use cache blocking techniques (for example, strip mining):

— Improve the cache hit rate by using cache blocking techniques such as strip-mining (one-dimensional arrays) or loop blocking (two-dimensional arrays); a loop-blocking sketch follows this item
— Explore using the hardware prefetching mechanism if your data access pattern has sufficient regularity to allow alternate sequencing of data accesses (for example, tiling) for improved spatial locality; otherwise use prefetchnta.
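
A minimal loop-blocking sketch in C; the array size and tile size are illustrative assumptions and should be tuned so that one tile fits in the target cache. The matrix is processed one BLOCK x BLOCK tile at a time, so the data touched by the inner loops is reused while it is still resident in the cache.

#include <stddef.h>

#define N     1024        /* matrix dimension (assumed divisible by BLOCK) */
#define BLOCK 64          /* tile edge; tune so one tile fits in cache */

/* Blocked transpose: without blocking, the column-wise stores to 'dst'
   touch a new cache line on every iteration of the inner loop; with
   blocking, each BLOCK x BLOCK tile of 'src' and 'dst' stays cached
   while it is being reused. */
void transpose_blocked(float dst[N][N], const float src[N][N])
{
    size_t ii, jj, i, j;
    for (ii = 0; ii < N; ii += BLOCK)
        for (jj = 0; jj < N; jj += BLOCK)
            for (i = ii; i < ii + BLOCK; i++)
                for (j = jj; j < jj + BLOCK; j++)
                    dst[j][i] = src[i][j];
}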

Balance single-pass versus multi-pass execution:

— An algorithm can use single-pass or multi-pass execution, defined as follows: single-pass (unlayered) execution passes a single data element through an entire computation pipeline; multi-pass (layered) execution performs a single stage of the pipeline on a batch of data elements before passing the entire batch on to the next stage.
— General guideline to minimize cache pollution: if your algorithm is single-pass, use prefetchnta; if your algorithm is multi-pass, use prefetcht0 (see the sketch below).
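
A minimal sketch of this guideline using the _mm_prefetch intrinsic; the prefetch distance and loop bodies are illustrative assumptions. The single-pass loop requests the non-temporal hint that compiles to prefetchnta, and the multi-pass stage requests the T0 hint that compiles to prefetcht0. Prefetch instructions do not fault, so issuing a prefetch a few elements past the end of the array is harmless.

#include <stddef.h>
#include <xmmintrin.h>

#define AHEAD 16   /* prefetch distance in elements; tune per platform */

/* Single-pass: each element is read exactly once, so prefetch with the
   NTA hint (prefetchnta) to minimize cache pollution. */
float sum_once(const float *a, size_t n)
{
    float s = 0.0f;
    size_t i;
    for (i = 0; i < n; i++) {
        _mm_prefetch((const char *)(a + i + AHEAD), _MM_HINT_NTA);
        s += a[i];
    }
    return s;
}

/* Multi-pass: the batch is revisited by a later pipeline stage, so
   prefetch with the T0 hint (prefetcht0) to bring the data into all
   cache levels. */
void stage_scale(float *batch, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        _mm_prefetch((const char *)(batch + i + AHEAD), _MM_HINT_T0);
        batch[i] *= 2.0f;
    }
}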

Resolve memory bank conflict issues:

— Minimize memory bank conflicts by applying array grouping to group contiguously used data together, or by allocating data within 4 KB memory pages (see the sketch below).
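
A minimal array-grouping sketch; the structure and field names are illustrative. Three values that are always consumed together are packed into one structure, so a pass over the data walks a single contiguous stream of addresses instead of striding across three separately allocated arrays that can conflict with one another.

#include <stddef.h>

/* Grouped layout: the x, y and z of each element share the same cache
   line(s), so the loop below generates one sequential access stream. */
typedef struct {
    float x, y, z;
} point_t;

void translate(point_t *pts, size_t n, float dx, float dy, float dz)
{
    size_t i;
    for (i = 0; i < n; i++) {
        pts[i].x += dx;
        pts[i].y += dy;
        pts[i].z += dz;
    }
}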

Resolve cache management issues:

— Minimize disturbance of temporal data held within the processor’s caches by using streaming store instructions, as appropriate (see the sketch below).
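
A minimal streaming-store sketch using the _mm_stream_ps intrinsic (the movntps instruction); the function name is illustrative, and the sketch assumes dst is 16-byte aligned and n is a multiple of four. The results are written with non-temporal stores so that output data, which will not be read back soon, does not evict temporal data from the caches.

#include <stddef.h>
#include <xmmintrin.h>

/* Scale 'src' into 'dst' with non-temporal stores: the output bypasses
   the caches, leaving temporal data held there undisturbed. */
void scale_streaming(float *dst, const float *src, float k, size_t n)
{
    __m128 vk = _mm_set1_ps(k);
    size_t i;
    for (i = 0; i < n; i += 4) {
        __m128 v = _mm_mul_ps(_mm_loadu_ps(src + i), vk);
        _mm_stream_ps(dst + i, v);   /* non-temporal store (movntps) */
    }
    _mm_sfence();   /* order the streaming stores before later accesses */
}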
