Memory accesses, Alignment, Memory accesses -29 – Intel ARCHITECTURE IA-32 User Manual



General Optimization Guidelines


Memory Accesses

This section discusses guidelines for optimizing code and data memory
accesses. The most important recommendations are:

• align data, paying attention to data layout and stack alignment
• enable store forwarding
• place code and data on separate pages
• enhance data locality
• use prefetching and cacheability control instructions
• enhance code locality and align branch targets
• take advantage of write combining

Alignment and forwarding problems are among the most common
sources of large delays on the Pentium 4 processor.

Alignment

Alignment of data concerns all kinds of variables:

• dynamically allocated
• members of a data structure
• global or local variables
• parameters passed on the stack
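
As an illustration of how alignment can be requested for several of these kinds of variables, the following is a minimal sketch in C11. It is not taken from the manual: the names `CACHE_LINE_SIZE`, `struct sample`, `global_buf`, `is_aligned`, `local_is_aligned`, and `alloc_lines` are hypothetical, and the sketch assumes a C11 toolchain that provides `alignas` and `aligned_alloc`. Alignment of parameters passed on the stack depends on the compiler and calling convention and is not shown.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines, as on the processors discussed here. */
#define CACHE_LINE_SIZE 64

/* Member of a data structure: alignas on the member places it on a
 * 16-byte boundary and raises the alignment of the whole struct. */
struct sample {
    char tag;
    alignas(16) float vec[4];
};

/* Global variable aligned to a cache line. */
static alignas(CACHE_LINE_SIZE) unsigned char global_buf[CACHE_LINE_SIZE];

/* Returns nonzero if p is a multiple of alignment (hypothetical helper). */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Local variable: the same specifier works for automatic storage. */
static int local_is_aligned(void)
{
    alignas(16) float local[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    return is_aligned(local, 16);
}

/* Dynamically allocated data: aligned_alloc (C11) expects the size to be
 * a multiple of the alignment, so allocate whole cache lines.
 * The caller releases the memory with free(). */
static void *alloc_lines(size_t n_lines)
{
    return aligned_alloc(CACHE_LINE_SIZE, n_lines * CACHE_LINE_SIZE);
}
```

A caller might check `is_aligned(ptr, CACHE_LINE_SIZE)` in a debug build before running a hot loop over the buffer. Compilers that predate C11 usually offer equivalent vendor-specific attributes or allocation routines instead.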

Misaligned data access can incur significant performance penalties. This
is particularly true for cache line splits. The size of a cache line is
64 bytes in the Pentium 4, Intel Xeon, and Pentium M processors.

On the Pentium 4 processor, a data access that crosses a 64-byte
boundary is split into two memory accesses and requires several µops to
execute instead of one. Such accesses are likely to incur a large
performance penalty because they are executed near retirement and can
cause stalls on the order of the depth of the pipeline.
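
Whether a given access splits a cache line follows from its start address and size. The sketch below shows one way to test this in C, assuming the 64-byte line size stated above. The helper name `spans_cache_line` is hypothetical and is not part of any Intel tool or library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cache line size given in the text for the Pentium 4, Intel Xeon,
 * and Pentium M processors. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper: returns true if an access of `size` bytes that
 * starts at address `addr` touches more than one 64-byte cache line. */
static bool spans_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0) {
        return false;                       /* nothing is accessed */
    }
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

For example, an 8-byte load at offset 60 within a line covers bytes 60 through 67 and therefore splits, while the same load at offset 56 covers bytes 56 through 63 and does not. Data that is naturally aligned (an N-byte item at an address that is a multiple of N, for N a power of two up to 64) never crosses a line.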
