
IA-32 Intel® Architecture Optimization


Another way to improve data alignment is to copy the data into
locations that are aligned on 64-bit boundaries. When the data is
accessed frequently, this can provide a significant performance
improvement.
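
As a minimal sketch of the copy-to-aligned-storage approach, the C code below copies a run of doubles from a possibly misaligned source into a buffer whose address is a multiple of 8 bytes (a 64-bit boundary). It assumes a hosted C11 environment for aligned_alloc. The helper names copy_to_aligned and is_aligned_8 are hypothetical and are not part of any Intel library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if p lies on an 8-byte (64-bit) boundary. */
static int is_aligned_8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical helper: copies n doubles from src, which may be misaligned,
   into a new buffer that starts on a 64-bit boundary. Returns NULL if the
   allocation fails; the caller releases the buffer with free(). */
static double *copy_to_aligned(const void *src, size_t n)
{
    size_t bytes = n * sizeof(double);
    /* C11 aligned_alloc wants the size to be a multiple of the alignment;
       a multiple of sizeof(double) is a multiple of 8 where doubles are
       8 bytes, as on IA-32. Request at least 8 bytes. */
    double *dst = aligned_alloc(8, bytes ? bytes : 8);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, bytes);   /* byte copy, so src alignment does not matter */
    assert(is_aligned_8(dst));
    return dst;
}
```

The copy costs one pass over the data, so it pays off only when the aligned copy is then read many times, which matches the "accessed frequently" condition above.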

Data Alignment for 128-bit Data

Data must be 16-byte aligned when it is loaded into, or stored from, the
128-bit XMM registers used by SSE and SSE2. Misaligned accesses cause
severe performance penalties at best and execution faults at worst.
Although there are move instructions (and intrinsics), such as MOVUPS
and MOVDQU, that copy unaligned data into and out of the XMM
registers, these operations are much slower than aligned accesses. If,
however, the data is not 16-byte aligned and the aligned instructions
are used anyway, because neither the programmer nor the compiler
detected the misalignment, a fault occurs. So the rule is: keep the
data 16-byte aligned. Such alignment also works for MMX technology
code, even though MMX technology requires only 8-byte alignment. The
following discussion and examples describe alignment techniques for the
Pentium 4 processor as implemented with the Intel C++ Compiler.
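
The following C sketch contrasts the two kinds of SSE loads described above, using the standard xmmintrin.h intrinsics: _mm_load_ps requires a 16-byte-aligned address (compilers typically emit MOVAPS for it), while _mm_loadu_ps accepts any address at some cost in speed. It assumes an x86 target with SSE enabled. The helpers is_aligned_16, hsum4, sum4_aligned, and sum4_unaligned are hypothetical, written only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>   /* SSE: __m128, _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Nonzero if p lies on a 16-byte boundary. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Horizontal sum of the four lanes of v. */
static float hsum4(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v);   /* unaligned store: lanes has no 16-byte guarantee */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Fast path: the caller promises 16-byte alignment. The assert reports a
   broken promise in debug builds; with NDEBUG, a misaligned p may fault. */
static float sum4_aligned(const float *p)
{
    assert(is_aligned_16(p));
    return hsum4(_mm_load_ps(p));    /* typically MOVAPS */
}

/* Safe for any address, but slower than the aligned load. */
static float sum4_unaligned(const float *p)
{
    return hsum4(_mm_loadu_ps(p));   /* typically MOVUPS */
}
```

For example, with a buffer declared as _Alignas(16) float buf[8], sum4_aligned(buf) is safe, while buf + 1 is only 4-byte aligned and must go through sum4_unaligned. The assert is one way to surface an undetected misalignment during testing rather than in production.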

Compiler-Supported Alignment

The Intel C++ Compiler provides the following methods to ensure that
the data is aligned.

Alignment by F32vec4 or __m128 Data Types. When the compiler detects
F32vec4 or __m128 data declarations or parameters, it forces the object
to be aligned on a 16-byte boundary, for global and local data as well
as for parameters. If the declaration is within a function, the
compiler also aligns the function's stack frame so that local data and
parameters are 16-byte aligned. For details on the stack frame layout
that the compiler generates for both debug and optimized
(“release”-mode) compilations, refer to the relevant Intel application
notes in the Intel Architecture Performance Training Center provided
with the SDK.
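
To observe the alignment guarantee described above, the sketch below declares __m128 objects at global scope, inside a struct, and as a local variable, and checks each placement. It is written in C and assumes GCC or Clang on x86, which, like the Intel C++ Compiler, give __m128 a 16-byte alignment. F32vec4 is left out because it belongs to Intel's C++ class libraries and is not available with every compiler. Parameter alignment is not checked, because on x86-64 __m128 arguments are usually passed in XMM registers rather than in memory. The names g_vec, struct particle, and local_vec_is_aligned are hypothetical.

```c
#include <assert.h>      /* static_assert (C11) and assert */
#include <stddef.h>      /* offsetof */
#include <stdint.h>
#include <xmmintrin.h>   /* __m128, _mm_setzero_ps */

/* A global __m128: the compiler places it on a 16-byte boundary. */
static __m128 g_vec;

/* A struct member of type __m128 is padded out to a 16-byte offset,
   and the struct as a whole inherits 16-byte alignment. */
struct particle {
    char   tag;   /* 1 byte, followed by 15 bytes of padding */
    __m128 pos;   /* starts at offset 16, not 1 */
};

/* Compile-time checks of the guarantees described in the text. */
static_assert(_Alignof(__m128) == 16, "__m128 should be 16-byte aligned");
static_assert(offsetof(struct particle, pos) == 16, "pos should be padded to 16");

static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* A local __m128: the compiler aligns the stack frame (or relies on the
   ABI's 16-byte stack alignment) so this address is 16-byte aligned. */
static int local_vec_is_aligned(void)
{
    __m128 local = _mm_setzero_ps();
    return is_aligned_16(&local);
}
```

The static_assert lines fail the build if a compiler does not honor the 16-byte alignment, which is cheaper than discovering the problem through a runtime fault.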
