

IA-32 Intel® Architecture Optimization

5-22

avoided since there is a penalty associated with writing this register.
Because the cvttps2pi and cvttss2si instructions always truncate,
regardless of the MXCSR setting, the rounding control in MXCSR can
typically be left set to round-nearest.
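As an illustration of why MXCSR rarely needs rewriting, the following is a minimal C sketch assuming GCC or Clang on an x86 target with SSE. The function names `to_int_truncate` and `to_int_mxcsr` are hypothetical; the intrinsics `_mm_cvttss_si32` (CVTTSS2SI) and `_mm_cvtss_si32` (CVTSS2SI) come from `xmmintrin.h`. The truncating form ignores the rounding-control field, while the non-truncating form follows it (round-to-nearest by default).

```c
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_cvttss_si32, _mm_cvtss_si32 */

/* Hypothetical helper. CVTTSS2SI always rounds toward zero, ignoring
   MXCSR.RC, so C-style truncation needs no MXCSR write. */
int to_int_truncate(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* Hypothetical helper. CVTSS2SI rounds according to MXCSR.RC, which
   defaults to round-to-nearest (ties to even). */
int to_int_mxcsr(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}
```

With the default MXCSR, `to_int_truncate(2.7f)` yields 2 while `to_int_mxcsr(2.7f)` yields 3. A plain C `(int)x` cast also has truncation semantics, and compilers targeting SSE typically emit CVTTSS2SI for it.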

Flush-to-Zero and Denormals-are-Zero Modes

The flush-to-zero (FTZ) and denormals-are-zero (DAZ) modes are not
compatible with IEEE Standard 754. They are provided to improve
performance for applications where underflow is common and where
the generation of a denormalized result is not necessary. See
“Floating-point Modes and Exceptions” in Chapter 2.
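The effect of the two modes can be observed with a small C sketch, assuming GCC or Clang on an x86 processor that supports DAZ. The helper `mul_with_modes` is hypothetical; `_MM_SET_FLUSH_ZERO_MODE` (from `xmmintrin.h`) and `_MM_SET_DENORMALS_ZERO_MODE` (from `pmmintrin.h`) set the corresponding MXCSR bits. FTZ replaces a denormal result with zero, and DAZ treats a denormal input as zero.

```c
#include <float.h>      /* FLT_MIN, used by the usage example */
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */

/* Hypothetical helper: computes a * b with FTZ and/or DAZ enabled as
   requested, then restores the caller's MXCSR. */
float mul_with_modes(float a, float b, int ftz, int daz)
{
    unsigned int saved = _mm_getcsr();

    /* FTZ: a denormal (tiny) result is returned as zero. */
    _MM_SET_FLUSH_ZERO_MODE(ftz ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    /* DAZ: denormal source operands are treated as zero. */
    _MM_SET_DENORMALS_ZERO_MODE(daz ? _MM_DENORMALS_ZERO_ON
                                    : _MM_DENORMALS_ZERO_OFF);

    /* volatile stops the compiler from folding the multiply at compile
       time and helps keep it between the two MXCSR writes. */
    volatile float va = a, vb = b;
    volatile float r = va * vb;

    _mm_setcsr(saved);  /* restore the original FTZ/DAZ settings */
    return r;
}
```

For example, `FLT_MIN * 0.5f` is a denormal: `mul_with_modes(FLT_MIN, 0.5f, 0, 0)` returns a positive value, whereas `mul_with_modes(FLT_MIN, 0.5f, 1, 0)` returns 0. Restoring the saved MXCSR matters because the register is per-thread state that all later SSE code would otherwise inherit.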

SIMD Floating-point Programming Using SSE3

SSE3 enhances SSE and SSE2 with nine instructions targeted at SIMD
floating-point programming. Whereas many SSE and SSE2 instructions
offer homogeneous arithmetic operations on parallel data elements
(see Figure 5-1) and favor the vertical computation model, SSE3
offers instructions that perform asymmetric arithmetic operations
and arithmetic operations on horizontal data elements.
ADDSUBPS and ADDSUBPD are two instructions with asymmetric
arithmetic processing capability (see Figure 5-4). HADDPS, HADDPD,
HSUBPS, and HSUBPD offer horizontal arithmetic processing
capability (see Figure 5-5). In addition, MOVSLDUP, MOVSHDUP,
and MOVDDUP can load data from memory (or an XMM register) and
replicate data elements in a single instruction.
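To make the asymmetric, horizontal, and replicating instructions concrete, here is a sketch in C using SSE3 intrinsics from `pmmintrin.h`. It assumes GCC or Clang (the `target("sse3")` attribute enables SSE3 for just these functions) and an SSE3-capable processor. The type `f4` and the functions `complex_mul2` and `hsum4` are hypothetical. The complex multiply is a common SSE3 usage pattern built from MOVSLDUP, MOVSHDUP, and ADDSUBPS; the horizontal sum uses HADDPS twice. MOVDDUP, HSUBPS, and the double-precision forms follow the same ideas.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics; also pulls in SSE/SSE2 headers */

/* Hypothetical 4-float wrapper so results can be returned by value. */
typedef struct { float v[4]; } f4;

/* Multiplies two pairs of complex numbers laid out as
   [re0, im0, re1, im1]. */
__attribute__((target("sse3")))
f4 complex_mul2(f4 a, f4 b)
{
    __m128 va = _mm_loadu_ps(a.v);
    __m128 vb = _mm_loadu_ps(b.v);

    __m128 a_re = _mm_moveldup_ps(va);  /* MOVSLDUP: [ar0, ar0, ar1, ar1] */
    __m128 a_im = _mm_movehdup_ps(va);  /* MOVSHDUP: [ai0, ai0, ai1, ai1] */
    /* Swap re/im within each pair: [bi0, br0, bi1, br1]. */
    __m128 b_sw = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 3, 0, 1));

    __m128 t1 = _mm_mul_ps(a_re, vb);   /* [ar*br, ar*bi, ...] */
    __m128 t2 = _mm_mul_ps(a_im, b_sw); /* [ai*bi, ai*br, ...] */

    /* ADDSUBPS subtracts in even lanes and adds in odd lanes:
       [ar*br - ai*bi, ar*bi + ai*br, ...] */
    f4 r;
    _mm_storeu_ps(r.v, _mm_addsub_ps(t1, t2));
    return r;
}

/* Sums the four elements of v with two horizontal adds. */
__attribute__((target("sse3")))
float hsum4(f4 v)
{
    __m128 x = _mm_loadu_ps(v.v);
    x = _mm_hadd_ps(x, x);  /* HADDPS: [v0+v1, v2+v3, v0+v1, v2+v3] */
    x = _mm_hadd_ps(x, x);  /* full sum in every lane */
    return _mm_cvtss_f32(x);
}
```

For example, `complex_mul2` applied to (1+2i, i) and (3+4i, i) produces [-5, 10, -1, 0], that is, (-5+10i, -1). `hsum4` on {1, 2, 3, 4} returns 10. Compilers without the `target` attribute would instead need SSE3 enabled for the whole file (for example, `-msse3` on GCC/Clang).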
