Intel ARCHITECTURE IA-32 User Manual



IA-32 Intel® Architecture Optimization

2-64

Assembly/Compiler Coding Rule 31. (H impact, M generality) Minimize
changes to bits 8-12 of the floating-point control word (FCW). Changing among
more than two values (each value being a combination of the precision,
rounding, and infinity control bits, together with the rest of the bits in the
FCW) leads to delays that are on the order of the pipeline depth.
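
To make the bit ranges in the rule concrete, the following is a minimal sketch in C that decodes bits 8-12 of a control word value. The names fcw_fields, fcw_decode, and fcw_with_truncation are hypothetical helpers introduced only for this illustration; the field layout (precision control in bits 8-9, rounding control in bits 10-11, infinity control in bit 12) follows the x87 FCW definition.

```c
#include <stdint.h>

/* Hypothetical helper type: the FCW fields covered by the rule (bits 8-12). */
typedef struct {
    unsigned precision; /* bits 8-9:   0 = single, 2 = double, 3 = extended */
    unsigned rounding;  /* bits 10-11: 0 = nearest, 1 = down, 2 = up, 3 = truncate */
    unsigned infinity;  /* bit 12:     infinity control */
} fcw_fields;

/* Extract the three fields from a 16-bit control word value. */
static fcw_fields fcw_decode(uint16_t fcw) {
    fcw_fields f;
    f.precision = (fcw >> 8) & 0x3u;
    f.rounding  = (fcw >> 10) & 0x3u;
    f.infinity  = (fcw >> 12) & 0x1u;
    return f;
}

/* Return a copy of fcw with rounding control set to truncation (RC = 11b). */
static uint16_t fcw_with_truncation(uint16_t fcw) {
    return (uint16_t)(fcw | 0x0C00u);
}
```

For example, the power-on default control word 0x037F decodes to extended precision with round to nearest, and fcw_with_truncation(0x037F) gives 0x0F7F. A program that only ever alternates between two such values stays within what the rule allows.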

Rounding Mode

Many libraries provide float-to-integer routines that convert
floating-point values to integers. Many of these libraries conform to
the ANSI C coding standard, which states that the rounding mode should be
truncation. With the Pentium 4 processor, one can use the cvttsd2si and
cvttss2si instructions to convert operands with truncation
without ever needing to change rounding modes. The cost savings of
using these instructions over the methods below are enough to justify
using Streaming SIMD Extensions and Streaming SIMD Extensions 2
wherever possible when truncation is involved.
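
As an illustration of truncating conversions that never touch the rounding mode, here is a minimal sketch in C. It assumes a compiler that defines __SSE2__ and ships <emmintrin.h>; the intrinsics _mm_cvttsd_si32 and _mm_cvttss_si32 correspond to cvttsd2si and cvttss2si. The function names trunc_double_to_int and trunc_float_to_int are hypothetical, and the plain-cast fallback is only there so the sketch compiles on other targets.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in the SSE header */
#endif

/* Truncate a double to a 32-bit integer (cvttsd2si when SSE2 is available). */
static int32_t trunc_double_to_int(double x) {
#if defined(__SSE2__)
    return _mm_cvttsd_si32(_mm_set_sd(x));
#else
    return (int32_t)x; /* assumption: portable fallback, C casts truncate */
#endif
}

/* Truncate a float to a 32-bit integer (cvttss2si when SSE2 is available). */
static int32_t trunc_float_to_int(float x) {
#if defined(__SSE2__)
    return _mm_cvttss_si32(_mm_set_ss(x));
#else
    return (int32_t)x; /* assumption: portable fallback, C casts truncate */
#endif
}
```

Both helpers round toward zero regardless of the current rounding mode, so trunc_double_to_int(-2.7) yields -2 without any control register being read or written.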

For x87 floating point, the fist instruction uses the rounding mode
represented in the floating-point control word (FCW). The rounding
mode is generally round to nearest, so many compiler writers
implement a change in the rounding mode in the processor in order to
conform to the C and FORTRAN standards. This implementation
requires changing the control word on the processor using the fldcw
instruction. To change the rounding, precision, and infinity bits,
first use the fstcw instruction to store the current floating-point
control word and modify the stored value. Then use the fldcw
instruction to load it back, here with the rounding mode set to truncation.

In a typical code sequence that changes the rounding mode in the FCW,
an fstcw instruction is usually followed by a load operation. The load
operation from memory should use a 16-bit operand to prevent a
store-forwarding problem. If the load operation on the previously stored
FCW word involves either an 8-bit or a 32-bit operand, this will cause a
store-forwarding problem due to the mismatch in data size
between the store operation and the load operation.

Make sure that the write and read to the FCW are both 16-bit operations,
to avoid store-forwarding problems.
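
The x87 sequence described in this section can be sketched in C with GCC-style inline assembly. This is an illustration under stated assumptions, not the manual's own listing: it assumes a GCC- or Clang-compatible compiler targeting x86, uses fnstcw (the no-wait form of fstcw) and fistp (the popping form of fist, to keep the x87 stack balanced), and the function name x87_truncate is hypothetical. The control word is held in a uint16_t so that the store and the later reads are all 16-bit operations. On other targets the sketch falls back to a plain cast so that it still compiles.

```c
#include <stdint.h>

/* Convert a double to a 32-bit integer with truncation on the x87 unit. */
static int32_t x87_truncate(double x) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint16_t saved_cw; /* 16-bit object: matches the fnstcw store width */
    uint16_t trunc_cw; /* 16-bit object: matches the fldcw load width */
    int32_t result;

    /* Store the current control word to memory (16-bit write). */
    __asm__ volatile ("fnstcw %0" : "=m"(saved_cw));

    /* 16-bit read of the stored value; set rounding control to 11b. */
    trunc_cw = (uint16_t)(saved_cw | 0x0C00u);

    __asm__ volatile (
        "fldcw %2\n\t"  /* switch to truncation */
        "fldl %1\n\t"   /* push x onto the x87 stack */
        "fistpl %0\n\t" /* convert using the FCW rounding mode, then pop */
        "fldcw %3"      /* restore the original control word */
        : "=m"(result)
        : "m"(x), "m"(trunc_cw), "m"(saved_cw));
    return result;
#else
    return (int32_t)x; /* assumption: fallback for non-x86 targets */
#endif
}
```

Only two control word values are ever loaded here, the saved one and its truncating variant, which keeps the sequence within Coding Rule 31.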
