Floating-point stalls, X87 floating-point comparison instructions, Transcendental functions – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization

Floating-Point Stalls

Floating-point instructions have a latency of at least two cycles.
However, because Pentium II and subsequent processors execute out of
order, stalls do not necessarily occur on an instruction or µop basis.
For an instruction with a very long latency, such as fdiv, scheduling
can still improve the throughput of the overall application.
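As an illustration of scheduling around a long-latency divide, the following is a minimal C sketch, not taken from the manual. The function name scaled_sum and its parameters are hypothetical. It places the division before a loop that does not depend on the quotient, so an out-of-order core has independent work available while the divide completes. A compiler is free to reorder these statements, so the source order is only a hint, and the generated code should be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns (sum of data[0..n-1]) * (numerator / divisor). */
double scaled_sum(const double *data, size_t n, double numerator, double divisor)
{
    assert(divisor != 0.0);

    /* Start the long-latency divide first. */
    double ratio = numerator / divisor;

    /* Independent work: none of these adds needs `ratio`, so they can
       overlap with the divide on an out-of-order processor. */
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += data[i];

    /* First use of the quotient comes after the independent work. */
    return sum * ratio;
}
```

Placing the first use of the quotient as late as possible is the point of the sketch; the arithmetic itself is arbitrary.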

x87 Floating-point Operations with Integer Operands

On the Pentium 4 processor, it is more efficient to split
floating-point operations (fiadd, fisub, fimul, and fidiv) that take
16-bit integer operands into two instructions (fild and a
floating-point operation). For floating-point operations with 32-bit
integer operands, however, using fiadd, fisub, fimul, and fidiv is as
efficient as using separate instructions.

Assembly/Compiler Coding Rule 36. (M impact, L generality) Try to use
32-bit operands rather than 16-bit operands for fild. However, do not
do so at the expense of introducing a store forwarding problem by
writing the two halves of the 32-bit memory operand separately.
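At the source level, the rule can be approximated as in the following C sketch. The function names widen_samples and sum_widened are hypothetical, and the instructions a compiler actually emits depend on the compiler and its target options. Each 16-bit value is sign-extended in a register and written with a single 32-bit store, so a later 32-bit integer load never has to combine two separately written halves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widens 16-bit samples to 32 bits.
   The sign extension happens in a register and each element is written
   with one 32-bit store, not two 16-bit stores to the same location. */
void widen_samples(const int16_t *src, int32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Hypothetical helper: converts 32-bit integers to floating point and
   sums them. On an x87 target the conversion may use a 32-bit fild. */
double sum_widened(const int32_t *values, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (double)values[i];
    return sum;
}
```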

x87 Floating-point Comparison Instructions

On Pentium II and subsequent processors, use the fcomi and fcmov
instructions when performing floating-point comparisons. Using the
fcom, fcomp, and fcompp instructions typically requires an additional
instruction such as fstsw. The latter alternative causes more µops to
be decoded and should be avoided.
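A compare-and-select written in C is one place where a compiler may use this instruction pair. The sketch below is illustrative only, and the function name min_double is hypothetical. Whether the output contains fcomi and fcmov depends on the compiler and on options that select x87 code generation for a P6-class or later target, so the assumption should be checked against the generated assembly.

```c
/* Hypothetical helper: returns the smaller of two doubles.
   The single comparison feeds a select, which maps naturally onto a
   compare that sets EFLAGS followed by a conditional move. */
double min_double(double a, double b)
{
    return (a < b) ? a : b;
}
```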

Transcendental Functions

If an application needs to emulate math functions in software due to
performance or other reasons (see the “Guidelines for Optimizing
Floating-point Code” section), it may be worthwhile to inline math
library calls because the call and the prologue/epilogue involved with
such calls can significantly affect the latency of operations.
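As a sketch of what an inlined software math routine might look like, the following C fragment defines a small polynomial approximation as a static inline function so that no call, prologue, or epilogue is needed at the use site. The names sin_small and sum_of_sines are hypothetical, and the 7th-order Taylor polynomial is an assumption chosen for brevity. It is only reasonable for small arguments (roughly |x| <= pi/4) and is not a replacement for a library sine.

```c
#include <stddef.h>

/* Hypothetical helper: 7th-order Taylor polynomial for sin(x),
   x - x^3/3! + x^5/5! - x^7/7!, evaluated in Horner form.
   Intended only for small |x|; accuracy degrades as |x| grows. */
static inline double sin_small(double x)
{
    double x2 = x * x;
    return x * (1.0 + x2 * (-1.0 / 6.0
              + x2 * (1.0 / 120.0
              + x2 * (-1.0 / 5040.0))));
}

/* Hypothetical caller: the compiler can expand sin_small in the loop
   body, so no call overhead is paid per element. */
double sum_of_sines(const double *angles, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += sin_small(angles[i]);
    return sum;
}
```

The static inline qualifier is only a request; whether the expansion happens is up to the compiler and its optimization level.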
