Floating Point/SIMD Operands – Intel IA-32 Architecture User Manual

IA-32 Intel® Architecture Optimization

Using a test instruction between the instruction that may modify part of
the flag register and the instruction that uses the flag register can also
help prevent a partial flag register stall.
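As a hedged illustration of the point above, the sketch below assumes the
partial-flag writer is a variable-count shift, which leaves the flags
untouched when the count in cl is zero. The label a, the register choices,
and the use of setz as the flag consumer are illustrative assumptions and
are not taken from this page.

```asm
; Flag consumer directly after a shift: may incur a partial flag stall
    xor   eax, eax
    mov   edx, [a]        ; `a` is a hypothetical memory operand
    sar   edx, cl         ; flags are left unchanged if cl is 0
    setz  al              ; depends on flags the shift may not have written

; Same sequence with a test in between
    xor   eax, eax
    mov   edx, [a]
    sar   edx, cl
    test  edx, edx        ; always writes ZF, SF, PF and clears CF, OF
    setz  al              ; now depends only on the test
```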

Assembly/Compiler Coding Rule 52. (ML impact, M generality) Use the test
instruction instead of and when the result of the logical and is not used.
This saves uops in execution. Use a test of a register with itself instead
of a cmp of the register to zero; this saves the need to encode the zero
and saves encoding space. Avoid comparing a constant to a memory operand. It
is preferable to load the memory operand and compare the constant to a
register.
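The three recommendations in Rule 52 might look as follows in practice.
This is only a sketch: the registers, the immediate values, and the names
counter, bit_set, is_zero and limit_reached are hypothetical.

```asm
; 1. Only the flags of the logical AND are needed, not its result
    test  eax, 10h        ; sets ZF from (eax AND 10h), eax is not written
    jnz   bit_set

; 2. Compare a register with zero
    test  ecx, ecx        ; same ZF and SF as "cmp ecx, 0",
    jz    is_zero         ; with no immediate byte to encode

; 3. Compare a constant with a value held in memory
    mov   edx, [counter]  ; load the memory operand first
    cmp   edx, 100        ; then compare the constant to the register
    jae   limit_reached
```

In 32-bit code, test ecx, ecx encodes in two bytes, while cmp ecx, 0 needs
three because of the immediate.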

Often a produced value must be compared with zero, and then used in a
branch. Because most Intel architecture instructions set the condition
codes as part of their execution, the compare instruction may be
eliminated. Thus the operation can be tested directly by a jcc
instruction. The notable exceptions are mov and lea. In these cases, use
test.
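A minimal sketch of both cases described above, assuming 32-bit registers;
the labels done, is_null and the memory operand ptr are hypothetical.

```asm
; Arithmetic instruction already sets the flags: branch on them directly
    sub   eax, ebx        ; ZF is set if the result is zero
    jz    done            ; no "cmp eax, 0" needed

; mov and lea do not modify the flags: set them with test
    mov   esi, [ptr]      ; flags still reflect an earlier instruction
    test  esi, esi        ; establish ZF/SF for the loaded value
    jz    is_null

    lea   edi, [esi + ecx*4]
    test  edi, edi        ; lea left the flags untouched as well
    jz    is_null
```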

Assembly/Compiler Coding Rule 53. (ML impact, M generality) Eliminate
unnecessary compare with zero instructions by using the appropriate
conditional jump instruction when the flags are already set by a preceding
arithmetic instruction. If necessary, use a test instruction instead of a
compare. Be certain that any code transformations made do not introduce
problems with overflow.
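One way the overflow caveat can arise is when a signed branch is kept after
the compare is removed. The following sketch assumes the intent is "branch
if the result of the subtraction is negative"; the label negative is
hypothetical.

```asm
; Original: the compare with zero clears OF, so jl tests only the sign bit
    sub   eax, ebx
    cmp   eax, 0
    jl    negative        ; taken when the result in eax is negative

; Unsafe rewrite: jl now tests SF != OF as left by the sub itself
    sub   eax, ebx
    jl    negative        ; differs from the original if the sub overflowed

; Safer rewrite: branch on the sign flag alone
    sub   eax, ebx
    js    negative        ; taken when the result in eax is negative
```

The unsafe form answers a different question (whether the original eax was
less than ebx as signed values), which matches the original only when the
subtraction does not overflow.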

Floating Point/SIMD Operands

In initial Pentium 4 processor implementations, the latency of MMX or
SIMD floating point register to register moves is significant. This can
have implications for register allocation.

Moves that write a portion of a register can introduce unwanted
dependences. The movsd reg, reg instruction writes only the bottom 64 bits
of a register, not all 128 bits. This introduces a dependence on the
preceding instruction that produces the upper 64 bits (even if those bits
are no longer wanted). The dependence inhibits register renaming, and
thereby reduces parallelism.
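The dependence can be seen by contrasting a partial-width move with a
full-width one. The sketch below uses movapd as the full-width move purely
as an illustration; register choices are arbitrary, and whether the
substitution pays off depends on the latency concern noted at the start of
this section.

```asm
; Partial write: xmm0[63:0] <- xmm1[63:0], xmm0[127:64] is preserved
    movsd   xmm0, xmm1    ; must wait for the last producer of xmm0
    addsd   xmm0, xmm2

; Full-width write: all 128 bits of xmm0 are replaced
    movapd  xmm0, xmm1    ; no dependence on the previous contents of xmm0
    addsd   xmm0, xmm2
```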
