Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization

instead of a cmp of the register to zero; this avoids encoding the zero constant and saves encoding space. Avoid comparing a constant to a memory operand. It is preferable to load the memory operand and compare the constant to a register. 2-79
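The two recommendations above (test a register against itself rather than cmp it with zero, and load a memory operand into a register before comparing it with a constant) can be sketched with GNU inline assembly in C. This is a minimal illustration that assumes GCC or Clang on x86-64. The function names are hypothetical, and the operands are in AT&T order (source first, destination second), which is the reverse of the Intel syntax used in this manual.

```c
/* Hypothetical helpers; assume GCC/Clang GNU inline asm on x86-64. */

/* Zero test: test reg, reg sets ZF exactly as cmp reg, 0 would,
   but the encoding carries no immediate zero. */
static int is_zero(int x)
{
    unsigned char zf;
    __asm__("testl %1, %1\n\t"    /* AND x with itself, set flags only */
            "sete  %0"            /* zf = 1 if ZF is set */
            : "=q"(zf)
            : "r"(x)
            : "cc");
    return zf;
}

/* Constant vs. memory: load the memory operand into a register
   first, then compare the constant against that register. */
static int equals_42(const int *p)
{
    unsigned char eq;
    int tmp;
    __asm__("movl  %2, %1\n\t"    /* tmp = *p */
            "cmpl  $42, %1\n\t"   /* flags from tmp - 42 */
            "sete  %0"            /* eq = 1 if tmp == 42 */
            : "=q"(eq), "=&r"(tmp)
            : "m"(*p)
            : "cc");
    return eq;
}
```

In practice an optimizing compiler usually makes these choices itself; hand-written forms like these matter mainly in assembly code.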

Assembly/Compiler Coding Rule 51. (ML impact, M generality)
Eliminate unnecessary compare-with-zero instructions by using the appropriate conditional jump instruction when the flags are already set by a preceding arithmetic instruction. If necessary, use a test instruction instead of a compare. Be certain that any code transformations made do not introduce problems with overflow. 2-79
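A minimal sketch of Rule 51, assuming GCC or Clang GNU inline assembly on x86-64 (AT&T operand order). The loop counter is decremented with sub, which already sets ZF when the counter reaches zero, so the conditional jump needs no separate cmp or test. sub is used here rather than dec because sub writes all the arithmetic flags. The function name is hypothetical.

```c
/* Hypothetical example; assumes GCC/Clang GNU inline asm on x86-64.
   Returns n + (n-1) + ... + 1 with no cmp or test inside the loop. */
static unsigned sum_down(unsigned n)
{
    unsigned sum = 0;
    if (n == 0)
        return 0;                 /* the loop below assumes n >= 1 */
    __asm__("1:\n\t"
            "addl  %1, %0\n\t"    /* sum += n */
            "subl  $1, %1\n\t"    /* n -= 1; sets ZF when n becomes 0 */
            "jnz   1b"            /* loop while n != 0, reusing sub's flags */
            : "+r"(sum), "+r"(n)
            :
            : "cc");
    return sum;
}
```

jnz reads only ZF, so overflow cannot change the loop's behavior here. A transformation that replaces a signed cmp/jl pair with a jump on the flags of some other arithmetic instruction needs more care, which is the overflow caveat in the rule.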

Assembly/Compiler Coding Rule 52. (M impact, ML generality)
Avoid introducing dependences with partial floating-point register writes, e.g. from the movsd xmmreg1, xmmreg2 instruction, which writes only the low 64 bits of xmmreg1 and therefore depends on the register's previous contents. Use the movapd xmmreg1, xmmreg2 instruction instead. 2-80
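The dependence that Rule 52 warns about comes from the merge behavior of register-to-register movsd. This small C sketch uses SSE2 intrinsics and assumes an x86-64 compiler providing <emmintrin.h>. The helper names are hypothetical, and the exact instruction a compiler emits for each intrinsic is not guaranteed.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* _mm_move_sd(a, b) has the semantics of movsd xmmreg1, xmmreg2:
   the low double comes from b and the high double is kept from a,
   so the result depends on the previous contents of a. */
static __m128d merge_low(__m128d a, __m128d b)
{
    return _mm_move_sd(a, b);
}

/* Copying the whole __m128d value matches movapd xmmreg1, xmmreg2:
   all 128 bits are written, with no dependence on the old value. */
static __m128d copy_full(__m128d b)
{
    return b;
}
```

When only the low double is needed afterwards, the full copy is still correct because the upper half is simply ignored, and it avoids waiting on whatever last wrote the destination register.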

Assembly/Compiler Coding Rule 53. (ML impact, L generality)
Instead of using movupd xmmreg1, mem for an unaligned 128-bit load, use movsd xmmreg1, mem; movsd xmmreg2, mem+8; unpcklpd xmmreg1, xmmreg2. If the additional register is not available, then use movsd xmmreg1, mem; movhpd xmmreg1, mem+8. 2-80
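Both load sequences from Rule 53 can be sketched in C with SSE2 intrinsics, assuming x86-64 and <emmintrin.h>. The function names are hypothetical. Each intrinsic is commented with the instruction it normally maps to, though the compiler makes the final instruction choice.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Preferred when a second register is free:
   movsd xmmreg1, [p]; movsd xmmreg2, [p+8]; unpcklpd xmmreg1, xmmreg2 */
static __m128d load_unaligned_unpck(const double *p)
{
    __m128d lo = _mm_load_sd(p);       /* {p[0], 0.0} */
    __m128d hi = _mm_load_sd(p + 1);   /* {p[1], 0.0} */
    return _mm_unpacklo_pd(lo, hi);    /* {p[0], p[1]} */
}

/* Fallback when no extra register is available:
   movsd xmmreg1, [p]; movhpd xmmreg1, [p+8] */
static __m128d load_unaligned_movhpd(const double *p)
{
    __m128d v = _mm_load_sd(p);        /* {p[0], 0.0} */
    return _mm_loadh_pd(v, p + 1);     /* {p[0], p[1]} */
}
```

Each 64-bit half only needs 8-byte alignment, so a pointer such as buf + 1 into a double array works even when it is not 16-byte aligned.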

Assembly/Compiler Coding Rule 54. (M impact, ML generality)
Instead of using movupd mem, xmmreg1 for a store, use movsd mem, xmmreg1; unpckhpd xmmreg1, xmmreg1; movsd mem+8, xmmreg1. Note that this sequence overwrites the low half of xmmreg1. 2-80
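The split store from Rule 54 can be sketched the same way, again assuming x86-64 with SSE2 intrinsics from <emmintrin.h>. The function name is hypothetical, and the comments name the instructions each intrinsic typically compiles to.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* movsd [p], xmmreg1; unpckhpd xmmreg1, xmmreg1; movsd [p+8], xmmreg1 */
static void store_unaligned_split(double *p, __m128d v)
{
    _mm_store_sd(p, v);            /* p[0] = low half of v */
    v = _mm_unpackhi_pd(v, v);     /* {high, high}: high half moved into the low slot */
    _mm_store_sd(p + 1, v);        /* p[1] = original high half */
}
```

Because v is a local copy here, the caller's value is untouched; in hand-written assembly the unpckhpd step destroys the low half of xmmreg1, so save it first if it is still needed.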

Assembly/Compiler Coding Rule 55. (M impact, MH generality) In routines that do not need a frame pointer and that do not have called routines that modify ESP, use ESP as the base register to free up EBP. This optimization does not apply in the following cases: a routine is called that leaves ESP modified upon return (for example, alloca); routines that rely on EBP for structured or C++-style exception handling; routines that use setjmp and longjmp; routines that use EBP to align the local stack on an 8- or 16-byte boundary; and routines that rely on EBP for debugging. 2-81