Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization

2-104

Assembly/Compiler Coding Rule 32. (H impact, L generality)
Minimize the number of changes to the rounding mode. Do not use
changes in the rounding mode to implement the floor and ceiling functions
if this involves a total of more than two values of the set of rounding,
precision and infinity bits. 2-67
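
As an illustration of Rule 32, floor and ceiling can be derived from a truncating conversion plus a compare, so no rounding-mode change is written in the source. This is a minimal sketch, not the manual's own sequence. The function names are hypothetical, and the input is assumed to fit in a long. Whether the compiler emits a control-word change for the conversion depends on the target; with SSE2 code generation a truncating convert instruction is typically available.

```c
/* Hypothetical helpers: floor/ceiling without an explicit rounding-mode change.
   Assumption: x is finite and within the range of long. */

long floor_via_truncation(double x)
{
    long t = (long)x;                    /* C conversion truncates toward zero */
    return ((double)t > x) ? t - 1 : t;  /* step down if truncation rounded up (x < 0) */
}

long ceil_via_truncation(double x)
{
    long t = (long)x;                    /* truncates toward zero */
    return ((double)t < x) ? t + 1 : t;  /* step up if truncation rounded down (x > 0) */
}
```

For example, floor_via_truncation(-2.7) truncates to -2, sees that -2 is greater than -2.7, and returns -3.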

Assembly/Compiler Coding Rule 33. (H impact, L generality)
Minimize the number of changes to the precision mode. 2-68

Assembly/Compiler Coding Rule 34. (M impact, M generality) Use fxch
only where necessary to increase the effective name space. 2-68

Assembly/Compiler Coding Rule 35. (M impact, M generality) Use
Streaming SIMD Extensions 2 or Streaming SIMD Extensions unless you
need an x87 feature. Most SSE2 arithmetic operations have shorter
latency than their x87 counterparts and they eliminate the overhead
associated with the management of the x87 register stack. 2-70

Assembly/Compiler Coding Rule 36. (M impact, L generality) Try to
use 32-bit operands rather than 16-bit operands for fild. However, do
not do so at the expense of introducing a store forwarding problem by
writing the two halves of the 32-bit memory operand separately. 2-71
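
The contrast in Rule 36 can be sketched at the source level as follows. Both functions are hypothetical and compute the same value. The first widens the 16-bit value in a register and stores it once as 32 bits. The second builds the 32-bit operand from two 16-bit stores, which is the pattern the rule warns against. The second function assumes a little-endian layout, as on IA-32, and an optimizing compiler may rewrite either function, so this shows the memory access pattern only and not guaranteed code generation.

```c
#include <stdint.h>
#include <string.h>

/* Preferred pattern: sign-extend in a register, then one 32-bit store
   followed by one 32-bit load. */
double int16_to_double_widened(int16_t v)
{
    volatile int32_t slot = (int32_t)v;  /* single 32-bit store */
    return (double)slot;                 /* single 32-bit load feeds the conversion */
}

/* Pattern to avoid: two 16-bit stores followed by one 32-bit load.
   Assumption: little-endian byte order (low half at the lower address). */
double int16_to_double_split_halves(int16_t v)
{
    uint16_t halves[2];
    int32_t wide;

    halves[0] = (uint16_t)v;                   /* low 16 bits */
    halves[1] = (v < 0) ? 0xFFFFu : 0x0000u;   /* sign extension in the high 16 bits */
    memcpy(&wide, halves, sizeof wide);        /* 32-bit read spanning both stores */
    return (double)wide;
}
```

The single wide store lets the following 32-bit load be forwarded from one store, whereas the split version makes the load depend on two smaller stores.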

Assembly/Compiler Coding Rule 37. (M impact, H generality)
Choose instructions with shorter latencies and fewer micro-ops. Favor
single micro-operation instructions. 2-72

Assembly/Compiler Coding Rule 38. (M impact, L generality) Avoid
prefixes, especially multiple non-0F-prefixed opcodes. 2-72

Assembly/Compiler Coding Rule 39. (M impact, L generality) Do not
use many segment registers. 2-72

Assembly/Compiler Coding Rule 40. (ML impact, M generality)
Avoid using complex instructions (for example, enter, leave, or loop)
that generally have more than four µops and require multiple cycles to
decode. Use sequences of simple instructions instead. 2-72

Assembly/Compiler Coding Rule 41. (ML impact, M generality) If a
lea instruction using the scaled index is on the critical path, a sequence
with adds may be better. If code density and bandwidth out of the trace
cache are the critical factor, then use the lea instruction. 2-73
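
The trade-off in Rule 41 rests on a simple arithmetic equivalence, sketched below. The function names and the scale factor of 4 are assumptions made for illustration. The first form is the kind of expression that maps onto a single lea with a scaled index, and the second spells out the same result as a chain of adds. The compiler chooses the actual instructions, so the sketch shows only that the two forms are interchangeable.

```c
#include <stdint.h>

/* One expression: base + index*4, the shape a scaled-index lea computes. */
uint32_t address_scaled_index(uint32_t base, uint32_t index)
{
    return base + index * 4u;
}

/* Same result built from additions only. */
uint32_t address_add_sequence(uint32_t base, uint32_t index)
{
    uint32_t t = index + index;  /* 2 * index */
    t = t + t;                   /* 4 * index */
    return base + t;             /* base + 4 * index */
}
```

Both functions wrap modulo 2^32 in the same way, so they agree for every input.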
