Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization


Table 2-3 illustrates using movzx to avoid a partial register stall when
packing three byte values into a register.

Assembly/Compiler Coding Rule 44. (ML impact, L generality) Use simple
instructions that are less than eight bytes in length.

Assembly/Compiler Coding Rule 45. (M impact, MH generality) Avoid
using prefixes to change the size of an immediate or displacement.

Long instructions (more than seven bytes) limit the number of decoded
instructions per cycle on the Pentium M processor. Each prefix adds one
byte to the length of the instruction, possibly limiting the decoder’s
throughput. In addition, multiple prefixes can only be decoded by the
first decoder. These prefixes also incur a delay when decoded. If
multiple prefixes or a prefix that changes the size of an immediate or
displacement cannot be avoided, schedule them behind instructions that
stall the pipe for some other reason.

Assembly/Compiler Coding Rule 46. (M impact, MH generality) Break
dependences on portions of registers between instructions by operating on
32-bit registers instead of partial registers. For moves, this can be
accomplished with 32-bit moves or by using movzx.

On Pentium M processors, the movsx and movzx instructions both take a
single μop, whether they move from a register or memory. On Pentium 4
processors, the movsx takes an additional μop. This is likely to cause

Table 2-3  Avoiding Partial Register Stall When Packing Byte Values

A Sequence with Partial         Alternate Sequence without
Register Stall                  Partial Register Stall
------------------------------  ------------------------------
mov al,byte ptr a[2]            movzx eax,byte ptr a[2]
shl eax,16                      shl eax,16
mov ax,word ptr a               movzx ecx,word ptr a
movd mm0,eax                    or eax,ecx
                                movd mm0,eax
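
The packing performed by the alternate sequence in Table 2-3 can also be expressed in C. The sketch below is illustrative only: the function name pack3 and the sample values are assumptions, not part of the manual. Each byte is widened to a full 32-bit value before it is shifted and combined, which mirrors the use of movzx and full-register operations in place of writes to al or ax. Whether a given compiler actually emits movzx for this code depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): pack a[0], a[1], a[2]
 * into bits 0-7, 8-15 and 16-23 of a 32-bit value. */
static uint32_t pack3(const uint8_t *a)
{
    assert(a != NULL);

    /* Zero-extend the third byte to 32 bits and move it to bits 16-23,
     * analogous to: movzx eax, byte ptr a[2] / shl eax, 16 */
    uint32_t hi = (uint32_t)a[2] << 16;

    /* Build the zero-extended 16-bit word from the first two bytes
     * (little-endian order), analogous to: movzx ecx, word ptr a */
    uint32_t lo = (uint32_t)a[0] | ((uint32_t)a[1] << 8);

    /* Combine the two full-width values, analogous to: or eax, ecx */
    return hi | lo;
}
```

For example, with the bytes 0x11, 0x22, 0x33 stored at a[0], a[1], a[2], pack3 returns 0x00332211. Every intermediate value is a full 32-bit quantity, so no step depends on a previously written 8-bit or 16-bit portion of a wider value.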
