Sign Extension to Full 64-Bits – Intel ARCHITECTURE IA-32 User Manual


64-bit Mode Coding Guidelines


If the compiler can determine at compile time that the result of a
multiply will not exceed 64 bits, then the compiler should generate the
multiply instruction that produces a 64-bit result. If the compiler or
assembly programmer cannot determine that the result will fit in
64 bits, then a multiply that produces a 128-bit result is necessary.

Assembly/Compiler Coding Rule

Prefer 64-bit by 64-bit integer multiplies that produce 64-bit results over
multiplies that produce 128-bit results.
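
The distinction between the two multiply forms can be sketched in C. This is only an illustration: the function names mul_lo64, mul_full128, and product_fits_64 are hypothetical, and unsigned __int128 is assumed to be available as a GCC/Clang extension. Which instruction a compiler actually emits depends on the compiler and its options.

```c
#include <assert.h>
#include <stdint.h>

/* 64 x 64 -> 64: only the low 64 bits of the product are kept.
 * A compiler can typically implement this with the multiply form
 * that produces a 64-bit result. The value wraps modulo 2^64 if the
 * true product is wider. */
uint64_t mul_lo64(uint64_t a, uint64_t b) {
    return a * b;
}

/* 64 x 64 -> 128: the full product is kept as two 64-bit halves.
 * Assumes the unsigned __int128 extension (GCC/Clang). */
void mul_full128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
}

/* Returns 1 when the full product fits in 64 bits, i.e. when the
 * cheaper 64-bit-result multiply would give the exact answer. */
int product_fits_64(uint64_t a, uint64_t b) {
    uint64_t hi, lo;
    mul_full128(a, b, &hi, &lo);
    assert(lo == mul_lo64(a, b)); /* low halves always agree */
    return hi == 0;
}
```

When both operands are known to be below 2^32, for instance, the product is always below 2^64, so mul_lo64 is exact and the 128-bit form is unnecessary.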

Sign Extension to Full 64-Bits

When in 64-bit mode, the architecture is optimized to sign-extend to
64 bits in a single uop. In 64-bit mode, when the destination is 32 bits,
the upper 32 bits must be zeroed.

Zeroing the upper 32 bits requires an extra uop and is less efficient than
sign-extending to 64 bits. Although sign-extending to 64 bits makes the
instruction one byte longer (for the REX prefix), it reduces the number of
uops that the trace cache has to store, improving performance.

For example, to sign-extend a byte into esi, use:

movsx rsi, BYTE PTR [rax]

instead of:

movsx esi, BYTE PTR [rax]

If the next instruction uses the 32-bit form of the esi register, the
result will be the same. This optimization can also be used to break an
unintended dependency. A sign-extending write replaces the entire 64-bit
register, so no merge with the old contents is required. For example, if
a program writes a 16-bit value to a register and then writes the same
register with an 8-bit value, and bits 15:8 of the destination are not
needed, use the sign-extended version of the writes when available.

For example:

mov r8w, r9w    ; Requires a merge to preserve bits 63:16.
mov r8b, r10b   ; Requires a merge to preserve bits 63:8.
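
The register behavior described in this section can be modeled in C as a rough sketch. The function names below are hypothetical, and the model captures only the architectural result of each write, not uop counts or timing. It shows that the 64-bit and 32-bit sign extensions agree in their low 32 bits, and that 16-bit and 8-bit writes need the old register value as an input, which is the source of the dependency.

```c
#include <stdint.h>

/* Model of movsx rsi, BYTE PTR [rax]:
 * the byte is sign-extended to all 64 bits. */
uint64_t movsx_r64_m8(int8_t src) {
    return (uint64_t)(int64_t)src;
}

/* Model of movsx esi, BYTE PTR [rax]:
 * the byte is sign-extended to 32 bits and the upper 32 bits of the
 * 64-bit register are zeroed. */
uint64_t movsx_r32_m8(int8_t src) {
    return (uint64_t)(uint32_t)(int32_t)src;
}

/* Model of mov r8w, r9w:
 * only bits 15:0 are written, so bits 63:16 of the old destination
 * value must be merged into the result. */
uint64_t mov_r16(uint64_t old_dst, uint16_t src) {
    return (old_dst & ~UINT64_C(0xFFFF)) | src;
}

/* Model of mov r8b, r10b:
 * only bits 7:0 are written, so bits 63:8 of the old destination
 * value must be merged into the result. */
uint64_t mov_r8(uint64_t old_dst, uint8_t src) {
    return (old_dst & ~UINT64_C(0xFF)) | src;
}
```

Note that the two movsx models take no old_dst argument: a full-width write does not depend on the previous register contents, while mov_r16 and mov_r8 cannot be computed without them.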
