Integer divide, Operand sizes and partial register accesses – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization


CMPXCHG8B, various rotate instructions, STC, and STD. An example
of assembly with a partial flag register stall and alternative code without
the stall is shown in Table 2-2.

Integer Divide

Typically, an integer divide is preceded by a cwd or cdq instruction. Depending on the operand size, divide instructions use DX:AX or EDX:EAX for the dividend. The cwd or cdq instructions sign-extend AX or EAX into DX or EDX, respectively. These instructions have a denser encoding than a shift and move would, but they generate the same number of μops. If AX or EAX is known to be positive, replace these instructions with xor dx, dx or xor edx, edx.
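The substitution above can be checked with a small model. The following is a minimal sketch in C rather than assembly, assuming a two's-complement target. The function names cdq_edx, xor_edx, dividend and models_agree are hypothetical helpers introduced here for illustration and do not come from the manual. It models the value each instruction leaves in EDX and the 64-bit dividend a 32-bit signed divide would then read from EDX:EAX.

```c
#include <stdint.h>

/* Model of CDQ: EDX receives 32 copies of the sign bit of EAX. */
uint32_t cdq_edx(uint32_t eax) {
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

/* Model of XOR EDX,EDX: EDX is cleared regardless of EAX. */
uint32_t xor_edx(void) {
    return 0u;
}

/* Combine EDX:EAX into the signed 64-bit dividend that a 32-bit
 * signed divide would read. The unsigned-to-signed conversion assumes
 * a two's-complement implementation. */
int64_t dividend(uint32_t edx, uint32_t eax) {
    return (int64_t)(((uint64_t)edx << 32) | eax);
}

/* Returns 1 when both ways of preparing EDX give the same dividend. */
int models_agree(uint32_t eax) {
    return dividend(cdq_edx(eax), eax) == dividend(xor_edx(), eax);
}
```

In this model, models_agree returns 1 for any EAX value with a clear sign bit and 0 for any value with the sign bit set, which is why the xor replacement is only safe when the value is known not to be negative.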

Operand Sizes and Partial Register Accesses

The Pentium 4 processor, Pentium M processor (with CPUID signature
family 6, model 13), Intel Core Solo and Intel Core Duo processors do
not incur a penalty for partial register accesses; Pentium M processor

Table 2-2 Avoiding Partial Flag Register Stall

A Sequence with Partial Flag Register Stall:

    xor eax,eax
    mov ecx,a
    sar ecx,2
    setz al         ; no partial register stall,
                    ; flag stall as sar may change
                    ; the flags

Alternate Sequence without Partial Flag Register Stall:

    xor eax,eax
    mov ecx,a
    sar ecx,2
    test ecx,ecx
    setz al         ; no partial reg or flag
                    ; stall, test
                    ; always updates all the flags
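Both sequences in Table 2-2 leave the same value in AL. They differ only in which instruction setz reads the flags from. As a rough illustration, the following C sketch models that shared result. The names sar32 and setz_after_shift are hypothetical helpers, not taken from the manual, and the model says nothing about timing or stalls.

```c
#include <stdint.h>

/* Model of SAR r32,n for 0 <= n <= 31: an arithmetic right shift.
 * Negative inputs are complemented first so that only non-negative
 * values are shifted, which keeps the result well defined in C. */
int32_t sar32(int32_t x, unsigned n) {
    return x < 0 ? ~(~x >> n) : (x >> n);
}

/* Value left in AL by either sequence: 1 if (a >> 2) is zero, else 0.
 * The EAX clear (xor eax,eax) is implied by returning a fresh value. */
uint8_t setz_after_shift(int32_t a) {
    int32_t ecx = sar32(a, 2);   /* mov ecx,a ; sar ecx,2 */
    return (uint8_t)(ecx == 0);  /* setz al */
}
```

The added test ecx,ecx in the alternate sequence does not change this result. It only gives setz a producer that writes all the flags.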
