Spin-wait and idle loops, Example 2-3 – Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization

The cmov and fcmov instructions are available on the Pentium II and subsequent processors, but not on Pentium processors and earlier 32-bit Intel architecture processors. Be sure to check whether a processor supports these instructions with the cpuid instruction.
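The cpuid check described above could be sketched in C as follows. This is a minimal illustration, not code from the manual: it assumes a GCC or Clang toolchain that provides the <cpuid.h> helper __get_cpuid, and the helper names (edx_has_cmov, edx_has_fcmov, read_feature_edx) are hypothetical. It relies on the documented feature flags returned in EDX by CPUID leaf 1: bit 15 reports cmov, and fcmov is usable only when both that bit and the FPU bit (bit 0) are set.

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header; an assumption about the toolchain */
#endif

/* Feature bits reported in EDX by CPUID leaf 1. */
#define CPUID_EDX_FPU  (1u << 0)   /* x87 FPU present */
#define CPUID_EDX_CMOV (1u << 15)  /* conditional move instructions present */

/* Hypothetical helper: true if the EDX feature word reports cmov. */
static bool edx_has_cmov(unsigned edx) {
    return (edx & CPUID_EDX_CMOV) != 0;
}

/* Hypothetical helper: fcmov needs both the cmov bit and the FPU bit. */
static bool edx_has_fcmov(unsigned edx) {
    return (edx & CPUID_EDX_CMOV) != 0 && (edx & CPUID_EDX_FPU) != 0;
}

/* Reads the leaf-1 EDX feature word into *edx.
   Returns false on non-x86 targets or when leaf 1 is not supported. */
static bool read_feature_edx(unsigned *edx) {
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx;
    return __get_cpuid(1, &eax, &ebx, &ecx, edx) != 0;
#else
    (void)edx;
    return false;
#endif
}
```

A caller would first call read_feature_edx and, only if it succeeds, pass the result to edx_has_cmov or edx_has_fcmov before selecting a code path that uses those instructions.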

Spin-Wait and Idle Loops

The Pentium 4 processor introduces a new pause instruction; the instruction is architecturally a nop on all IA-32 implementations. To the Pentium 4 processor, this instruction acts as a hint that the code sequence is a spin-wait loop. Without a pause instruction in such loops, the Pentium 4 processor may suffer a severe penalty when exiting the loop because the processor may detect a possible memory order violation. Inserting the pause instruction significantly reduces the likelihood of a memory order violation and as a result improves performance.

In Example 2-4, the code spins until memory location A matches the value stored in the register eax. Such code sequences are common when protecting a critical section, in producer-consumer sequences, for barriers, or other synchronization.
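A spin-wait of the kind described above might look like this in C. This is a sketch under stated assumptions rather than the manual's Example 2-4: it uses C11 atomics and the _mm_pause intrinsic from <immintrin.h>, which emits the pause instruction on x86 compilers, and the names cpu_relax and spin_until_equal are hypothetical. On non-x86 targets the hint is compiled out.

```c
#include <stdatomic.h>

#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>             /* _mm_pause() emits the pause instruction */
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)      /* no spin-wait hint on other targets */
#endif

/* Hypothetical helper: spin until *a equals expected, issuing the
   spin-wait hint on every failed comparison. Returns the number of
   failed comparisons, which is 0 if the value already matched. */
static unsigned long spin_until_equal(atomic_int *a, int expected) {
    unsigned long spins = 0;
    while (atomic_load_explicit(a, memory_order_acquire) != expected) {
        cpu_relax();
        ++spins;
    }
    return spins;
}
```

In real use another thread would store the expected value with release ordering; the hint sits inside the loop body so that it executes between successive loads of the shared location.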

Example 2-3  Eliminating Branch with CMOV Instruction

    test ecx, ecx       ; set the equal (zero) flag if ecx is 0
    jne 1h              ; skip the move when ecx is nonzero
    mov eax, ebx
1h:

    ; To optimize code, combine jne and mov into one cmovcc
    ; instruction that checks the equal flag
    test ecx, ecx       ; test the flags
    cmove eax, ebx      ; if the equal flag is set, move
                        ; ebx to eax - the 1h: tag is no longer
                        ; needed
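For readers working in a higher-level language, the same transformation can be sketched in C. This is an illustration only: the function names are hypothetical, and whether a compiler actually emits cmov for the second form depends on the compiler, its optimization settings, and the target processor. Both functions compute the same result as the two assembly sequences in Example 2-3.

```c
#include <stdint.h>

/* Branching form: mirrors the test / jne / mov sequence. */
static uint32_t select_branch(uint32_t eax, uint32_t ebx, uint32_t ecx) {
    if (ecx == 0) {
        eax = ebx;      /* executed only when ecx is zero */
    }
    return eax;
}

/* Conditional-expression form: a candidate for a single cmov,
   though the compiler is free to choose either encoding. */
static uint32_t select_cmov(uint32_t eax, uint32_t ebx, uint32_t ecx) {
    return (ecx == 0) ? ebx : eax;
}
```

Inspecting the generated assembly is the only reliable way to confirm which form the compiler chose for a given build.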
