Optimize instruction scheduling, Enable vectorization – Intel ARCHITECTURE IA-32 User Manual

General Optimization Guidelines

• Avoid longer-latency instructions: integer multiplies and divides.
  Replace them with alternate code sequences (e.g., use shifts instead
  of multiplies).

• Use the lea instruction and the full range of addressing modes to do
  address calculation.

• Some types of stores use more µops than others; try to use simpler
  store variants and/or reduce the number of stores.

• Avoid use of complex instructions that require more than 4 µops.

• Avoid instructions that unnecessarily introduce dependence-related
  stalls: inc and dec instructions, partial register operations
  (8/16-bit operands).

• Avoid use of ah, bh, and other higher 8-bits of the 16-bit registers,
  because accessing them requires a shift operation internally.

• Use xor and pxor instructions to clear registers and break
  dependencies for integer operations; also use xorps and xorpd to
  clear XMM registers for floating-point operations.

• Use efficient approaches for performing comparisons.
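
As a rough illustration of the first two guidelines above, the C sketch below replaces multiplication by a constant with shifts and adds. The function names (times8, times5, times10) are hypothetical and exist only for this example. An optimizing compiler will usually perform this kind of strength reduction on its own, and the scale-and-add form x + x*4 is the sort of expression that can map onto a single lea with scaled-index addressing.

```c
#include <stdint.h>

/* Hypothetical helpers: multiply by a constant without a multiply. */

/* x * 8 as a single left shift (8 == 1 << 3). */
uint32_t times8(uint32_t x) {
    return x << 3;
}

/* x * 5 as x + x*4; on IA-32 this shape corresponds to
 * lea r, [r + r*4]. */
uint32_t times5(uint32_t x) {
    return x + (x << 2);
}

/* x * 10 as (x + x*4) * 2: one shift-and-add followed by one shift. */
uint32_t times10(uint32_t x) {
    return (x + (x << 2)) << 1;
}
```

Unsigned operands are used so that overflow wraps the same way for the shift form and the multiply form.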

Optimize Instruction Scheduling

• Consider latencies and resource constraints.

• Calculate store addresses as early as possible.
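
One common way to take instruction latency into account is to avoid a single long dependency chain. The sketch below is an illustration under assumptions, not code from the manual: sum_chain and sum_split are hypothetical names, and whether the split version is actually faster depends on the processor and the compiler. It sums an array first with one accumulator, then with two independent accumulators so that consecutive additions do not have to wait on each other.

```c
#include <stddef.h>
#include <stdint.h>

/* Single accumulator: every add depends on the result of the previous add. */
uint32_t sum_chain(const uint32_t *a, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: the two chains are independent of each other,
 * so their adds can overlap in the pipeline. */
uint32_t sum_split(const uint32_t *a, size_t n) {
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)              /* odd element count: add the last element */
        s0 += a[i];
    return s0 + s1;
}
```

Because unsigned integer addition is associative, both functions return the same value; the same transformation on floating-point data can change rounding.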

Enable Vectorization

• Use the smallest possible data type. This enables more parallelism
  with the use of a longer vector.

• Arrange the nesting of loops so the innermost nesting level is free of
  inter-iteration dependencies. It is especially important to avoid the
  case where the store of data in an earlier iteration happens lexically
  after the load of that data in a future iteration (called a
  lexically-backward dependence).
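
The lexically-backward case can be hard to picture, so here is a minimal sketch of it. The function names backward_dep and forward_dep and the array roles are hypothetical, the arrays are assumed not to overlap, and whether a particular compiler vectorizes either loop is not guaranteed. The first loop loads a[i - 1] before the statement that stores a[i], even though that load reads what the store wrote one iteration earlier; the second loop moves the store ahead of the load and produces the same results. The 16-bit element type follows the "smallest possible data type" guideline: a 128-bit XMM register holds eight such elements, against four 32-bit ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Lexically-backward dependence: the load of a[i - 1] appears before the
 * store to a[i], yet it reads the value that store wrote one iteration ago. */
void backward_dep(int16_t *a, int16_t *b, const int16_t *c, size_t n) {
    for (size_t i = 1; i < n; i++) {
        b[i] = (int16_t)(a[i - 1] + 1);  /* load comes first in program text */
        a[i] = (int16_t)(c[i] * 2);      /* store comes after the load */
    }
}

/* Same results, with the store moved ahead of the load so the dependence
 * runs forward in program text. */
void forward_dep(int16_t *a, int16_t *b, const int16_t *c, size_t n) {
    for (size_t i = 1; i < n; i++) {
        a[i] = (int16_t)(c[i] * 2);
        b[i] = (int16_t)(a[i - 1] + 1);
    }
}
```

Swapping the two statements is safe here because the store to a[i] never touches a[i - 1]; splitting the loop into one pass that fills a and a second that fills b would be another option.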
