Intel ARCHITECTURE IA-32 User Manual: Instruction Scheduling, Latencies and Resource Constraints

General Optimization Guidelines

Using memory as a destination operand may further reduce register
pressure at the slight risk of making trace cache packing more difficult.

On the Pentium 4 processor, loading a value from memory into a register
and then adding that register to memory is faster than the alternate
sequence of adding a value from memory to a register and then storing
that register to memory. The first sequence also uses one fewer μop
than the second.

Assembly/Compiler Coding Rule 59. (ML impact, M generality) Give
preference to adding a register to memory (memory is the destination) instead
of adding memory to a register. Also, give preference to adding a register to
memory over loading the memory, adding two registers and storing the result.
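
As a rough illustration of Rule 59, the fragment below shows the three forms the rule compares, each adding the value in a register to a memory location. It is a sketch in Intel syntax, not a complete program or a measured benchmark; the label total and the choice of eax and edx are hypothetical and do not come from the manual.

```asm
; Preferred: add a register to memory (memory is the destination)
add  dword ptr [total], eax

; Less preferred: add memory to a register, then store the register
; (note that this form also overwrites eax)
add  eax, dword ptr [total]
mov  dword ptr [total], eax

; Less preferred: load the memory, add two registers, store the result
mov  edx, dword ptr [total]
add  edx, eax
mov  dword ptr [total], edx
```

All three forms leave total holding its old value plus the original value of eax; they differ only in instruction count and in which registers they need.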

Assembly/Compiler Coding Rule 60. (M impact, M generality) When the
address of a store is unknown, subsequent loads cannot be scheduled to
execute out of order ahead of the store, which limits the processor's
out-of-order execution. When the address of a store is computed by a
potentially long-latency operation (such as a load that might miss the
data cache), attempt to reorder subsequent loads ahead of the store.
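
The reordering that Rule 60 describes might look like the following sketch in Intel syntax. The label ptr_slot and the register assignments are hypothetical, and the fragment assumes that the locations read through esi do not overlap the location that ebx points to; if they could overlap, moving the loads ahead of the store would change the result.

```asm
; Before: the store address comes from a load that might miss the data cache
mov  ebx, dword ptr [ptr_slot]   ; potentially long-latency load of the store address
mov  dword ptr [ebx], eax        ; store whose address is unknown until that load completes
mov  ecx, dword ptr [esi]        ; per Rule 60, cannot be scheduled ahead of the store
mov  edx, dword ptr [esi+4]      ; same restriction applies here

; After: the independent loads are placed ahead of the store
mov  ebx, dword ptr [ptr_slot]   ; address load still starts first
mov  ecx, dword ptr [esi]        ; no longer behind a store with an unknown address
mov  edx, dword ptr [esi+4]
mov  dword ptr [ebx], eax        ; store issues once ebx is available
```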

Instruction Scheduling

Ideally, scheduling or pipelining should be done in a way that optimizes
performance across all processor generations. This section presents
scheduling rules that can improve the performance of your code on the
Pentium 4 processor.

Latencies and Resource Constraints

Assembly/Compiler Coding Rule 61. (M impact, MH generality) Calculate
store addresses as early as possible to avoid having stores block loads.
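
One way to read Rule 61 is that the registers feeding a store's address should be ready well before the store itself. The sketch below, in Intel syntax, hoists the load of an index so the store address can be resolved sooner. The labels idx_slot and val_slot, the multiply by 3, and the register choices are hypothetical, and the fragment assumes the trailing load from esi does not overlap the stored location.

```asm
; Address inputs ready late: the index is loaded just before the store
mov  eax, dword ptr [val_slot]
imul eax, eax, 3                  ; compute the data to be stored
mov  ecx, dword ptr [idx_slot]    ; index arrives last
mov  dword ptr [ebx+ecx*4], eax   ; store address depends on ecx
mov  edx, dword ptr [esi]         ; later load that the store can block

; Address inputs ready early: the index load is hoisted to the top
mov  ecx, dword ptr [idx_slot]    ; index available first
mov  eax, dword ptr [val_slot]
imul eax, eax, 3
mov  dword ptr [ebx+ecx*4], eax   ; address inputs were ready before the data
mov  edx, dword ptr [esi]
```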

Example 2-25 Recombining LOAD/OP Code into REG,MEM Form

LOAD reg1, mem1
... code that does not write to reg1 ...
OP reg2, reg1
... code that does not use reg1 ...
