Overview – Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization

Overview

The current generation of the IA-32 family of processors uses out-of-order
execution with dynamic scheduling and buffering to tolerate poor
instruction selection and scheduling that may occur in legacy code. The
processor can reorder μops to cover latency delays and to avoid resource
conflicts.

In some cases, the microarchitecture’s ability to avoid such delays can
be enhanced by the order in which IA-32 instructions are arranged. While
reordering IA-32 instructions may help, the execution core determines the
final schedule of μops.

This appendix provides information to help assembly language programmers
and compiler writers select the sequence of instructions that minimizes
dependency chain latency, and arrange instructions in an order that helps
the hardware process them efficiently while avoiding resource conflicts.
Applying the information presented in this appendix has been shown to
improve performance on the order of several percent for applications that
are not completely dominated by other performance factors, such as:

• cache miss latencies
• bus bandwidth
• I/O bandwidth
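
As a rough illustration of what "minimizing dependency chain latency" can mean in practice, the sketch below sums an array in two ways. It is written in C rather than assembly, and the function names are hypothetical. The first version carries every addition through one accumulator, so each add must wait for the previous one. The second splits the work across four accumulators, giving the out-of-order core four independent chains to overlap. Whether this helps depends on the latency of the operation involved and on what the compiler already does; an optimizing compiler may perform a similar transformation on its own.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each addition depends on the result of the
   previous one, so the loop forms a single long dependency chain. */
long sum_single_chain(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Four accumulators: the additions into s0..s3 do not depend on
   each other, so the chain through any one accumulator is about a
   quarter as long. Integer addition is associative, so the result
   is the same as long as no overflow occurs. */
long sum_four_chains(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main loop: four elements per iteration, one per chain. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Remainder loop: at most three leftover elements. */
    for (; i < n; i++)
        s0 += a[i];

    /* Combine the partial sums pairwise to keep the final chain short. */
    return (s0 + s1) + (s2 + s3);
}
```

Both functions return the same value for the same input; only the shape of the dependency graph differs. The gain, if any, should be measured on the target processor rather than assumed.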

Instruction selection and scheduling matter when the compiler or
assembly programmer has already addressed the performance issues
discussed in Chapter 2:

• observe store forwarding restrictions
• avoid cache line and memory order buffer splits
• do not inhibit branch prediction
• minimize the use of xchg instructions on memory locations
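
The item on cache line splits in the list above can be made concrete with a small check. The sketch below, in C, reports whether a memory access of a given size would straddle a cache line boundary. The helper name crosses_line is hypothetical, and the line size is passed as a parameter because it differs between processors; the 64-byte value in the usage note is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an access of `size` bytes starting at address `addr`
   touches more than one cache line of `line` bytes, 0 otherwise.
   `line` must be a nonzero power of two. */
int crosses_line(uintptr_t addr, size_t size, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    if (size == 0)
        return 0;

    /* Mask that clears the offset-within-line bits of an address. */
    uintptr_t mask = ~(uintptr_t)(line - 1);

    uintptr_t first_line = addr & mask;              /* line of first byte */
    uintptr_t last_line  = (addr + size - 1) & mask; /* line of last byte  */

    return first_line != last_line;
}
```

For example, with an assumed 64-byte line, an 8-byte access at offset 60 covers bytes 60 through 67 and so crosses a boundary, while the same access at offset 56 stays within one line. Aligning data to its natural size is the usual way to keep such accesses from splitting.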
