C IA-32 Instruction Latency and Throughput – Intel ARCHITECTURE IA-32 User Manual




IA-32 Instruction Latency and Throughput

This appendix contains tables of the latency, throughput, and execution
units associated with the more commonly used IA-32 instructions.

Instruction timing data varies within the IA-32 family of
processors. Only data specific to the Intel Pentium 4, Intel Xeon,
and Intel Pentium M processors is provided. The relevance of
instruction throughput and latency information to code tuning is
discussed in Chapter 1 and Chapter 2; see “Execution Core Detail” in
Chapter 1 and “Floating Point/SIMD Operands” in Chapter 2.

This appendix contains the following sections:

• “Overview” – an overview of issues related to instruction selection
and scheduling.

• “Definitions” – the definitions for the primary information
presented in the tables in section “Latency and Throughput.”

• “Latency and Throughput of Pentium 4 and Intel Xeon processors”
– the listings of IA-32 instruction throughput, latency, and execution
units associated with commonly used instructions.

Although instruction latency may be useful in some limited situations (e.g., a tight loop
with a dependency chain that exposes instruction latency), software optimization on a
super-scalar, out-of-order microarchitecture will, in general, benefit much more from
increasing the effective throughput of the larger-scale code path. Coding techniques that
rely on instruction latency alone to influence the scheduling of instructions are likely to be
sub-optimal, because they tend to interfere with the out-of-order machine or
restrict the amount of instruction-level parallelism.
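
The distinction between a latency-exposing dependency chain and a throughput-oriented code path can be sketched in C. The example below is illustrative only and is not taken from this manual: the function names `sum_single_chain` and `sum_four_chains` are hypothetical, and the choice of four accumulators is an assumption, not a recommendation for any particular processor. In the first function every addition waits on the previous result, so the loop is likely paced by the latency of the add; in the second, the four partial sums are independent, which gives an out-of-order core more additions it can overlap. Actual behavior depends on the compiler and the processor, so measure before relying on either form.

```c
#include <stddef.h>

/* Single dependency chain: each addition needs the previous value of
 * `sum`, so consecutive adds cannot overlap. */
double sum_single_chain(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Four independent chains (the count 4 is an illustrative assumption):
 * s0..s3 do not depend on each other inside the loop, which exposes
 * more instruction-level parallelism to the hardware. */
double sum_four_chains(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Handle the 0..3 leftover elements. */
    for (; i < n; i++)
        s0 += a[i];

    /* Combine the partial sums once, outside the hot loop. */
    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, the two functions can round differently on general inputs; they agree exactly when every intermediate sum is exactly representable, as with small integer values.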
