Intel ARCHITECTURE IA-32 User Manual

IA-32 Instruction Latency and Throughput

For the sake of simplicity, all requested data is assumed to reside in
the first-level data cache (a cache hit). In general, IA-32 instructions
with load operations that execute in the integer ALU units require two
more clock cycles of latency than the corresponding register-to-register
flavor of the same instruction. The throughput of these instructions with
a load operation remains the same as that of the register-to-register
flavor.
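The integer rule above reduces to simple arithmetic: add two clocks to the register-form latency and leave throughput alone. The following is a minimal sketch of that rule, assuming an L1 data cache hit. The helper name `integer_load_op_timing` and the sample timing values are hypothetical and are not taken from the latency tables.

```python
# Hypothetical helper illustrating the integer ALU load rule described above.
# Assumes the loaded data hits in the first-level data cache.

INTEGER_LOAD_PENALTY = 2  # extra clocks of latency for the load-op form


def integer_load_op_timing(reg_latency: float, reg_throughput: float) -> tuple[float, float]:
    """Return (latency, throughput) in clocks for the load-op form of an
    integer ALU instruction, given its register-to-register timings."""
    # Latency grows by the load penalty; throughput is unchanged.
    return reg_latency + INTEGER_LOAD_PENALTY, reg_throughput


if __name__ == "__main__":
    # Hypothetical register-form timings: 0.5-clock latency, 0.5-clock throughput.
    print(integer_load_op_timing(0.5, 0.5))  # (2.5, 0.5)
```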

Floating-point, MMX technology, Streaming SIMD Extensions, and
Streaming SIMD Extensions 2 instructions with load operations require 6
more clocks of latency than the register-only versions of the
instructions, but their throughput remains the same.
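Combining this with the integer rule gives a per-class latency penalty for the load form. The sketch below assumes an L1 data cache hit and encodes only the two penalties stated in the text (2 clocks for integer ALU instructions, 6 clocks for floating-point, MMX, SSE, and SSE2). The class labels and the function `load_op_latency` are hypothetical names chosen for illustration.

```python
# Hypothetical sketch of the load-latency penalties stated in the text.
# Class labels are illustrative; throughput is unchanged in every case.

LOAD_LATENCY_PENALTY = {
    "integer": 2,  # integer ALU instructions
    "fp": 6,       # floating-point instructions
    "mmx": 6,      # MMX technology instructions
    "sse": 6,      # Streaming SIMD Extensions
    "sse2": 6,     # Streaming SIMD Extensions 2
}


def load_op_latency(instr_class: str, reg_latency: float) -> float:
    """Latency in clocks of the load form, given the register-only latency."""
    try:
        return reg_latency + LOAD_LATENCY_PENALTY[instr_class]
    except KeyError:
        raise ValueError(f"unknown instruction class: {instr_class!r}") from None


if __name__ == "__main__":
    # Hypothetical register-only latency of 4 clocks for an SSE2 instruction.
    print(load_op_latency("sse2", 4))  # 10
```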

When store operations are on the critical path, their results can generally
be forwarded to a dependent load in as few as zero cycles. Thus, the
latency to complete a store is not relevant here.
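As a rough illustration of why store completion latency drops out, the following sketch sums latencies along a linear dependency chain and, when forwarding is assumed, charges a store only the forwarding delay (zero in the best case described above) rather than its completion time. This is a simplified model, not the processor's actual scheduling behavior; the `Op` type, `chain_latency`, and all latency values are hypothetical.

```python
# Hypothetical, simplified model of a dependency chain with store forwarding.
# All names and latency values are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    latency: int           # clocks before a dependent op can use the result
    is_store: bool = False


def chain_latency(ops: list[Op], forwarded: bool = True, forward_latency: int = 0) -> int:
    """Critical-path latency of a linear chain of dependent ops."""
    total = 0
    for op in ops:
        if op.is_store and forwarded:
            # Store data is forwarded to the dependent load, so only the
            # (assumed) forwarding delay counts, not store completion time.
            total += forward_latency
        else:
            total += op.latency
    return total


if __name__ == "__main__":
    chain = [Op("add", 1), Op("store", 10, is_store=True), Op("load", 2), Op("add", 1)]
    print(chain_latency(chain))                   # 4
    print(chain_latency(chain, forwarded=False))  # 14
```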