Table footnotes – Intel ARCHITECTURE IA-32 User Manual


IA-32 Instruction Latency and Throughput


Table Footnotes

The following footnotes refer to all tables in this appendix.

Latency information for many complex instructions (> 4 μops) is
estimated from conservative, worst-case assumptions. Actual
performance of these instructions by the out-of-order core execution
unit can range from somewhat faster to significantly faster than the
nominal latency data shown in these tables.

The names of execution units apply only to processor implementations
of the Intel NetBurst microarchitecture with a CPUID signature of
family 15, model encoding = 0, 1, 2. They include: ALU, FP_EXECUTE,
FPMOVE, MEM_LOAD, MEM_STORE. See Figure 1-4 for execution units and
ports in the out-of-order core. Note the following:

• The FP_EXECUTE unit is actually a cluster of execution units,
  roughly consisting of seven separate execution units.
• The FP_ADD unit handles x87 and SIMD floating-point add and
  subtract operations.
• The FP_MUL unit handles x87 and SIMD floating-point multiply
  operations.
• The FP_DIV unit handles x87 and SIMD floating-point divide and
  square-root operations.
• The MMX_SHFT unit handles shift and rotate operations.
• The MMX_ALU unit handles SIMD integer ALU operations.
• The MMX_MISC unit handles reciprocal MMX computations and some
  integer operations.
• The FP_MISC designates other execution units in port 1 that are
  separated from the six units listed above.
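
The execution unit names are tied to a specific CPUID signature (family 15, model encoding 0, 1, or 2), so it can help to see how such a signature might be decoded. The sketch below is illustrative only. It assumes the signature is the 32-bit value returned in EAX by CPUID leaf 1 and uses the commonly documented bit layout (stepping in bits 3:0, model in bits 7:4, family in bits 11:8, extended model in bits 19:16, extended family in bits 27:20). The function names are hypothetical and do not come from the manual.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf-1 EAX value into (family, model, stepping).

    Assumes the standard bit layout described in the lead-in.
    """
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family is 15.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model is combined only for base family 6 or 15.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def unit_names_apply(eax):
    """True if the signature is family 15 with model encoding 0, 1, or 2."""
    family, model, _ = decode_cpuid_signature(eax)
    return family == 15 and model in (0, 1, 2)


# Example signature value chosen for illustration only.
print(decode_cpuid_signature(0x0F24))  # (15, 2, 4)
print(unit_names_apply(0x0F24))        # True
```

A signature with a different family or a later model encoding would return False here, meaning the unit names in these footnotes should not be assumed to apply.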

In some code sequences, it may be possible to arrange repeated uses
of an IA-32 instruction so that the observed latency is one or two
clock cycles shorter than the more realistic number listed in this
table.
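
As a rough illustration of how a measured figure might be compared with a table entry, the sketch below derives an average per-instruction latency from a cycle count. It assumes the repeated instructions form a chain in which each one depends on the previous result, which is one reading of the footnote rather than something the manual states. The helper names and all numbers are hypothetical and are not measurements.

```python
def per_instruction_latency(total_cycles, overhead_cycles, chain_length):
    """Average cycles per instruction for a chain of dependent instructions.

    overhead_cycles covers loop and timing overhead that is not part of
    the chain itself.
    """
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    return (total_cycles - overhead_cycles) / chain_length


def cycles_below_nominal(measured, nominal):
    """How many cycles faster the measured latency is than the table value."""
    return nominal - measured


# Made-up inputs: 1000 dependent instructions, 3100 cycles total,
# 100 cycles of overhead, and a nominal table latency of 4 cycles.
measured = per_instruction_latency(total_cycles=3100,
                                   overhead_cycles=100,
                                   chain_length=1000)
print(measured)                                   # 3.0
print(cycles_below_nominal(measured, nominal=4))  # 1.0
```

A gap of one or two cycles in such a comparison would be consistent with the footnote and would not by itself mean that the table value is wrong.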
