Latency and throughput with register operands – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization


Latency and Throughput with Register Operands

IA-32 instruction latency and throughput data are presented in
Table C-2 through Table C-8. The tables cover the Streaming SIMD
Extension 3, Streaming SIMD Extension 2, Streaming SIMD Extension,
MMX technology, and most of the commonly used IA-32 instructions.
Instruction latency and throughput for the Pentium 4 processor and for
the Pentium M processor are given in separate columns. Pentium 4
processor instruction timing data is implementation specific, i.e., it
can vary between model encoding value = 3 and model < 2. Separate data
sets of instruction latency and throughput are shown in the columns for
CPUID signatures 0xF2n and 0xF3n. The notation 0xF2n represents the
hex value of the lower 12 bits of the EAX register reported by the
CPUID instruction with an input value of EAX = 1: ‘F’ indicates that
the family encoding value is 15, ‘2’ indicates that the model encoding
is 2, and ‘n’ indicates that the data applies to any value of the
stepping encoding. Pentium M processor instruction timing data is shown
in the columns labeled with CPUID signature 0x69n. The instruction
timing for the Pentium M processor with CPUID signature 0x6Dn is the
same as that of 0x69n.
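
The signature notation described above can be illustrated with a short
C sketch. It is not taken from the manual: the type cpu_signature and
the functions decode_signature and signature_column are hypothetical
names chosen for this example. It looks only at the lower 12 bits of
EAX, as the text does, and ignores any fields in the upper bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields in the lower 12 bits of EAX returned by CPUID with EAX = 1.
   Hypothetical helper type, named for this example only. */
typedef struct {
    unsigned family;   /* bits 11:8 */
    unsigned model;    /* bits 7:4  */
    unsigned stepping; /* bits 3:0  */
} cpu_signature;

/* Split the lower 12 bits of eax into family, model, and stepping. */
static cpu_signature decode_signature(uint32_t eax)
{
    cpu_signature s;
    s.family   = (eax >> 8) & 0xFu;
    s.model    = (eax >> 4) & 0xFu;
    s.stepping = eax & 0xFu;
    return s;
}

/* Write the table-column notation (for example "0xF2n") into buf.
   The stepping is replaced by 'n' because the columns apply to any
   stepping value. buf must hold at least 6 bytes. */
static void signature_column(uint32_t eax, char *buf, size_t len)
{
    cpu_signature s = decode_signature(eax);
    assert(buf != NULL && len >= 6);
    snprintf(buf, len, "0x%X%Xn", s.family, s.model);
}
```

On GCC or Clang the input value could be obtained with __get_cpuid
from <cpuid.h>, called with leaf 1; that choice is an assumption of
this sketch, and other compilers provide their own intrinsics.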

Table C-1  Streaming SIMD Extension 3 SIMD Floating-point Instructions

Instruction            Latency (1)   Throughput   Execution Unit
CPUID                  0F3n          0F3n         0F3n
ADDSUBPD/ADDSUBPS      5             2            FP_ADD
HADDPD/HADDPS          13            4            FP_ADD,FP_MISC
HSUBPD/HSUBPS          13            4            FP_ADD,FP_MISC
MOVDDUP xmm1, xmm2     4             2            FP_MOVE
MOVSHDUP xmm1, xmm2    6             2            FP_MOVE
MOVSLDUP xmm1, xmm2    6             2            FP_MOVE

See “Table Footnotes”
