Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization

For complex arithmetic on Intel Core Solo and Intel Core Duo
processors, single-precision SSE3 instructions can deliver higher
performance than the alternatives. Tasks that require double-precision
complex arithmetic, on the other hand, may perform better with scalar
SSE2 instructions on these processors, because scalar SSE2 instructions
can be dispatched through two ports and executed on two separate
floating-point units.
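
To make the two approaches concrete, the following C sketch multiplies
complex numbers once with packed single-precision SSE3 intrinsics and
once with scalar double-precision SSE2 intrinsics. The function names,
the interleaved {real, imaginary} memory layout, and the GCC/Clang
target attributes are assumptions made for this illustration and do not
come from the manual. It shows the instruction selection only; which
variant is faster should be measured on the target processor.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <pmmintrin.h>  /* SSE3 intrinsics */

/* Hypothetical helper: multiplies two pairs of single-precision complex
 * numbers stored as {re0, im0, re1, im1}. Uses the SSE3 instructions
 * movsldup, movshdup and addsubps. */
__attribute__((target("sse3")))
static void cmul2_ps_sse3(const float *a, const float *b, float *out)
{
    __m128 va   = _mm_loadu_ps(a);
    __m128 vb   = _mm_loadu_ps(b);
    __m128 a_re = _mm_moveldup_ps(va);  /* {a0.re, a0.re, a1.re, a1.re} */
    __m128 a_im = _mm_movehdup_ps(va);  /* {a0.im, a0.im, a1.im, a1.im} */
    /* Swap real and imaginary parts of b: {b0.im, b0.re, b1.im, b1.re} */
    __m128 b_sw = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 t0   = _mm_mul_ps(a_re, vb);    /* {a.re*b.re, a.re*b.im, ...} */
    __m128 t1   = _mm_mul_ps(a_im, b_sw);  /* {a.im*b.im, a.im*b.re, ...} */
    /* addsubps subtracts in even lanes and adds in odd lanes:
     * re = a.re*b.re - a.im*b.im, im = a.re*b.im + a.im*b.re */
    _mm_storeu_ps(out, _mm_addsub_ps(t0, t1));
}

/* Hypothetical helper: multiplies one double-precision complex number
 * stored as {re, im}, using only scalar SSE2 instructions. */
__attribute__((target("sse2")))
static void cmul_sd_sse2(const double *a, const double *b, double *out)
{
    __m128d a_re = _mm_load_sd(&a[0]);
    __m128d a_im = _mm_load_sd(&a[1]);
    __m128d b_re = _mm_load_sd(&b[0]);
    __m128d b_im = _mm_load_sd(&b[1]);
    __m128d re = _mm_sub_sd(_mm_mul_sd(a_re, b_re), _mm_mul_sd(a_im, b_im));
    __m128d im = _mm_add_sd(_mm_mul_sd(a_re, b_im), _mm_mul_sd(a_im, b_re));
    _mm_store_sd(&out[0], re);
    _mm_store_sd(&out[1], im);
}

/* Usage example: (1+2i)(3+4i) = -5+10i and (3-i)(0.5+2i) = 3.5+5.5i. */
static void cmul_selfcheck(void)
{
    const float  a[4]  = {1.0f, 2.0f, 3.0f, -1.0f};
    const float  b[4]  = {3.0f, 4.0f, 0.5f, 2.0f};
    const double ad[2] = {1.0, 2.0};
    const double bd[2] = {3.0, 4.0};
    float  r[4];
    double rd[2];

    cmul2_ps_sse3(a, b, r);
    cmul_sd_sse2(ad, bd, rd);
    assert(r[0] == -5.0f && r[1] == 10.0f);
    assert(r[2] == 3.5f && r[3] == 5.5f);
    assert(rd[0] == -5.0 && rd[1] == 10.0);
}
```

The packed version handles two complex products per call, while the
scalar version issues four independent multiplies that leave the
processor free to schedule them across the available ports.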

Packed horizontal SSE3 instructions (haddps and hsubps) can simplify
the code sequence for some tasks. However, these instructions consist of
more than five micro-ops on Intel Core Solo and Intel Core Duo
processors. Care must be taken to ensure that the latency and decoding
penalty of the horizontal instructions do not offset any algorithmic
benefits.
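
The trade-off can be seen in a horizontal sum of four floats. The C
sketch below is only an illustration: the function names are
hypothetical, and the shuffle-based variant is one common alternative
rather than a sequence taken from the manual. The haddps version is
shorter to write, while the shuffle version avoids the horizontal
instruction entirely. Which one is faster on a given processor needs to
be measured.

```c
#include <assert.h>
#include <pmmintrin.h>  /* SSE3 intrinsics; also provides the SSE ones */

/* Hypothetical helper: sums p[0..3] with two haddps instructions. */
__attribute__((target("sse3")))
static float hsum_hadd(const float *p)
{
    __m128 v = _mm_loadu_ps(p);
    v = _mm_hadd_ps(v, v);  /* {p0+p1, p2+p3, p0+p1, p2+p3} */
    v = _mm_hadd_ps(v, v);  /* every lane holds (p0+p1)+(p2+p3) */
    return _mm_cvtss_f32(v);
}

/* Hypothetical helper: same sum using only SSE shuffles and adds. */
__attribute__((target("sse")))
static float hsum_shuffle(const float *p)
{
    __m128 v    = _mm_loadu_ps(p);
    __m128 hi   = _mm_movehl_ps(v, v);  /* {p2, p3, p2, p3} */
    __m128 sum2 = _mm_add_ps(v, hi);    /* {p0+p2, p1+p3, ...} */
    /* Broadcast lane 1 so it can be added to lane 0. */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(sum2, odd));  /* (p0+p2)+(p1+p3) */
}

/* Usage example: both variants agree on exactly representable input. */
static void hsum_selfcheck(void)
{
    const float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    assert(hsum_hadd(v) == 10.0f);
    assert(hsum_shuffle(v) == 10.0f);
}
```

The two variants add the elements in a different order, so for general
floating-point input their results can differ in the last bits.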
