SIMD Optimizations and Microarchitectures – Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization

Note that this can be applied to both SIMD integer and SIMD
floating-point code.

If there are multiple consumers of an instance of a register, group the
consumers together as closely as possible. However, the consumers
should not be scheduled near the producer.

SIMD Optimizations and Microarchitectures

Pentium M, Intel Core Solo and Intel Core Duo processors have a
different microarchitecture than Intel NetBurst® microarchitecture. The
following sub-section discusses optimizing SIMD code targeting Intel
Core Solo and Intel Core Duo processors.

The register-register variants of the following instructions have improved
performance on Intel Core Solo and Intel Core Duo processors relative to
Pentium M processors, because each instruction consists of two
micro-ops instead of three. The relevant instructions are: unpcklps,
unpckhps, packsswb, packuswb, packssdw, pshufd, shufps and shufpd.
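
As an illustration of what a register-register use of two of these instructions looks like from C, the sketch below loads both inputs into XMM registers before interleaving them with the SSE intrinsics for unpcklps and unpckhps. The helper name interleave4 is hypothetical and not part of the manual, and the compiler ultimately decides which operand form it emits, so the generated assembly should be inspected before drawing performance conclusions.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_unpacklo_ps, ... */

/* Interleave two 4-float vectors: out = {a0,b0,a1,b1,a2,b2,a3,b3}.
 * interleave4 is a hypothetical helper name used for illustration only. */
void interleave4(const float *a, const float *b, float *out)
{
    assert(a && b && out);
    /* Load both inputs into XMM registers first, so the unpack
     * operations below can work on register operands. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 lo = _mm_unpacklo_ps(va, vb);  /* unpcklps: a0 b0 a1 b1 */
    __m128 hi = _mm_unpackhi_ps(va, vb);  /* unpckhps: a2 b2 a3 b3 */
    _mm_storeu_ps(out, lo);
    _mm_storeu_ps(out + 4, hi);
}
```

Calling interleave4 with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} fills out with the two inputs interleaved element by element.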

top_of_loop:
    movq    mm0, [A + eax]
    pcmpgtw mm0, [B + eax]      ; Create compare mask
    movq    mm1, [D + eax]
    pand    mm1, mm0            ; Drop elements where A<=B
    pandn   mm0, [E + eax]      ; Drop elements where A>B
    por     mm0, mm1            ; Create single word
    movq    [C + eax], mm0
    add     eax, 8
    cmp     eax, MAX_ELEMENT*2
    jle     top_of_loop

Example 3-21 Emulation of Conditional Moves (continued)
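
The mask-and-merge idea behind this loop can be sketched in portable C, one 16-bit element at a time. This is a minimal scalar sketch rather than the manual's code: the function name select_gt_i16 and its parameter list are assumptions made for illustration, and each step is annotated with the MMX instruction it mirrors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free select: c[i] = (a[i] > b[i]) ? d[i] : e[i].
 * Mirrors the pcmpgtw / pand / pandn / por sequence one element at a time.
 * select_gt_i16 is a hypothetical helper name, not from the manual. */
void select_gt_i16(const int16_t *a, const int16_t *b,
                   const int16_t *d, const int16_t *e,
                   int16_t *c, size_t n)
{
    assert(a && b && d && e && c);
    for (size_t i = 0; i < n; ++i) {
        /* pcmpgtw: all ones where a > b, all zeros otherwise */
        uint16_t mask = (uint16_t)-(a[i] > b[i]);
        /* pand: keep d where the mask is set */
        uint16_t keep_d = (uint16_t)((uint16_t)d[i] & mask);
        /* pandn: keep e where the mask is clear */
        uint16_t keep_e = (uint16_t)((uint16_t)e[i] & (uint16_t)~mask);
        /* por: merge the two halves; the conversion back to int16_t
         * wraps on mainstream compilers (implementation-defined before C23) */
        c[i] = (int16_t)(keep_d | keep_e);
    }
}
```

Because the comparison result is turned into a bit mask instead of a branch, every element follows the same instruction path, which is the property that lets the MMX version handle four words per iteration.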