Intel ARCHITECTURE IA-32 User Manual




Packed Shuffle Word for 64-bit Registers ...................................................................... 4-18
Packed Shuffle Word for 128-bit Registers .................................................................... 4-19
Unpacking/interleaving 64-bit Data in 128-bit Registers ................................................. 4-20
Data Movement .............................................................................................................. 4-21
Conversion Instructions .................................................................................................. 4-21

Generating Constants ........................................................................................................... 4-21
Building Blocks...................................................................................................................... 4-23

Absolute Difference of Unsigned Numbers .................................................................... 4-23
Absolute Difference of Signed Numbers ........................................................................ 4-24
Absolute Value ................................................................................................................ 4-25
Clipping to an Arbitrary Range [high, low] ...................................................................... 4-26

Highly Efficient Clipping ............................................................................................ 4-27
Clipping to an Arbitrary Unsigned Range [high, low] ................................................ 4-28

Packed Max/Min of Signed Word and Unsigned Byte .................................................... 4-29

Signed Word ............................................................................................................. 4-29
Unsigned Byte .......................................................................................................... 4-30

Packed Multiply High Unsigned...................................................................................... 4-30
Packed Sum of Absolute Differences ............................................................................. 4-30
Packed Average (Byte/Word) ......................................................................................... 4-31
Complex Multiply by a Constant ..................................................................................... 4-32
Packed 32*32 Multiply .................................................................................................... 4-33
Packed 64-bit Add/Subtract ............................................................................................ 4-33
128-bit Shifts ................................................................................................................... 4-33

Memory Optimizations .......................................................................................................... 4-34

Partial Memory Accesses ............................................................................................... 4-35

Supplemental Techniques for Avoiding Cache Line Splits ........................................ 4-37

Increasing Bandwidth of Memory Fills and Video Fills ................................................... 4-39

Increasing Memory Bandwidth Using the MOVDQ Instruction ................................. 4-39
Increasing Memory Bandwidth by Loading and Storing to and from the Same DRAM Page ................................................................................................ 4-39

Increasing UC and WC Store Bandwidth by Using Aligned Stores ........................... 4-40

Converting from 64-bit to 128-bit SIMD Integer .................................................................... 4-40

SIMD Optimizations and Microarchitectures .................................................................. 4-41

Packed SSE2 Integer versus MMX Instructions ....................................................... 4-42

Chapter 5

Optimizing for SIMD Floating-point Applications

General Rules for SIMD Floating-point Code .......................................................................... 5-1
Planning Considerations ......................................................................................................... 5-2
Using SIMD Floating-point with x87 Floating-point ................................................................. 5-3
Scalar Floating-point Code ...................................................................................................... 5-3
