Intel ARCHITECTURE IA-32 User Manual




Considerations for Code Conversion to SIMD Programming.................................................. 3-8

Identifying Hot Spots ...................................................................................................... 3-10
Determine If Code Benefits by Conversion to SIMD Execution...................................... 3-11

Coding Techniques ............................................................................................................... 3-12

Coding Methodologies.................................................................................................... 3-13

Assembly .................................................................................................................. 3-15
Intrinsics.................................................................................................................... 3-15
Classes ..................................................................................................................... 3-17
Automatic Vectorization ............................................................................................ 3-18

Stack and Data Alignment..................................................................................................... 3-20

Alignment and Contiguity of Data Access Patterns ........................................................ 3-20

Using Padding to Align Data..................................................................................... 3-20
Using Arrays to Make Data Contiguous.................................................................... 3-21

Stack Alignment For 128-bit SIMD Technologies ........................................................... 3-22
Data Alignment for MMX Technology ............................................................................. 3-23
Data Alignment for 128-bit data...................................................................................... 3-24

Compiler-Supported Alignment................................................................................. 3-24

Improving Memory Utilization................................................................................................ 3-27

Data Structure Layout..................................................................................................... 3-27
Strip Mining..................................................................................................................... 3-32
Loop Blocking ................................................................................................................. 3-34

Instruction Selection.............................................................................................................. 3-37

SIMD Optimizations and Microarchitectures .................................................................. 3-38

Tuning the Final Application .................................................................................................. 3-39

Chapter 4

Optimizing for SIMD Integer Applications

General Rules on SIMD Integer Code .................................................................................... 4-2
Using SIMD Integer with x87 Floating-point............................................................................ 4-3

Using the EMMS Instruction ............................................................................................. 4-3
Guidelines for Using EMMS Instruction............................................................................ 4-4

Data Alignment........................................................................................................................ 4-6
Data Movement Coding Techniques ....................................................................................... 4-6

Unsigned Unpack ............................................................................................................. 4-6
Signed Unpack ................................................................................................................. 4-7
Interleaved Pack with Saturation ...................................................................................... 4-8
Interleaved Pack without Saturation ............................................................................... 4-10
Non-Interleaved Unpack................................................................................................. 4-11
Extract Word................................................................................................................... 4-13
Insert Word ..................................................................................................................... 4-14
Move Byte Mask to Integer............................................................................................. 4-16
