Intel ARCHITECTURE IA-32 User Manual




Data Alignment........................................................................................................................ 5-4

Data Arrangement ............................................................................................................ 5-4

Vertical versus Horizontal Computation...................................................................... 5-5
Data Swizzling ............................................................................................................ 5-9
Data Deswizzling ...................................................................................................... 5-14
Using MMX Technology Code for Copy or Shuffling Functions ................................ 5-17
Horizontal ADD Using SSE....................................................................................... 5-18

Use of cvttps2pi/cvttss2si Instructions .................................................................................. 5-21
Flush-to-Zero and Denormals-are-Zero Modes .................................................................... 5-22
SIMD Floating-point Programming Using SSE3 ................................................................... 5-22

SSE3 and Complex Arithmetics ..................................................................................... 5-23
SSE3 and Horizontal Computation................................................................................. 5-26
SIMD Optimizations and Microarchitectures .................................................................. 5-27

Packed Floating-Point Performance ......................................................................... 5-27

Chapter 6

Optimizing Cache Usage

General Prefetch Coding Guidelines....................................................................................... 6-2
Hardware Prefetching of Data................................................................................................. 6-4
Prefetch and Cacheability Instructions.................................................................................... 6-5
Prefetch................................................................................................................................... 6-6

Software Data Prefetch .................................................................................................... 6-6
The Prefetch Instructions – Pentium 4 Processor Implementation................................... 6-8
Prefetch and Load Instructions......................................................................................... 6-8

Cacheability Control ................................................................................................................ 6-9

The Non-temporal Store Instructions.............................................................................. 6-10

Fencing ..................................................................................................................... 6-10
Streaming Non-temporal Stores ............................................................................... 6-10
Memory Type and Non-temporal Stores ................................................................... 6-11
Write-Combining ....................................................................................................... 6-12

Streaming Store Usage Models...................................................................................... 6-13

Coherent Requests................................................................................................... 6-13
Non-coherent requests ............................................................................................. 6-13

Streaming Store Instruction Descriptions ....................................................................... 6-14
The fence Instructions .................................................................................................... 6-15

The sfence Instruction .............................................................................................. 6-15
The lfence Instruction ............................................................................................... 6-16
The mfence Instruction ............................................................................................. 6-16

The clflush Instruction .................................................................................................... 6-17

Memory Optimization Using Prefetch.................................................................................... 6-18

Software-controlled Prefetch .......................................................................................... 6-18
