Intel ARCHITECTURE IA-32 User Manual



Example 4-20  Clipping to an Arbitrary Signed Range [high, low] ............... 4-27
Example 4-21  Simplified Clipping to an Arbitrary Signed Range ................ 4-28
Example 4-22  Clipping to an Arbitrary Unsigned Range [high, low] ............. 4-29
Example 4-23  Complex Multiply by a Constant .................................. 4-32
Example 4-24  A Large Load after a Series of Small Stores (Penalty) ........... 4-35
Example 4-25  Accessing Data without Delay .................................... 4-35
Example 4-26  A Series of Small Loads after a Large Store ..................... 4-36
Example 4-27  Eliminating Delay for a Series of Small Loads after a Large Store  4-36
Example 4-28  An Example of Video Processing with Cache Line Splits ........... 4-37
Example 4-29  Video Processing Using LDDQU to Avoid Cache Line Splits ......... 4-38

Example 5-1   Pseudocode for Horizontal (xyz, AoS) Computation ................ 5-8
Example 5-2   Pseudocode for Vertical (xxxx, yyyy, zzzz, SoA) Computation ..... 5-9
Example 5-3   Swizzling Data .................................................. 5-10
Example 5-4   Swizzling Data Using Intrinsics ................................. 5-12
Example 5-5   Deswizzling Single-Precision SIMD Data .......................... 5-14
Example 5-6   Deswizzling Data Using the movlhps and shuffle Instructions ..... 5-15
Example 5-7   Deswizzling 64-bit Integer SIMD Data ............................ 5-16
Example 5-8   Using MMX Technology Code for Copying or Shuffling .............. 5-18
Example 5-9   Horizontal Add Using movhlps/movlhps ............................ 5-19
Example 5-10  Horizontal Add Using Intrinsics with movhlps/movlhps ............ 5-21
Example 5-11  Multiplication of Two Pairs of Single-precision Complex Numbers . 5-24
Example 5-12  Division of Two Pairs of Single-precision Complex Numbers ....... 5-25
Example 5-13  Calculating Dot Products from AoS ............................... 5-26

Example 6-1   Pseudo-code for Using clflush ................................... 6-18
Example 6-2   Populating an Array for Circular Pointer Chasing with Constant Stride  6-21
Example 6-3   Prefetch Scheduling Distance .................................... 6-26
Example 6-4   Using Prefetch Concatenation .................................... 6-28
Example 6-5   Concatenation and Unrolling the Last Iteration of Inner Loop .... 6-28
Example 6-6   Spread Prefetch Instructions .................................... 6-33
Example 6-7   Data Access of a 3D Geometry Engine without Strip-mining ........ 6-37
Example 6-8   Data Access of a 3D Geometry Engine with Strip-mining ........... 6-38
Example 6-9   Using HW Prefetch to Improve Read-Once Memory Traffic ........... 6-40
Example 6-10  Basic Algorithm of a Simple Memory Copy ......................... 6-46
Example 6-11  A Memory Copy Routine Using Software Prefetch ................... 6-48
