Intel ARCHITECTURE IA-32 User Manual

Optimizing for SIMD Floating-point Applications

Using MMX Technology Code for Copy or Shuffling Functions

If parts of the code mainly copy, shuffle, or perform logical
manipulations that do not require SSE code, consider performing these
operations with MMX technology code. For example, if texture data is
stored in memory as SoA (uuuu, vvvv) and needs only to be deswizzled
into AoS layout (uv) for the graphics card to process, you can use
either SSE or MMX technology code. Using MMX instructions allows you to
conserve XMM registers for other computational tasks.
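
To make the two layouts concrete, the following is a minimal scalar sketch in C. It assumes four vertices whose u and v texture coordinates are 32-bit values; the type names VertexSoA and VertexAoS and the function deswizzle_scalar are hypothetical, chosen for illustration and not taken from the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VERTS 4  /* assumed vertex count for this sketch */

/* SoA: all u values first, then all v values (uuuu, vvvv). */
typedef struct {
    uint32_t u[NUM_VERTS];
    uint32_t v[NUM_VERTS];
} VertexSoA;

/* AoS: each vertex keeps its own u and v together (uv, uv, ...). */
typedef struct {
    uint32_t u;
    uint32_t v;
} VertexAoS;

/* Scalar reference deswizzle: pure data movement, no arithmetic. */
static void deswizzle_scalar(const VertexSoA *in, VertexAoS out[NUM_VERTS])
{
    assert(in != NULL && out != NULL);
    for (int i = 0; i < NUM_VERTS; i++) {
        out[i].u = in->u[i];
        out[i].v = in->v[i];
    }
}
```

Because the routine only moves data, it is the kind of work that can be handled in MMX registers while the XMM registers stay free for floating-point computation.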

    movq      mm1, [ebx+16]   // mm1 = v1 v2 (low dword listed first)
    movq      mm2, mm0        // mm2 = u1 u2
    punpckhdq mm0, mm1        // mm0 = u2 v2
    punpckldq mm2, mm1        // mm2 = u1 v1
    movq      [edx], mm2      // store u1 v1
    movq      [edx+8], mm0    // store u2 v2
    movq      mm4, [ebx+8]    // mm4 = u3 u4
    movq      mm5, [ebx+24]   // mm5 = v3 v4
    movq      mm6, mm4        // mm6 = u3 u4
    punpckhdq mm4, mm5        // mm4 = u4 v4
    punpckldq mm6, mm5        // mm6 = u3 v3
    movq      [edx+16], mm6   // store u3 v3
    movq      [edx+24], mm4   // store u4 v4
}

Example 5-7 Deswizzling Data 64-bit Integer SIMD Data (continued)
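
The lane ordering produced by the two unpack instructions can be checked with a small portable model in C. This is a sketch of the instruction semantics on plain 64-bit integers, not MMX code; the helper names make_qword, model_punpckldq, model_punpckhdq, and deswizzle_pair are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a 64-bit register image: lo occupies bits 31..0, hi bits 63..32. */
static uint64_t make_qword(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Model of PUNPCKLDQ dst, src: { low dword of dst, low dword of src }. */
static uint64_t model_punpckldq(uint64_t dst, uint64_t src)
{
    return make_qword((uint32_t)dst, (uint32_t)src);
}

/* Model of PUNPCKHDQ dst, src: { high dword of dst, high dword of src }. */
static uint64_t model_punpckhdq(uint64_t dst, uint64_t src)
{
    return make_qword((uint32_t)(dst >> 32), (uint32_t)(src >> 32));
}

/* u = {u1, u2}, v = {v1, v2}  ->  out[0] = {u1, v1}, out[1] = {u2, v2}. */
static void deswizzle_pair(uint64_t u, uint64_t v, uint64_t out[2])
{
    assert(out != NULL);
    out[0] = model_punpckldq(u, v);  /* low halves: first vertex   */
    out[1] = model_punpckhdq(u, v);  /* high halves: second vertex */
}
```

In this model the low unpack yields the first vertex of each pair and the high unpack yields the second, which matches the store order in the listing: the PUNPCKLDQ result goes to the lower address and the PUNPCKHDQ result to the next 8 bytes.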
