Example 5-6 – Intel ARCHITECTURE IA-32 User Manual


Optimizing for SIMD Floating-point Applications


You may have to swizzle data in the registers, but not in memory. This
occurs when two different functions need to process the data in different
layouts. In lighting, for example, the data comes as rrrr gggg bbbb aaaa,
and you must deswizzle it into rgba before converting it into integers.
In this case, use the movlhps/movhlps instructions to do the first part
of the deswizzle, followed by shuffle instructions; see Example 5-6 and
Example 5-7.
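The two-step deswizzle described above can also be sketched with SSE
intrinsics instead of inline assembly. This is only an illustration under
stated assumptions, not the manual's code: the function name
deswizzle_rgba_sketch and the flat float-array layout (16 floats in, 16
floats out) are hypothetical, and unaligned loads and stores are used so
the caller does not need 16-byte-aligned buffers.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_movelh_ps, _mm_movehl_ps, _mm_shuffle_ps */

/* Hypothetical helper (not from the manual).
 * in : 16 floats laid out as r1 r2 r3 r4, g1 g2 g3 g4, b1 b2 b3 b4, a1 a2 a3 a4
 * out: 16 floats laid out as r1 g1 b1 a1, r2 g2 b2 a2, r3 g3 b3 a3, r4 g4 b4 a4
 */
static void deswizzle_rgba_sketch(const float *in, float *out)
{
    /* Unaligned loads; movaps-style aligned loads would need 16-byte alignment. */
    __m128 r = _mm_loadu_ps(in + 0);    /* r1 r2 r3 r4 */
    __m128 g = _mm_loadu_ps(in + 4);    /* g1 g2 g3 g4 */
    __m128 b = _mm_loadu_ps(in + 8);    /* b1 b2 b3 b4 */
    __m128 a = _mm_loadu_ps(in + 12);   /* a1 a2 a3 a4 */

    /* First part: movlhps/movhlps pair up the low and high halves. */
    __m128 rg_lo = _mm_movelh_ps(r, g); /* r1 r2 g1 g2 */
    __m128 rg_hi = _mm_movehl_ps(g, r); /* r3 r4 g3 g4 */
    __m128 ba_lo = _mm_movelh_ps(b, a); /* b1 b2 a1 a2 */
    __m128 ba_hi = _mm_movehl_ps(a, b); /* b3 b4 a3 a4 */

    /* Second part: shufps picks the even (0x88) or odd (0xDD) elements
     * of each pair to form one rgba vector per vertex. */
    __m128 v1 = _mm_shuffle_ps(rg_lo, ba_lo, _MM_SHUFFLE(2, 0, 2, 0)); /* r1 g1 b1 a1 */
    __m128 v2 = _mm_shuffle_ps(rg_lo, ba_lo, _MM_SHUFFLE(3, 1, 3, 1)); /* r2 g2 b2 a2 */
    __m128 v3 = _mm_shuffle_ps(rg_hi, ba_hi, _MM_SHUFFLE(2, 0, 2, 0)); /* r3 g3 b3 a3 */
    __m128 v4 = _mm_shuffle_ps(rg_hi, ba_hi, _MM_SHUFFLE(3, 1, 3, 1)); /* r4 g4 b4 a4 */

    _mm_storeu_ps(out + 0,  v1);
    _mm_storeu_ps(out + 4,  v2);
    _mm_storeu_ps(out + 8,  v3);
    _mm_storeu_ps(out + 12, v4);
}
```

Calling the sketch on an input whose red, green, blue and alpha runs hold
distinct values makes it easy to check that each output group of four
floats holds one r, g, b, a tuple. The whole transform stays in registers
between the loads and the stores, which is the situation the paragraph
above describes.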

Example 5-5 Deswizzling Single-Precision SIMD Data (continued)

    unpcklps xmm5, xmm4        // xmm5= z1 w1 z2 w2
    unpckhps xmm0, xmm4        // xmm0= z3 w3 z4 w4
    movlps [edx+8], xmm5       // v1 = x1 y1 z1 w1
    movhps [edx+24], xmm5      // v2 = x2 y2 z2 w2
    movlps [edx+40], xmm0      // v3 = x3 y3 z3 w3
    movhps [edx+56], xmm0      // v4 = x4 y4 z4 w4
    // DESWIZZLING ENDS HERE
  }
}

Example 5-6 Deswizzling Data Using the movlhps and shuffle Instructions

void deswizzle_rgb(Vertex_soa *in, Vertex_aos *out)
{
  //---deswizzle rgb---
  // assume: xmm1=rrrr, xmm2=gggg, xmm3=bbbb, xmm4=aaaa
  __asm {
    mov ecx, in                // load structure addresses
    mov edx, out
    movaps xmm1, [ecx]         // load r1 r2 r3 r4 => xmm1
    movaps xmm2, [ecx+16]      // load g1 g2 g3 g4 => xmm2
    movaps xmm3, [ecx+32]      // load b1 b2 b3 b4 => xmm3
    movaps xmm4, [ecx+48]      // load a1 a2 a3 a4 => xmm4

(continued)
