Data Deswizzling, Example 5-5 – Intel ARCHITECTURE IA-32 User Manual

Page 276: Deswizzling Single-Precision SIMD Data

IA-32 Intel® Architecture Optimization

Data Deswizzling

In the deswizzle operation, we want to arrange the SoA format back into AoS format so that the xxxx, yyyy, zzzz are rearranged and stored in memory as xyz. To do this we can use the unpcklps/unpckhps instructions to regenerate the xyxy layout and then store each half (xy) into its corresponding memory location using movlps/movhps, followed by another movlps/movhps to store the z component.
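As a point of reference, the data movement described above can be sketched in plain scalar C. The struct layouts here are assumptions inferred from the byte offsets used in the manual's example (four floats per SoA array, 16 bytes per AoS vertex), and deswizzle_ref is a hypothetical helper name, not a function from the manual.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layouts: the page does not show these definitions. */
typedef struct { float x[4], y[4], z[4], w[4]; } Vertex_soa; /* xxxx yyyy zzzz wwww */
typedef struct { float x, y, z, w; } Vertex_aos;             /* xyzw per vertex     */

/* Scalar reference (hypothetical helper): component i of each SoA array
   becomes vertex i of the AoS output. */
static void deswizzle_ref(const Vertex_soa *in, Vertex_aos *out)
{
    assert(in != NULL && out != NULL);
    for (int i = 0; i < 4; ++i) {
        out[i].x = in->x[i];
        out[i].y = in->y[i];
        out[i].z = in->z[i];
        out[i].w = in->w[i];
    }
}
```

A SIMD version should produce the same four vertices, so a loop like this can serve as a check when testing an unpack-based implementation.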

Example 5-5 illustrates the deswizzle function:

Example 5-5 Deswizzling Single-Precision SIMD Data

void deswizzle_asm(Vertex_soa *in, Vertex_aos *out)
{
  __asm {
    mov      ecx, in            // load structure addresses
    mov      edx, out
    movaps   xmm7, [ecx]        // load x1 x2 x3 x4 => xmm7
    movaps   xmm6, [ecx+16]     // load y1 y2 y3 y4 => xmm6
    movaps   xmm5, [ecx+32]     // load z1 z2 z3 z4 => xmm5
    movaps   xmm4, [ecx+48]     // load w1 w2 w3 w4 => xmm4

    // START THE DESWIZZLING HERE
    movaps   xmm0, xmm7         // xmm0= x1 x2 x3 x4
    unpcklps xmm7, xmm6         // xmm7= x1 y1 x2 y2
    movlps   [edx], xmm7        // v1 = x1 y1 -- --
    movhps   [edx+16], xmm7     // v2 = x2 y2 -- --
    unpckhps xmm0, xmm6         // xmm0= x3 y3 x4 y4
    movlps   [edx+32], xmm0     // v3 = x3 y3 -- --
    movhps   [edx+48], xmm0     // v4 = x4 y4 -- --
    movaps   xmm0, xmm5         // xmm0= z1 z2 z3 z4

continued
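The same unpack-and-store pattern can also be written with SSE intrinsics instead of inline assembly. The following is a minimal sketch, not the manual's code: the struct layouts are assumed from the offsets in the listing, deswizzle_sse is a hypothetical name, and the z/w half is inferred from the prose description (it mirrors the x/y half) rather than taken from the remainder of the example. It uses unaligned loads so it does not depend on 16-byte alignment, whereas the listing uses movaps.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE: _mm_unpacklo_ps, _mm_storel_pi, ... */

/* Assumed layouts: the page does not show these definitions. */
typedef struct { float x[4], y[4], z[4], w[4]; } Vertex_soa;
typedef struct { float x, y, z, w; } Vertex_aos;

/* Hypothetical intrinsics counterpart of the deswizzle sequence. */
static void deswizzle_sse(const Vertex_soa *in, Vertex_aos *out)
{
    assert(in != NULL && out != NULL);

    __m128 x = _mm_loadu_ps(in->x);        /* x1 x2 x3 x4 */
    __m128 y = _mm_loadu_ps(in->y);        /* y1 y2 y3 y4 */
    __m128 z = _mm_loadu_ps(in->z);        /* z1 z2 z3 z4 */
    __m128 w = _mm_loadu_ps(in->w);        /* w1 w2 w3 w4 */

    __m128 xy_lo = _mm_unpacklo_ps(x, y);  /* x1 y1 x2 y2 (unpcklps) */
    __m128 xy_hi = _mm_unpackhi_ps(x, y);  /* x3 y3 x4 y4 (unpckhps) */
    __m128 zw_lo = _mm_unpacklo_ps(z, w);  /* z1 w1 z2 w2 */
    __m128 zw_hi = _mm_unpackhi_ps(z, w);  /* z3 w3 z4 w4 */

    /* Store each 64-bit half into the matching vertex (movlps/movhps). */
    _mm_storel_pi((__m64 *)&out[0].x, xy_lo);
    _mm_storeh_pi((__m64 *)&out[1].x, xy_lo);
    _mm_storel_pi((__m64 *)&out[2].x, xy_hi);
    _mm_storeh_pi((__m64 *)&out[3].x, xy_hi);

    _mm_storel_pi((__m64 *)&out[0].z, zw_lo);
    _mm_storeh_pi((__m64 *)&out[1].z, zw_lo);
    _mm_storel_pi((__m64 *)&out[2].z, zw_hi);
    _mm_storeh_pi((__m64 *)&out[3].z, zw_hi);
}
```

Letting the compiler allocate registers removes the need for the explicit movaps copies that the assembly version makes before each destructive unpack.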
