Example 5-3, Swizzling data -10 – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization

5-10

To gather data from four different memory locations on the fly, follow
these steps:

1. Identify the first half of the 128-bit memory location.

2. Group the different halves together using the movlps and movhps to
   form an xyxy layout in two registers.

3. From the 4 attached halves, get the xxxx by using one shuffle, the
   yyyy by using another shuffle.

The zzzz is derived the same way but only requires one shuffle.
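The steps above can be sketched with SSE intrinsics rather than inline assembly. This is a minimal illustration, not the manual's listing: the function name swizzle_sketch is hypothetical, the struct layouts are assumed to match the AoS and SoA declarations used in this section, and the color component is gathered with a second shuffle on the z/color halves, which goes slightly beyond what the steps describe.

```c
#include <xmmintrin.h>  /* SSE: _mm_loadl_pi, _mm_loadh_pi, _mm_shuffle_ps */

/* Assumed layouts, mirroring the AoS and SoA declarations in this section. */
typedef struct { float x, y, z, color; } Vertex_aos;
typedef struct { float x[4], y[4], z[4], color[4]; } Vertex_soa;

/* Hypothetical helper: gathers four AoS vertices into one SoA block. */
static void swizzle_sketch(const Vertex_aos *in, Vertex_soa *out)
{
    __m128 xy01 = _mm_setzero_ps();
    __m128 xy23 = _mm_setzero_ps();
    __m128 zc01 = _mm_setzero_ps();
    __m128 zc23 = _mm_setzero_ps();

    /* Steps 1 and 2: movlps/movhps equivalents load the first half
       (x, y) of two vertices into each register, giving an xyxy layout. */
    xy01 = _mm_loadl_pi(xy01, (const __m64 *)&in[0].x);  /* x0 y0 -- -- */
    xy01 = _mm_loadh_pi(xy01, (const __m64 *)&in[1].x);  /* x0 y0 x1 y1 */
    xy23 = _mm_loadl_pi(xy23, (const __m64 *)&in[2].x);  /* x2 y2 -- -- */
    xy23 = _mm_loadh_pi(xy23, (const __m64 *)&in[3].x);  /* x2 y2 x3 y3 */

    /* Step 3: one shuffle picks the even lanes (xxxx),
       another picks the odd lanes (yyyy). */
    __m128 xxxx = _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(2, 0, 2, 0));
    __m128 yyyy = _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(3, 1, 3, 1));

    /* Same grouping for the second half (z, color) of each vertex. */
    zc01 = _mm_loadl_pi(zc01, (const __m64 *)&in[0].z);  /* z0 c0 -- -- */
    zc01 = _mm_loadh_pi(zc01, (const __m64 *)&in[1].z);  /* z0 c0 z1 c1 */
    zc23 = _mm_loadl_pi(zc23, (const __m64 *)&in[2].z);  /* z2 c2 -- -- */
    zc23 = _mm_loadh_pi(zc23, (const __m64 *)&in[3].z);  /* z2 c2 z3 c3 */

    __m128 zzzz = _mm_shuffle_ps(zc01, zc23, _MM_SHUFFLE(2, 0, 2, 0));
    /* Assumption: color is also wanted in SoA form, so shuffle once more. */
    __m128 cccc = _mm_shuffle_ps(zc01, zc23, _MM_SHUFFLE(3, 1, 3, 1));

    /* Unaligned stores, since no alignment of *out is assumed here. */
    _mm_storeu_ps(out->x, xxxx);
    _mm_storeu_ps(out->y, yyyy);
    _mm_storeu_ps(out->z, zzzz);
    _mm_storeu_ps(out->color, cccc);
}
```

Calling swizzle_sketch(in, &out) on an array of four Vertex_aos values fills out.x with the four x components in vertex order, and likewise for y, z, and color. If only x, y, and z are needed, the last shuffle and store can be dropped.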

Example 5-3 illustrates the swizzle function.

Example 5-3    Swizzling Data

typedef struct _VERTEX_AOS {
    float x, y, z, color;
} Vertex_aos;                   // AoS structure declaration

typedef struct _VERTEX_SOA {
    float x[4], y[4], z[4];
    float color[4];
} Vertex_soa;                   // SoA structure declaration

void swizzle_asm (Vertex_aos *in, Vertex_soa *out)
{
// in mem: x1y1z1w1-x2y2z2w2-x3y3z3w3-x4y4z4w4-
// SWIZZLE XYZW --> XXXX
asm {
    mov ecx, in                 // get structure addresses
    mov edx, out

continued
