Example 5-9, Horizontal Add Using movhlps/movlhps (page 5-19), Figure 5-3 – Intel Architecture IA-32 User Manual

Page 281: Figure 5-3 schematically presents the horizontal add using movhlps/movlhps, while Example 5-9 provides the code for this operation.


Chapter 5: Optimizing for SIMD Floating-point Applications (page 5-19)

Figure 5-3 Horizontal Add Using movhlps/movlhps

Example 5-9 Horizontal Add Using movhlps/movlhps

    void horiz_add(Vertex_soa *in, float *out) {
       __asm {
          mov ecx, in             // load structure addresses
          mov edx, out
          movaps xmm0, [ecx]      // load A1 A2 A3 A4 => xmm0
          movaps xmm1, [ecx+16]   // load B1 B2 B3 B4 => xmm1
          movaps xmm2, [ecx+32]   // load C1 C2 C3 C4 => xmm2
          movaps xmm3, [ecx+48]   // load D1 D2 D3 D4 => xmm3

(continued)
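The four movaps loads read 16-byte rows at byte offsets 0, 16, 32 and 48 from `in`, and movaps faults unless its memory operand is 16-byte aligned. The Vertex_soa definition is not part of this excerpt, so the C11 sketch below assumes a hypothetical `vertex_soa_sketch` with four `float[4]` rows, and models each movaps with a hypothetical `movaps_load` helper that reports a misaligned address instead of faulting. `load_rows` is likewise a hypothetical name; the sketch only illustrates the layout the loads expect.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical SoA layout (assumption): four rows of four floats, 16 bytes
   each, with the whole structure aligned to 16 bytes for movaps. */
typedef struct {
    _Alignas(16) float a[4]; /* A1..A4, offset  0 -> xmm0 */
    float b[4];              /* B1..B4, offset 16 -> xmm1 */
    float c[4];              /* C1..C4, offset 32 -> xmm2 */
    float d[4];              /* D1..D4, offset 48 -> xmm3 */
} vertex_soa_sketch;

/* The byte offsets used by the four movaps loads in Example 5-9. */
_Static_assert(offsetof(vertex_soa_sketch, b) == 16, "B row at [ecx+16]");
_Static_assert(offsetof(vertex_soa_sketch, c) == 32, "C row at [ecx+32]");
_Static_assert(offsetof(vertex_soa_sketch, d) == 48, "D row at [ecx+48]");

/* Scalar stand-in for a movaps load: copies 16 bytes, but only from a
   16-byte-aligned address, because the real instruction faults otherwise.
   Returns 0 on success and -1 for a misaligned address. */
static int movaps_load(float dst[4], const void *src) {
    if ((uintptr_t)src % 16 != 0) {
        return -1;
    }
    memcpy(dst, src, 4 * sizeof(float));
    return 0;
}

/* Loads rows A..D into the modelled registers xmm[0]..xmm[3], in the same
   order and at the same offsets as the four movaps instructions. */
static int load_rows(const vertex_soa_sketch *in, float xmm[4][4]) {
    const unsigned char *base = (const unsigned char *)in;
    for (int r = 0; r < 4; r++) {
        if (movaps_load(xmm[r], base + 16 * r) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Putting `_Alignas(16)` on the first member raises the alignment of the whole structure, so every instance the compiler places, on the stack or as a static object, satisfies movaps; heap allocations would additionally need an aligned allocator.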

Figure 5-3 data flow:

    Inputs:  xmm0 = A1 A2 A3 A4   xmm1 = B1 B2 B3 B4   xmm2 = C1 C2 C3 C4   xmm3 = D1 D2 D3 D4
    MOVLHPS: A1 A2 B1 B2                  C1 C2 D1 D2
    MOVHLPS: A3 A4 B3 B4                  C3 C4 D3 D4
    ADDPS:   A1+A3 A2+A4 B1+B3 B2+B4      C1+C3 C2+C4 D1+D3 D2+D4
    SHUFPS:  A1+A3 B1+B3 C1+C3 D1+D3      A2+A4 B2+B4 C2+C4 D2+D4
    ADDPS:   A1+A2+A3+A4  B1+B2+B3+B4  C1+C2+C3+C4  D1+D2+D3+D4
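The data flow in Figure 5-3 can be traced with a portable scalar model. This is a sketch under stated assumptions, not Intel's code: each XMM register is modelled as a `float[4]` with A1 (the lowest address) in element 0, and `movlhps`, `movhlps`, `addps`, `shufps` and `horiz_add_sketch` are hypothetical C helpers that mimic the instructions of the same name. The SHUFPS immediates 0x88 (pick elements 0 and 2) and 0xDD (pick elements 1 and 3) are derived from the figure's outputs, not copied from the continuation of Example 5-9.

```c
#include <string.h>

/* Modelled register: float[4], element 0 = lowest address (assumption). */
static void copy4(float dst[4], const float src[4]) {
    memcpy(dst, src, 4 * sizeof(float));
}

/* MOVLHPS dst, src: keep dst[0..1], copy src[0..1] into dst[2..3]. */
static void movlhps(float dst[4], const float src[4]) {
    float lo0 = src[0], lo1 = src[1];
    dst[2] = lo0;
    dst[3] = lo1;
}

/* MOVHLPS dst, src: copy src[2..3] into dst[0..1], keep dst[2..3]. */
static void movhlps(float dst[4], const float src[4]) {
    float hi0 = src[2], hi1 = src[3];
    dst[0] = hi0;
    dst[1] = hi1;
}

/* ADDPS dst, src: element-wise single-precision add. */
static void addps(float dst[4], const float src[4]) {
    for (int i = 0; i < 4; i++) {
        dst[i] += src[i];
    }
}

/* SHUFPS dst, src, imm: results 0..1 are picked from dst and results 2..3
   from src, each selected by a 2-bit field of imm (lowest field first). */
static void shufps(float dst[4], const float src[4], unsigned imm) {
    float r0 = dst[imm & 3];
    float r1 = dst[(imm >> 2) & 3];
    float r2 = src[(imm >> 4) & 3];
    float r3 = src[(imm >> 6) & 3];
    dst[0] = r0; dst[1] = r1; dst[2] = r2; dst[3] = r3;
}

/* out[i] = sum of the four elements of row i (A, B, C, D), following the
   MOVLHPS/MOVHLPS -> ADDPS -> SHUFPS -> ADDPS stages of Figure 5-3. */
static void horiz_add_sketch(const float a[4], const float b[4],
                             const float c[4], const float d[4],
                             float out[4]) {
    float x0[4], x1[4], x2[4], x3[4];
    float ab[4], cd[4], even[4], odd[4];
    copy4(x0, a); copy4(x1, b); copy4(x2, c); copy4(x3, d);

    copy4(ab, x0);
    movlhps(ab, x1);        /* ab = A1 A2 B1 B2 */
    movhlps(x1, x0);        /* x1 = A3 A4 B3 B4 */
    addps(ab, x1);          /* ab = A1+A3 A2+A4 B1+B3 B2+B4 */

    copy4(cd, x2);
    movlhps(cd, x3);        /* cd = C1 C2 D1 D2 */
    movhlps(x3, x2);        /* x3 = C3 C4 D3 D4 */
    addps(cd, x3);          /* cd = C1+C3 C2+C4 D1+D3 D2+D4 */

    copy4(even, ab);
    shufps(even, cd, 0x88); /* even = A1+A3 B1+B3 C1+C3 D1+D3 */
    copy4(odd, ab);
    shufps(odd, cd, 0xDD);  /* odd  = A2+A4 B2+B4 C2+C4 D2+D4 */

    addps(even, odd);       /* even = row sums of A, B, C and D */
    copy4(out, even);
}
```

For rows {1, 2, 3, 4}, {10, 20, 30, 40}, {100, 200, 300, 400} and {0.5, 0.25, 0.125, 0.125}, `horiz_add_sketch` writes 10, 100, 1000 and 1 to `out`. Pairing halves with MOVLHPS/MOVHLPS first means only two SHUFPS are needed to line up the partial sums of all four rows.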
