

IA-32 Intel® Architecture Optimization


Example 5-8 illustrates how to use MMX technology code for copying
or shuffling.

Horizontal ADD Using SSE

Although vertical computations generally make better use of SIMD
performance than horizontal computations do, in some cases the code
must use a horizontal operation. The movlhps/movhlps and shuffle
instructions can be used to sum data horizontally. For example,
starting with four 128-bit registers, to sum up each register
horizontally while having the final results in one register, use the
movlhps/movhlps instructions to align the upper and lower parts of
each register. This allows you to use a vertical add. With the
resulting partial horizontal summation, full summation follows easily.
Figure 5-3 schematically presents horizontal add using
movhlps/movlhps, while Example 5-9 and Example 5-10 provide the code
for this operation.
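The sequence described above can be sketched in C with SSE intrinsics. This is a minimal illustration, not the manual's Example 5-9 or Example 5-10: the function name horizontal_sums4 and its pointer-based interface are hypothetical, and an x86 target with SSE support is assumed. The intrinsics _mm_movelh_ps, _mm_movehl_ps and _mm_shuffle_ps correspond to the movlhps, movhlps and shufps instructions. Comments list register elements from lowest to highest.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: writes the horizontal sum of each of the four
   4-float inputs a, b, c, d to out[0], out[1], out[2], out[3]. */
void horizontal_sums4(const float *a, const float *b,
                      const float *c, const float *d, float *out)
{
    __m128 va = _mm_loadu_ps(a);            /* a0 a1 a2 a3 */
    __m128 vb = _mm_loadu_ps(b);            /* b0 b1 b2 b3 */
    __m128 vc = _mm_loadu_ps(c);            /* c0 c1 c2 c3 */
    __m128 vd = _mm_loadu_ps(d);            /* d0 d1 d2 d3 */

    /* movlhps/movhlps: line up the lower and upper halves. */
    __m128 ab_lo = _mm_movelh_ps(va, vb);   /* a0 a1 b0 b1 */
    __m128 ab_hi = _mm_movehl_ps(vb, va);   /* a2 a3 b2 b3 */
    __m128 cd_lo = _mm_movelh_ps(vc, vd);   /* c0 c1 d0 d1 */
    __m128 cd_hi = _mm_movehl_ps(vd, vc);   /* c2 c3 d2 d3 */

    /* Vertical adds produce the partial horizontal sums. */
    __m128 ab = _mm_add_ps(ab_lo, ab_hi);   /* a0+a2 a1+a3 b0+b2 b1+b3 */
    __m128 cd = _mm_add_ps(cd_lo, cd_hi);   /* c0+c2 c1+c3 d0+d2 d1+d3 */

    /* shufps: gather the even and odd partial sums, then add once more. */
    __m128 even = _mm_shuffle_ps(ab, cd, _MM_SHUFFLE(2, 0, 2, 0));
    __m128 odd  = _mm_shuffle_ps(ab, cd, _MM_SHUFFLE(3, 1, 3, 1));

    _mm_storeu_ps(out, _mm_add_ps(even, odd));
}
```

The unaligned load and store intrinsics are used here only so that the sketch places no alignment requirement on the caller. Aligned accesses would normally be preferred when the data layout permits.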

Example 5-8  Using MMX Technology Code for Copying or Shuffling

    movq      mm0, [Uarray+ebx]       ; mm0= u1 u2
    movq      mm1, [Varray+ebx]       ; mm1= v1 v2
    movq      mm2, mm0                ; mm2= u1 u2
    punpckhdq mm0, mm1                ; mm0= u1 v1
    punpckldq mm2, mm1                ; mm2= u2 v2
    movq      [Coords+edx], mm0       ; store u1 v1
    movq      [Coords+8+edx], mm2     ; store u2 v2
    movq      mm4, [Uarray+8+ebx]     ; mm4= u3 u4
    movq      mm5, [Varray+8+ebx]     ; mm5= v3 v4
    movq      mm6, mm4                ; mm6= u3 u4
    punpckhdq mm4, mm5                ; mm4= u3 v3
    punpckldq mm6, mm5                ; mm6= u4 v4
    movq      [Coords+16+edx], mm4    ; store u3 v3
    movq      [Coords+24+edx], mm6    ; store u4 v4
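For comparison, the same u/v interleaving pattern can be sketched in C using the SSE unpack intrinsics instead of the MMX instructions shown in Example 5-8. This is an illustrative analogue, not the manual's code: the function name interleave_uv4 is hypothetical, the data is assumed to be single-precision floating point, an x86 target with SSE support is assumed, and the output is assumed to follow array index order (the u1/u2 labels in the MMX listing follow that listing's own register notation).

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: interleaves four u values with four v values,
   writing u[0] v[0] u[1] v[1] u[2] v[2] u[3] v[3] to coords[0..7]. */
void interleave_uv4(const float *u, const float *v, float *coords)
{
    __m128 vu = _mm_loadu_ps(u);    /* u0 u1 u2 u3 (lowest element first) */
    __m128 vv = _mm_loadu_ps(v);    /* v0 v1 v2 v3 */

    /* unpcklps interleaves the lower halves: u0 v0 u1 v1 */
    _mm_storeu_ps(coords, _mm_unpacklo_ps(vu, vv));

    /* unpckhps interleaves the upper halves: u2 v2 u3 v3 */
    _mm_storeu_ps(coords + 4, _mm_unpackhi_ps(vu, vv));
}
```

One 128-bit unpack pair here covers the same four u/v pairs that take two 64-bit punpckhdq/punpckldq pairs in the MMX listing.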
