Example 5-1, Figure 5-2, Dot Product Operation – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization


Figure 5-2 shows how one result would be computed with seven instructions
if the data were organized as AoS and only SSE were used; four results
would then require 28 instructions.

Figure 5-2 Dot Product Operation

Example 5-1 Pseudocode for Horizontal (xyz, AoS) Computation

mulps     ; x*x', y*y', z*z'
movaps    ; reg->reg move, since next steps overwrite
shufps    ; get b,a,d,c from a,b,c,d
addps     ; get a+b,a+b,c+d,c+d
movaps    ; reg->reg move
shufps    ; get c+d,c+d,a+b,a+b from prior addps
addps     ; get a+b+c+d,a+b+c+d,a+b+c+d,a+b+c+d
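The seven steps of Example 5-1 can be sketched with SSE intrinsics in C. This is a minimal illustration, not code from the manual: the function name `dot_aos_sse` and its argument layout are hypothetical, and the two `movaps` register copies are left to the compiler, which inserts them only if it needs to preserve a value. The sketch sums all four lanes, so for an xyz vector it assumes the caller stores 0 in the fourth element.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: horizontal dot product of two 4-float vectors
   stored AoS-style (x, y, z, w). For xyz data, w is assumed to be 0. */
static float dot_aos_sse(const float *v, const float *f)
{
    __m128 a = _mm_loadu_ps(v);
    __m128 b = _mm_loadu_ps(f);

    /* mulps: lanes a,b,c,d = x*x', y*y', z*z', w*w' */
    __m128 prod = _mm_mul_ps(a, b);

    /* shufps: get b,a,d,c from a,b,c,d */
    __m128 swapped = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));

    /* addps: get a+b,a+b,c+d,c+d */
    __m128 pairs = _mm_add_ps(prod, swapped);

    /* shufps: get c+d,c+d,a+b,a+b from the prior add */
    __m128 crossed = _mm_shuffle_ps(pairs, pairs, _MM_SHUFFLE(1, 0, 3, 2));

    /* addps: every lane now holds a+b+c+d */
    __m128 sum = _mm_add_ps(pairs, crossed);

    /* Return the low lane; the other three lanes hold the same value. */
    return _mm_cvtss_f32(sum);
}
```

All of the shuffle and add work above yields a single scalar, which is why four dot products in this layout cost four times as many instructions.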

[Figure 5-2 image: (X1 X2 X3 X4 × Fx Fx Fx Fx) + (Y1 Y2 Y3 Y4 × Fy Fy Fy Fy) + (Z1 Z2 Z3 Z4 × Fz Fz Fz Fz) + (W1 W2 W3 W4 × Fw Fw Fw Fw) = R1 R2 R3 R4]
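The labels of Figure 5-2 appear to describe four element-wise multiplies whose products are summed into R1 through R4. Under that reading, the arithmetic can be sketched in C with SSE intrinsics as below. The function name `dot4_vertical_sse` and the separate x, y, z, w arrays are assumptions made for illustration; they are not taken from the manual.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: computes r[i] = x[i]*fx + y[i]*fy + z[i]*fz + w[i]*fw
   for i = 0..3, one lane per result, matching the figure's labels. */
static void dot4_vertical_sse(const float *x, const float *y,
                              const float *z, const float *w,
                              float fx, float fy, float fz, float fw,
                              float *r)
{
    /* X1..X4 times Fx broadcast to all four lanes */
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(x), _mm_set1_ps(fx));

    /* Add Y1..Y4 * Fy, then Z1..Z4 * Fz, then W1..W4 * Fw */
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(y), _mm_set1_ps(fy)));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(z), _mm_set1_ps(fz)));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(w), _mm_set1_ps(fw)));

    /* Store R1..R4 */
    _mm_storeu_ps(r, acc);
}
```

Each multiply and add here operates on four independent results at once, and no shuffles are needed because the lanes never have to be combined with each other.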
