Vertical versus horizontal computation – Intel ARCHITECTURE IA-32 User Manual


Optimizing for SIMD Floating-point Applications


For some applications, e.g., 3D geometry, the traditional data
arrangement must change to fully utilize the SIMD registers
and parallel techniques. Traditionally, the data layout has been an array
of structures (AoS). To fully utilize the SIMD registers in such
applications, a different data layout has been proposed: a structure of
arrays (SoA), which results in better performance.
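The difference between the two layouts can be sketched in C. This is only an illustration: the type names VertexAoS and VerticesSoA, the helper aos_to_soa, and the batch size NUM_VERTICES are hypothetical and do not come from the manual.

```c
#include <stddef.h>

/* Hypothetical batch size: four floats fill one 128-bit SIMD register. */
#define NUM_VERTICES 4

/* Array of structures (AoS): the x, y, z, w of ONE vertex are adjacent. */
typedef struct {
    float x, y, z, w;
} VertexAoS;

/* Structure of arrays (SoA): the x values of ALL vertices are adjacent,
 * followed by all y values, and so on. */
typedef struct {
    float x[NUM_VERTICES];
    float y[NUM_VERTICES];
    float z[NUM_VERTICES];
    float w[NUM_VERTICES];
} VerticesSoA;

/* Copy a batch of NUM_VERTICES AoS vertices into the SoA layout. */
static void aos_to_soa(const VertexAoS *in, VerticesSoA *out)
{
    for (size_t i = 0; i < NUM_VERTICES; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
        out->w[i] = in[i].w;
    }
}
```

With the SoA layout, one 128-bit load of x[0..3] fetches the same component of four different vertices, which is the arrangement that element-wise SIMD arithmetic expects.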

Vertical versus Horizontal Computation

The majority of the floating-point arithmetic instructions in SSE and
SSE2 are focused on vertical data processing of parallel data elements,
i.e., each destination element is the result of a common arithmetic
operation on the input operand elements in the same vertical position.
This is shown in Figure 5-1. To supplement these homogeneous
arithmetic operations on parallel data elements, SSE and SSE2 also
provide several data movement instructions (e.g., shufps) to facilitate
moving data elements horizontally.
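As a minimal sketch of both ideas, the C fragment below uses the SSE compiler intrinsics from xmmintrin.h: _mm_add_ps (which maps to addps) for a vertical operation, and _mm_shuffle_ps (which maps to shufps) for horizontal data movement. The function names vertical_add and reverse_elements are hypothetical, chosen for illustration, and the code assumes an x86 target with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE */

/* Vertical operation: out[i] = x[i] + y[i] for i = 0..3.
 * Each result element depends only on the inputs in the same position. */
static void vertical_add(const float x[4], const float y[4], float out[4])
{
    __m128 vx = _mm_loadu_ps(x);   /* unaligned load of four floats */
    __m128 vy = _mm_loadu_ps(y);
    _mm_storeu_ps(out, _mm_add_ps(vx, vy));
}

/* Horizontal data movement: reverse the order of the four elements.
 * _MM_SHUFFLE(0, 1, 2, 3) selects source elements 3, 2, 1, 0 for
 * result elements 0, 1, 2, 3 respectively. */
static void reverse_elements(const float x[4], float out[4])
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 r  = _mm_shuffle_ps(vx, vx, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_ps(out, r);
}
```

The unaligned load and store intrinsics are used here so the sketch works with any float array; aligned variants (_mm_load_ps, _mm_store_ps) require 16-byte aligned addresses.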

The AoS data structure is often used in 3D geometry computations.
SIMD technology can be applied to the AoS data structure using a
horizontal computation model. This means that the x, y, z, and w
components of a single vertex structure (that is, of a single vector

Figure 5-1  Homogeneous Operation on Parallel Data Elements

    X3          X2          X1          X0
    Y3          Y2          Y1          Y0
    OP          OP          OP          OP
    X3 OP Y3    X2 OP Y2    X1 OP Y1    X0 OP Y0
