Data Swizzling, Example 5-2 – Intel Architecture IA-32 User Manual

Optimizing for SIMD Floating-point Applications

Now consider the case where the data is organized as SoA. Example 5-2
demonstrates how 4 results are computed with 5 instructions.

To make the most efficient use of the four-component-wide registers,
reorganize the data into the SoA format. This yields increased throughput
and hence much better performance for the instructions used.

As this simple example shows, vertical computation yields 100% use of
the available SIMD registers and produces 4 results. (The results may
vary based on the application.) If the data structures must be in a
format that is not “friendly” to vertical computation, the data can be
rearranged “on the fly” to achieve full utilization of the SIMD registers.
This operation is referred to as “swizzling,” and the reverse operation
is referred to as “deswizzling.”

Data Swizzling

Swizzling data from one format to another may be required in many
algorithms when the available instruction set extension is limited (e.g.,
only SSE is available). An example of this is AoS format, where the
vertices come as xyz adjacent coordinates. Rearranging them into SoA
format, xxxx, yyyy, zzzz, allows more efficient SIMD computations.
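To make the two layouts concrete, here is a minimal scalar sketch in C of the rearrangement described above, together with its reverse. The type and function names (VertexAoS, VertexBlockSoA, aos_to_soa, soa_to_aos) are hypothetical and chosen only for illustration, and the block size of 4 vertices is an assumption that matches a four-component-wide register.

```c
#include <stddef.h>

/* AoS: one struct per vertex, coordinates adjacent in memory (xyz xyz ...).
   Hypothetical type, for illustration only. */
typedef struct { float x, y, z; } VertexAoS;

/* SoA: one array per coordinate (xxxx, yyyy, zzzz) for a block of 4 vertices.
   Hypothetical type, for illustration only. */
typedef struct { float x[4], y[4], z[4]; } VertexBlockSoA;

/* Swizzle: gather 4 AoS vertices into one SoA block. */
void aos_to_soa(const VertexAoS in[4], VertexBlockSoA *out)
{
    for (size_t i = 0; i < 4; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}

/* Deswizzle: scatter one SoA block back into 4 AoS vertices. */
void soa_to_aos(const VertexBlockSoA *in, VertexAoS out[4])
{
    for (size_t i = 0; i < 4; ++i) {
        out[i].x = in->x[i];
        out[i].y = in->y[i];
        out[i].z = in->z[i];
    }
}
```

This version only shows where each value ends up in memory. It does not use SIMD instructions for the rearrangement itself.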

For efficient data shuffling and swizzling, use the following instructions:

movlps, movhps: load/store and move data on half sections of the registers

shufps, unpckhps, and unpcklps: shuffle and unpack data
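As an illustration of how the half-register loads and shufps can combine into a swizzle, the following is a sketch using the SSE intrinsics from <xmmintrin.h>, assuming a compiler that targets x86 with SSE enabled. The Vertex4 type, its w padding member (which makes each vertex 16 bytes), and the function name swizzle4 are assumptions made for this example and do not come from the text.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical AoS vertex, padded with w so that each vertex is 16 bytes. */
typedef struct { float x, y, z, w; } Vertex4;

/* Swizzle 4 AoS vertices into SoA arrays xs, ys, zs (4 floats each). */
void swizzle4(const Vertex4 v[4], float xs[4], float ys[4], float zs[4])
{
    __m128 xy01 = _mm_setzero_ps();
    __m128 xy23 = _mm_setzero_ps();
    __m128 zw01 = _mm_setzero_ps();
    __m128 zw23 = _mm_setzero_ps();

    /* movlps / movhps: fill the low and high halves, two floats at a time.
       Lanes are listed from lowest to highest. */
    xy01 = _mm_loadl_pi(xy01, (const __m64 *)&v[0].x);  /* x0 y0 -- -- */
    xy01 = _mm_loadh_pi(xy01, (const __m64 *)&v[1].x);  /* x0 y0 x1 y1 */
    xy23 = _mm_loadl_pi(xy23, (const __m64 *)&v[2].x);  /* x2 y2 -- -- */
    xy23 = _mm_loadh_pi(xy23, (const __m64 *)&v[3].x);  /* x2 y2 x3 y3 */
    zw01 = _mm_loadl_pi(zw01, (const __m64 *)&v[0].z);  /* z0 w0 -- -- */
    zw01 = _mm_loadh_pi(zw01, (const __m64 *)&v[1].z);  /* z0 w0 z1 w1 */
    zw23 = _mm_loadl_pi(zw23, (const __m64 *)&v[2].z);  /* z2 w2 -- -- */
    zw23 = _mm_loadh_pi(zw23, (const __m64 *)&v[3].z);  /* z2 w2 z3 w3 */

    /* shufps: pick the even lanes (x or z) or the odd lanes (y) of each pair. */
    _mm_storeu_ps(xs, _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(2, 0, 2, 0)));
    _mm_storeu_ps(ys, _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(3, 1, 3, 1)));
    _mm_storeu_ps(zs, _mm_shuffle_ps(zw01, zw23, _MM_SHUFFLE(2, 0, 2, 0)));
}
```

The results are stored to plain float arrays so that they are easy to inspect. Code that continues with SIMD arithmetic would more likely keep the three shuffled values in registers.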

Example 5-2 Pseudocode for Vertical (xxxx, yyyy, zzzz, SoA) Computation

    mulps   ; x*x' for all 4 x-components of 4 vertices
    mulps   ; y*y' for all 4 y-components of 4 vertices
    mulps   ; z*z' for all 4 z-components of 4 vertices
    addps   ; x*x' + y*y'
    addps   ; x*x'+y*y'+z*z'
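The five instructions of Example 5-2 map directly onto SSE intrinsics. The sketch below assumes a compiler that targets x86 with SSE enabled and the <xmmintrin.h> header. The function name dot4_soa and its parameter names are hypothetical, and unaligned loads and stores are used so that the caller's arrays need no particular alignment.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Computes dot[i] = x[i]*xp[i] + y[i]*yp[i] + z[i]*zp[i] for i = 0..3,
   that is, 4 dot products from SoA inputs. Hypothetical helper. */
void dot4_soa(const float x[4], const float y[4], const float z[4],
              const float xp[4], const float yp[4], const float zp[4],
              float dot[4])
{
    __m128 xx = _mm_mul_ps(_mm_loadu_ps(x), _mm_loadu_ps(xp)); /* mulps: x*x' */
    __m128 yy = _mm_mul_ps(_mm_loadu_ps(y), _mm_loadu_ps(yp)); /* mulps: y*y' */
    __m128 zz = _mm_mul_ps(_mm_loadu_ps(z), _mm_loadu_ps(zp)); /* mulps: z*z' */

    __m128 sum = _mm_add_ps(xx, yy);                           /* addps: x*x' + y*y' */
    sum = _mm_add_ps(sum, zz);                                 /* addps: + z*z' */

    _mm_storeu_ps(dot, sum);                                   /* 4 results */
}
```

Every lane of each register carries useful data at every step, which is the full utilization that the vertical computation is meant to achieve.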
