Instruction Selection, Example 3-21 – Intel ARCHITECTURE IA-32 User Manual



Coding for SIMD Architectures


As one can see, all the redundant cache misses can be eliminated by
applying this loop blocking technique. If MAX is huge, loop blocking
can also help reduce the penalty from DTLB (data translation
look-aside buffer) misses. In addition to improving the cache/memory
performance, this optimization technique also saves external bus
bandwidth.
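
A minimal sketch of loop blocking in C may help make the idea concrete. It assumes a kernel that reads one array row-wise and the other column-wise, which is the kind of access pattern where blocking pays off. The function names, the float element type, the value chosen for MAX, and the BLOCK_SIZE constant are all assumptions made for illustration and are not taken from the manual's own listing.

```c
#include <stddef.h>

#define MAX 64         /* assumed problem size, for illustration only */
#define BLOCK_SIZE 8   /* assumed tile edge; in practice tuned to the cache */

/* Unblocked version: B is walked column-wise, so consecutive inner
   iterations touch different cache lines of B. */
void add_transposed_naive(float A[MAX][MAX], float B[MAX][MAX])
{
    for (size_t i = 0; i < MAX; i++)
        for (size_t j = 0; j < MAX; j++)
            A[i][j] = A[i][j] + B[j][i];
}

/* Blocked version: the same arithmetic, performed one
   BLOCK_SIZE x BLOCK_SIZE tile at a time so that the lines of B
   fetched for a tile are reused before they are evicted. */
void add_transposed_blocked(float A[MAX][MAX], float B[MAX][MAX])
{
    for (size_t ii = 0; ii < MAX; ii += BLOCK_SIZE)
        for (size_t jj = 0; jj < MAX; jj += BLOCK_SIZE)
            for (size_t i = ii; i < ii + BLOCK_SIZE && i < MAX; i++)
                for (size_t j = jj; j < jj + BLOCK_SIZE && j < MAX; j++)
                    A[i][j] = A[i][j] + B[j][i];
}
```

Both functions compute the same result; only the order in which elements are visited differs. A suitable BLOCK_SIZE depends on the cache size and line size of the target processor, so the value above is only a placeholder.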

Instruction Selection

The following section gives some guidelines for choosing instructions
to complete a task.

One barrier to SIMD computation can be the existence of
data-dependent branches. Conditional moves can eliminate such
branches, and they can be emulated in SIMD computation by using
masked compares and logical operations, as shown in Example 3-21.
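
As a rough illustration of the compare-and-mask idea, separate from the manual's own listing, the branch-free select can be sketched in C. This sketch assumes SSE2 intrinsics from <emmintrin.h> rather than the instructions used in the manual's assembly, and the helper name select_gt_epi16 and its signature are hypothetical. The compare produces an all-ones or all-zeros mask per 16-bit element, and AND, AND-NOT, and OR then merge the two sources without a branch.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics (assumed available on the target) */
#endif

/* Hypothetical helper: C[i] = (A[i] > B[i]) ? D[i] : E[i] for n elements. */
void select_gt_epi16(const short *A, const short *B, short *C,
                     const short *D, const short *E, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Process eight 16-bit elements per iteration. */
    for (; i + 8 <= n; i += 8) {
        __m128i a = _mm_loadu_si128((const __m128i *)(A + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(B + i));
        __m128i d = _mm_loadu_si128((const __m128i *)(D + i));
        __m128i e = _mm_loadu_si128((const __m128i *)(E + i));
        /* Signed compare: 0xFFFF where A > B, 0x0000 elsewhere. */
        __m128i mask = _mm_cmpgt_epi16(a, b);
        __m128i from_d = _mm_and_si128(mask, d);     /* mask & D    */
        __m128i from_e = _mm_andnot_si128(mask, e);  /* (~mask) & E */
        _mm_storeu_si128((__m128i *)(C + i), _mm_or_si128(from_d, from_e));
    }
#endif
    /* Scalar tail, also used when SSE2 is not detected at compile time:
       the same mask arithmetic applied one element at a time. */
    for (; i < n; i++) {
        short mask = (short)-(A[i] > B[i]);  /* all ones if A > B, else 0 */
        C[i] = (short)((D[i] & mask) | (E[i] & ~mask));
    }
}
```

The unaligned load and store intrinsics are used so the sketch does not depend on how the arrays are aligned; aligned accesses would be the natural choice when the data layout guarantees 16-byte alignment.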

Example 3-21 Emulation of Conditional Moves

High-level code:

short A[MAX_ELEMENT], B[MAX_ELEMENT], C[MAX_ELEMENT],
      D[MAX_ELEMENT], E[MAX_ELEMENT];

for (i=0; i<MAX_ELEMENT; i++) {
    if (A[i] > B[i]) {
        C[i] = D[i];
    } else {
        C[i] = E[i];
    }
}

Assembly code:

xor     eax, eax

continued
