Using Arrays to Make Data Contiguous – Intel ARCHITECTURE IA-32 User Manual

Coding for SIMD Architectures

By adding the padding variable pad, the structure is now 8 bytes, and if the first element is aligned to 8 bytes (64 bits), all following elements will also be aligned. The sample declaration follows:

typedef struct { short x,y,z; char a; char pad; } Point;

Point pt[N];
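
The size and alignment claims above can be checked at compile time and at run time. The following is a minimal sketch, not part of the original manual: it assumes a C11 compiler and a 2-byte short, and the value of N and the helper all_points_aligned are hypothetical names introduced only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas (C11) */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t */

#define N 16 /* hypothetical element count, chosen for illustration */

typedef struct { short x, y, z; char a; char pad; } Point;

/* Assumes short is 2 bytes: 3 shorts + 2 chars = 8 bytes, no hidden padding. */
static_assert(sizeof(Point) == 8, "Point should occupy exactly 8 bytes");

/* Align the first element to 8 bytes; because each element is 8 bytes,
   every following element then starts on an 8-byte boundary too. */
static alignas(8) Point pt[N];

/* Hypothetical helper: returns 1 if every element of pt is 8-byte aligned. */
int all_points_aligned(void)
{
    for (size_t i = 0; i < N; i++) {
        if ((uintptr_t)&pt[i] % 8 != 0)
            return 0;
    }
    return 1;
}
```

If the pad member were removed, sizeof(Point) would typically drop below 8 and the static_assert would fail, which is the situation the padding is meant to avoid.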

Using Arrays to Make Data Contiguous

In the following code,

for (i=0; i<N; i++) pt[i].y *= scale;

the second dimension y needs to be multiplied by a scaling value. Here the for loop accesses only the y member of each element of the array pt, so consecutive accesses are 8 bytes apart rather than contiguous. This can degrade the performance of the application by increasing cache misses, by making poor use of each cache line that is fetched, and by increasing the chance of accesses that span multiple cache lines.

The following declaration allows you to vectorize the scaling operation
and further improve the alignment of the data access patterns:

short ptx[N], pty[N], ptz[N];

for (i=0; i<N; i++) pty[i] *= scale;
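
To make the contrast concrete, the two layouts can be written as a pair of functions. This is an illustrative sketch only: the function names scale_y_aos and scale_y_soa and the explicit length parameter n are assumptions, not names from the manual. Both loops compute the same values; they differ only in how far apart consecutive y values sit in memory.

```c
#include <stddef.h> /* size_t */

typedef struct { short x, y, z; char a; char pad; } Point;

/* Array of structures: each y is sizeof(Point) bytes from the next,
   so the loop touches only 2 of every 8 bytes it brings into cache. */
void scale_y_aos(Point *pt, size_t n, short scale)
{
    for (size_t i = 0; i < n; i++)
        pt[i].y = (short)(pt[i].y * scale);
}

/* Separate arrays: the y values are adjacent shorts, a unit-stride
   access pattern that a vectorizing compiler or SIMD code can process
   several elements at a time. */
void scale_y_soa(short *pty, size_t n, short scale)
{
    for (size_t i = 0; i < n; i++)
        pty[i] = (short)(pty[i] * scale);
}
```

Whether the second loop is actually vectorized depends on the compiler and its options; the layout only makes vectorization possible.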

With SIMD technology, the choice of data organization becomes more
important and should be made carefully, based on the operations that
will be performed on the data. In some applications, traditional data
arrangements may not lead to maximum performance.

A simple example of this is an FIR filter. An FIR filter is effectively a
vector dot product whose length is the number of coefficient taps.

Consider the following code:

(data[j]*coeff[0] + data[j+1]*coeff[1] + ... + data[j+num_of_taps-1]*coeff[num_of_taps-1])

If, in the code above, the filter operation of data element i is the vector dot product that begins at data element j, then the filter operation of data element i+1 begins at data element j+1.
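
The sliding dot product described above can be sketched as a plain scalar loop. This is a minimal illustration under stated assumptions, not the manual's implementation: the function name fir_filter, the parameters out and n_out, the short element type, and the choice to start output j at data element j are all hypothetical.

```c
#include <stddef.h> /* size_t */

/* Computes n_out filter outputs. Output j is the dot product of coeff
   with the window of data that begins at data[j], so each successive
   output starts one element later than the previous one.
   data must hold at least n_out + num_of_taps - 1 elements. */
void fir_filter(const short *data, const short *coeff, size_t num_of_taps,
                int *out, size_t n_out)
{
    for (size_t j = 0; j < n_out; j++) {
        int acc = 0; /* accumulate in int to limit overflow of short products */
        for (size_t k = 0; k < num_of_taps; k++)
            acc += data[j + k] * coeff[k];
        out[j] = acc;
    }
}
```

Because consecutive outputs read windows that overlap and start at successive addresses, only some of those windows can begin on an aligned boundary, which is why the data arrangement matters for a SIMD version.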
