Example 3-10 – Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization

SIMD Extensions 2 integer SIMD and __m128d is used for double-precision
floating-point SIMD. These types enable the programmer to choose the
implementation of an algorithm directly, while allowing the compiler to
perform register allocation and instruction scheduling where possible.
These intrinsics are portable among all Intel architecture-based
processors supported by a compiler. The use of intrinsics allows you to
obtain performance close to the levels achievable with assembly, while
the cost of writing and maintaining programs with intrinsics is
considerably lower. For a detailed description of the intrinsics and
their use, refer to the Intel® C++ Compiler User’s Guide.
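As a small illustration of the __m128d type mentioned above, the following sketch adds two pairs of double-precision values. The function name add_pd2 is hypothetical and does not come from the manual. The sketch assumes a compiler that ships emmintrin.h and a target with Streaming SIMD Extensions 2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d and the _mm_*_pd family */

/* Hypothetical helper (not from the manual): c[0..1] = a[0..1] + b[0..1]. */
void add_pd2(const double *a, const double *b, double *c)
{
    __m128d t0 = _mm_loadu_pd(a);          /* load two doubles, any alignment */
    __m128d t1 = _mm_loadu_pd(b);          /* load two doubles, any alignment */
    _mm_storeu_pd(c, _mm_add_pd(t0, t1));  /* add both pairs, store two results */
}
```

The unaligned load and store forms are used here so the sketch works with any pointer. The aligned forms (_mm_load_pd, _mm_store_pd) require 16-byte aligned addresses.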

Example 3-10 shows the loop from Example 3-8 using intrinsics.

The intrinsics map one-to-one with actual Streaming SIMD Extensions
assembly code. The xmmintrin.h header file in which the prototypes for
the intrinsics are defined is part of the Intel C++ Compiler included
with the VTune Performance Enhancement Environment CD.

Intrinsics are also defined for the MMX technology ISA. These are based
on the __m64 data type to represent the contents of an mm register. You
can specify values in bytes, short integers, 32-bit values, or as a
64-bit object.
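To make the __m64 description concrete, the sketch below builds two 64-bit values from short integers and adds them lane by lane. The helper name add_4x16 is hypothetical. The sketch assumes an x86 compiler that provides mmintrin.h; some 64-bit toolchains do not support MMX intrinsics at all.

```c
#include <mmintrin.h>   /* MMX intrinsics: __m64, _mm_*_pi16, _mm_empty */
#include <string.h>     /* memcpy */

/* Hypothetical helper (not from the manual): out[i] = x[i] + y[i], i = 0..3. */
void add_4x16(const short *x, const short *y, short *out)
{
    /* _mm_set_pi16 takes its arguments from the most significant
       16-bit lane down to the least significant one. */
    __m64 a = _mm_set_pi16(x[3], x[2], x[1], x[0]);
    __m64 b = _mm_set_pi16(y[3], y[2], y[1], y[0]);
    __m64 s = _mm_add_pi16(a, b);   /* four wrap-around 16-bit additions */

    memcpy(out, &s, sizeof s);      /* copy the 64-bit result to memory */
    _mm_empty();                    /* reset MMX state before any x87 code runs */
}
```

The same 64-bit object could instead be assembled from eight bytes with _mm_set_pi8 or from two 32-bit values with _mm_set_pi32; only the lane width of the subsequent arithmetic intrinsic changes.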

Example 3-10 Simple Four-Iteration Loop Coded with Intrinsics

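#include <xmmintrin.h>

void add(float *a, float *b, float *c)
{
    __m128 t0, t1;

    t0 = _mm_load_ps(a);      /* load four floats from a (16-byte aligned) */
    t1 = _mm_load_ps(b);      /* load four floats from b (16-byte aligned) */
    t0 = _mm_add_ps(t0, t1);  /* add the four pairs in parallel */
    _mm_store_ps(c, t0);      /* store four results to c (16-byte aligned) */
}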
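Example 3-10 handles exactly four elements, and _mm_load_ps and _mm_store_ps require 16-byte aligned addresses. The following sketch shows one possible way to extend the same pattern to an array of arbitrary length. The function name add_n and the length parameter n are hypothetical and are not part of the manual. The unaligned intrinsics _mm_loadu_ps and _mm_storeu_ps are substituted so that the sketch does not depend on the caller's alignment.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* SSE intrinsics: __m128 and the _mm_*_ps family */

/* Hypothetical extension of add(): c[i] = a[i] + b[i] for 0 <= i < n. */
void add_n(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;

    /* Vector part: four floats per iteration, no alignment requirement. */
    for (; i + 4 <= n; i += 4) {
        __m128 t0 = _mm_loadu_ps(a + i);
        __m128 t1 = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(t0, t1));
    }

    /* Scalar tail: the remaining zero to three elements. */
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

If the arrays are known to be 16-byte aligned, the aligned load and store forms from Example 3-10 can be used in the vector loop instead.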
