Assembly, Intrinsics – Intel ARCHITECTURE IA-32 User Manual

Page 195: Example 3-9

Coding for SIMD Architectures

3

3-15

Assembly

Key loops can be coded directly in assembly language using an
assembler or by using inlined assembly (C-asm) in C/C++ code. The
Intel compiler or assembler recognizes the new instructions and registers,
then directly generates the corresponding code. This model offers the
opportunity for attaining the greatest performance, but this performance is
not portable across the different processor architectures.

Example 3-9 shows the Streaming SIMD Extensions inlined assembly
encoding.

Intrinsics

Intrinsics provide access to the ISA functionality using C/C++ style
coding instead of assembly language. Intel has defined three sets of
intrinsic functions that are implemented in the Intel® C++ Compiler to
support the MMX technology, Streaming SIMD Extensions, and
Streaming SIMD Extensions 2. Four new C data types, representing
64-bit and 128-bit objects, are used as the operands of these intrinsic
functions. __m64 is used for MMX integer SIMD, __m128 is used for
single-precision floating-point SIMD, __m128i is used for Streaming

Example 3-9  Streaming SIMD Extensions Using Inlined Assembly Encoding

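void add(float *a, float *b, float *c)
{
    __asm {
        mov eax, a                          // eax = address of first input
        mov edx, b                          // edx = address of second input
        mov ecx, c                          // ecx = address of output
        movaps xmm0, XMMWORD PTR [eax]      // load four floats from a (16-byte aligned)
        addps xmm0, XMMWORD PTR [edx]       // add four floats from b, element by element
        movaps XMMWORD PTR [ecx], xmm0      // store the four sums to c
    }
}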
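
For comparison with Example 3-9, the same four-element add can be sketched with the Streaming SIMD Extensions intrinsics and the __m128 data type. This is a minimal sketch, not code from the manual: the function name add_intrinsics and the helper is_aligned16 are hypothetical names chosen for illustration, and the code assumes a compiler that provides the xmmintrin.h header. Like movaps in Example 3-9, _mm_load_ps and _mm_store_ps assume 16-byte aligned addresses, so the sketch asserts that assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* Hypothetical helper (not from the manual): nonzero if p is 16-byte aligned. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Adds four packed single-precision floats: c[0..3] = a[0..3] + b[0..3].
   Assumes all three pointers are 16-byte aligned, as movaps requires. */
void add_intrinsics(const float *a, const float *b, float *c)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(c));

    __m128 va  = _mm_load_ps(a);      /* aligned load of four floats from a */
    __m128 vb  = _mm_load_ps(b);      /* aligned load of four floats from b */
    __m128 sum = _mm_add_ps(va, vb);  /* element-wise add, corresponds to addps */
    _mm_store_ps(c, sum);             /* aligned store of the four sums to c */
}
```

The compiler chooses the registers and schedules the instructions here, so the register assignments written out by hand in Example 3-9 do not appear in the source. If the arrays cannot be guaranteed to be aligned, the unaligned variants _mm_loadu_ps and _mm_storeu_ps could be substituted.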
