Chapter 5: Optimizing for SIMD Floating-point Applications, General Rules for SIMD Floating-point Code (Intel Architecture IA-32 User Manual)


Chapter 5: Optimizing for SIMD Floating-point Applications

This chapter discusses general rules for optimizing code that uses the
single-instruction, multiple-data (SIMD) floating-point instructions
available in Streaming SIMD Extensions (SSE), Streaming SIMD
Extensions 2 (SSE2), and Streaming SIMD Extensions 3 (SSE3). It also
provides examples that illustrate the optimization techniques for
single-precision and double-precision SIMD floating-point
applications.

General Rules for SIMD Floating-point Code

The rules and suggestions listed in this section help optimize
floating-point code containing SIMD floating-point instructions.
Generally, it is important to understand and balance port utilization to
create efficient SIMD floating-point code. The basic rules and
suggestions include the following:

Follow all guidelines in Chapter 2 and Chapter 3.

Mask exceptions to achieve higher performance. Software runs slower
when exceptions are unmasked.

Utilize the flush-to-zero and denormals-are-zero modes for higher
performance to avoid the penalty of dealing with denormals and
underflows.

Incorporate the prefetch instruction where appropriate (for details,
refer to Chapter 6, “Optimizing Cache Usage”).

Use MMX technology instructions and registers to shuffle data when
the shuffling can be done with SIMD integer operations.
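
The exception-masking and flush-to-zero/denormals-are-zero rules above all come down to setting bits in the MXCSR control and status register. The following is a minimal sketch in C, assuming a compiler that provides the standard SSE intrinsics headers; the helper names (simd_fp_fast_mode_enable, simd_fp_mode_restore, tiny_product) are hypothetical and do not come from the manual.

```c
#include <assert.h>     /* assert: debug-build sanity check */
#include <float.h>      /* FLT_MIN: smallest normal single-precision value */
#include <xmmintrin.h>  /* SSE: _mm_getcsr, _mm_setcsr, exception-mask and FTZ macros */
#include <pmmintrin.h>  /* SSE3 header: DAZ macros */

/* Hypothetical helper (not from the manual): mask all SIMD floating-point
 * exceptions and turn on flush-to-zero (FTZ) and denormals-are-zero (DAZ)
 * in MXCSR for the calling thread. Returns the previous MXCSR value so the
 * caller can restore it later. */
static unsigned int simd_fp_fast_mode_enable(void)
{
    unsigned int saved = _mm_getcsr();

    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);               /* mask all six exceptions */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          /* denormal results become 0 */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  /* denormal inputs read as 0 */

    /* Sanity check that the MXCSR write took effect. */
    assert(_MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON);
    return saved;
}

/* Hypothetical helper: put back the MXCSR value saved by the function above. */
static void simd_fp_mode_restore(unsigned int saved)
{
    _mm_setcsr(saved);
}

/* Multiplies the smallest normal float by 0.3 with an SSE scalar multiply.
 * The exact product is below the normal range, so it is a denormal under
 * default settings and is flushed to zero when FTZ is on. The volatile
 * load keeps the compiler from folding the product at compile time. */
static float tiny_product(void)
{
    volatile float tiny = FLT_MIN;
    __m128 product = _mm_mul_ss(_mm_set_ss(tiny), _mm_set_ss(0.3f));
    return _mm_cvtss_f32(product);
}
```

A caller would typically enable the mode once per thread before a SIMD floating-point kernel and restore the saved value afterwards, since MXCSR is per-thread state. Note that FTZ and DAZ trade strict IEEE 754 behavior for speed, and that DAZ may not be supported on the earliest SSE-capable processors, so code targeting those would need to check for it first.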
