Intel ARCHITECTURE IA-32 User Manual



Coding for SIMD Architectures


specific optimizations. Where appropriate, the coach displays
pseudo-code to suggest the use of highly optimized intrinsics and
functions in the Intel® Performance Library Suite. Because the VTune
analyzer is designed specifically for all of the Intel architecture
(IA)-based processors, including the Pentium 4 processor, it can offer
these detailed approaches to working with IA. See “Code Optimization
Options” in Appendix A for more details and an example of code coach
advice.

Determine If Code Benefits by Conversion to SIMD Execution

Identifying code that benefits from using SIMD technologies can be
time-consuming and difficult. Likely candidates for conversion are
applications that are highly computation-intensive, such as the
following:

speech compression algorithms and filters

speech recognition algorithms

video display and capture routines

rendering routines

3D graphics (geometry)

image and video processing algorithms

spatial (3D) audio

physical modeling (graphics, CAD)

workstation applications

encryption algorithms

complex arithmetic
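
Many of the applications listed above spend much of their time in small kernels that apply one operation to every element of an array. The sketch below is illustrative only and is not taken from this manual: the function names scale_scalar and scale_simd and the parameter gain are hypothetical. It shows a scalar loop that scales a buffer of single-precision samples, followed by one possible SIMD form using the SSE intrinsics from xmmintrin.h. The SIMD form assumes nothing about alignment (it uses unaligned loads and stores) and falls back to the scalar loop when SSE is not available at compile time.

```c
#include <stddef.h>

/* Use SSE intrinsics only when the compiler targets a processor with SSE. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Scalar form: a small repetitive loop over contiguous 32-bit floats.
   (Hypothetical example kernel: multiply every sample by a gain.) */
void scale_scalar(float *dst, const float *src, float gain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * gain;
}

/* SIMD form: processes four floats per iteration, then finishes any
   remaining elements (or all of them, without SSE) with a scalar loop. */
void scale_simd(float *dst, const float *src, float gain, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    __m128 g = _mm_set1_ps(gain);              /* gain in all four lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);      /* unaligned load of 4 floats */
        _mm_storeu_ps(dst + i, _mm_mul_ps(v, g));
    }
#endif
    for (; i < n; i++)                         /* scalar tail */
        dst[i] = src[i] * gain;
}
```

Both functions should produce the same output for the same input. Whether the SIMD form is actually faster depends on the data size, alignment, and the surrounding code, so measure before committing to the conversion.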

Generally, good candidate code contains small, repetitive loops that
operate on sequential arrays of 8-, 16-, or 32-bit integers,
single-precision 32-bit floating-point data, or double-precision
64-bit floating-point data (integer and floating-point data items should
be sequential in memory). The repetitiveness of these loops incurs
