Intel ARCHITECTURE IA-32 User Manual



IA-32 Intel® Architecture Optimization

Index-2

coding methodologies, 3-13

coding techniques, 3-12

absolute difference of signed numbers, 4-24
absolute difference of unsigned numbers, 4-23
absolute value, 4-25
clipping to an arbitrary signed range, 4-26
clipping to an arbitrary unsigned range, 4-28
generating constants, 4-21
interleaved pack with saturation, 4-8
interleaved pack without saturation, 4-10
non-interleaved unpack, 4-11
signed unpack, 4-7
simplified clipping to an arbitrary signed range, 4-28
unsigned unpack, 4-6

coherent requests, 6-13

command-line options, A-2

automatic processor dispatch support, A-4
floating-point arithmetic precision, A-6
inline expansion of library functions, A-6
loop unrolling, A-5
rounding control, A-6
targeting a processor, A-3
vectorizer switch, A-5

comparing register values, 2-87

compiler intrinsics

_mm_load, 6-2, 6-44
_mm_prefetch, 6-2, 6-44
_mm_stream, 6-2, 6-44

compiler plug-in, A-2

compiler-supported alignment, 3-24

complex instructions, 2-74

computation latency, E-8

computation-intensive code, 3-11

compute bound, E-7, E-8

converting code to MMX technology, 3-8

CPUID instruction, 3-2

C-states, 9-1, 9-4

D

Data

Code segment and, 2-47

data alignment, 3-20

data arrangement, 5-4

data copy, E-11

data deswizzling, 5-14, 5-15

data prefetching, 1-33

Data structures

Access pattern versus alignment, 2-40
Aligning, 2-39

data swizzling, 5-9

data swizzling using intrinsics, 5-12

decoupled memory, E-7

deeper sleep, 9-6

divide instructions, 2-76

E

eliminating branches, 2-15, 2-18

EMMS instruction, 4-3, 4-4

extract word instruction, 4-13

F

fist instruction, 2-64

fldcw instruction, 2-64

floating-point applications, 2-57

floating-point arithmetic precision options, A-6

floating-point code

improving parallelism, 2-68
loop unrolling, 2-26
memory access stall information, 2-37
memory operands, 2-71
operations with integer operands, 2-72
optimizing, 2-58
transcendental functions, 2-72

floating-point operations with integer operands, 2-72
