Intel ARCHITECTURE IA-32 User Manual



Index


O

optimizing cache utilization

cache management, 6-44
examples, 6-15
non-temporal store instructions, 6-10
prefetch and load, 6-9
prefetch instructions, 6-8
prefetching, 6-7
SFENCE instruction, 6-15, 6-16
streaming, non-temporal stores, 6-10

optimizing floating-point applications

copying, shuffling, 5-17
data arrangement, 5-4
data deswizzling, 5-14
data swizzling using intrinsics, 5-12
horizontal ADD, 5-18
planning considerations, 5-2
rules and suggestions, 5-1
scalar code, 5-3
vertical versus horizontal computation, 5-5

optimizing floating-point code, 2-58

P

pack instruction, 4-10

pack instructions, 4-8

packed average (byte or word), 4-31

packed multiply high unsigned, 4-30

packed shuffle word, 4-18

packed signed integer word maximum, 4-29

packed sum of absolute differences, 4-30

parallelism, 3-12, E-7

parameter alignment, D-4

partial memory accesses, 4-35

PAVGB instruction, 4-31

PAVGW instruction, 4-31

Pentium Processor Extreme Edition, 1-39

Performance and Usage Models

Multithreading, 7-2
Performance and Usage Models, 7-2

Performance Library Suite, A-14

optimizations, A-16

PEXTRW instruction, 4-13

PGO. See profile-guided optimization

PINSRW instruction, 4-14

PMINSW instruction, 4-29

PMINUB instruction, 4-30

PMOVMSKB instruction, 4-16

PMULHUW instruction, 4-30

predictable memory access patterns, 6-7

prefetch and cacheability instructions, 6-4

prefetch and load instructions, 6-8

prefetch concatenation, 6-26, 6-28

prefetch instruction, 6-1

prefetch instruction considerations, 6-24

cache blocking techniques, 6-34
concatenation, 6-26
minimizing number of prefetches, 6-29
no preloading or prefetch, E-6
prefetch scheduling distance, E-5
scheduling distance, 6-25
single-pass execution, 6-3, 6-41
spread prefetch with computation instructions, 6-32
strip-mining, 6-37

prefetch instructions, 6-7

prefetch scheduling distance, 6-25, E-5, E-7, E-10

prefetch use

predictable memory access patterns, 6-7
time-consuming innermost loops, 6-7

prefetching concept, 6-6

prefetchnta instruction, 6-36

profile-guided optimization, A-7

prolog sequences, 2-90

PSADBW instruction, 4-30

PSHUF instruction, 4-18

P-states, 9-1
