Intel ARCHITECTURE IA-32 User Manual




Branch Prediction.................................................................................................................. 2-15

Eliminating Branches...................................................................................................... 2-15
Spin-Wait and Idle Loops................................................................................................ 2-18
Static Prediction.............................................................................................................. 2-19
Inlining, Calls and Returns ............................................................................................. 2-22
Branch Type Selection ................................................................................................... 2-23
Loop Unrolling ............................................................................................................... 2-26
Compiler Support for Branch Prediction ......................................................................... 2-28

Memory Accesses................................................................................................................. 2-29

Alignment ....................................................................................................................... 2-29
Store Forwarding ............................................................................................................ 2-32

Store-to-Load-Forwarding Restriction on Size and Alignment.................................. 2-33
Store-forwarding Restriction on Data Availability ...................................................... 2-38

Data Layout Optimizations ............................................................................................. 2-39
Stack Alignment.............................................................................................................. 2-42
Capacity Limits and Aliasing in Caches.......................................................................... 2-43

Capacity Limits in Set-Associative Caches............................................................... 2-44
Aliasing Cases in the Pentium® 4 and Intel® Xeon® Processors ............................. 2-45

Aliasing Cases in the Pentium M Processor............................................................. 2-46

Mixing Code and Data .................................................................................................... 2-47

Self-modifying Code ................................................................................................. 2-47

Write Combining ............................................................................................................. 2-48
Locality Enhancement .................................................................................................... 2-50
Minimizing Bus Latency.................................................................................................. 2-52
Non-Temporal Store Bus Traffic ..................................................................................... 2-53
Prefetching ..................................................................................................................... 2-55

Hardware Instruction Fetching.................................................................................. 2-55
Software and Hardware Cache Line Fetching .......................................................... 2-55

Cacheability Instructions ................................................................................................ 2-56
Code Alignment .............................................................................................................. 2-57

Improving the Performance of Floating-point Applications.................................................... 2-57

Guidelines for Optimizing Floating-point Code ............................................................... 2-58
Floating-point Modes and Exceptions ............................................................................ 2-60

Floating-point Exceptions ......................................................................................... 2-60
Floating-point Modes ................................................................................................ 2-62

Improving Parallelism and the Use of FXCH .................................................................. 2-68
x87 vs. Scalar SIMD Floating-point Trade-offs ............................................................... 2-69

Scalar SSE/SSE2 Performance on Intel Core Solo and Intel Core Duo Processors ............................................................................................................. 2-70

Memory Operands.......................................................................................................... 2-71
