User/Source Coding Rules – Intel Architecture IA-32 User Manual

General Optimization Guidelines

User/Source Coding Rules

User/Source Coding Rule 1. (M impact, L generality) If an indirect branch
has two or more common taken targets, and at least one of those targets is
correlated with branch history leading up to the branch, then convert the
indirect branch into a tree where one or more indirect branches are preceded
by conditional branches to those targets. Apply this “peeling” procedure to the
common target of an indirect branch that correlates to branch history. 2-24
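
The peeling procedure in Rule 1 can be sketched in C as follows. This is a minimal illustration, not code from the manual: the handler functions, the `handlers` table, and the assumption that operation 0 is the common target are all hypothetical. A compiler may also restructure either version, so the effect should be confirmed by measurement.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Hypothetical handlers, used only for illustration. */
static int handle_add(int x)    { return x + 1; }
static int handle_double(int x) { return x * 2; }
static int handle_negate(int x) { return -x; }

static const handler_fn handlers[] = { handle_add, handle_double, handle_negate };
#define NUM_HANDLERS (sizeof handlers / sizeof handlers[0])

/* Baseline: every call goes through a single indirect branch. */
int dispatch_indirect(size_t op, int x)
{
    assert(op < NUM_HANDLERS);
    return handlers[op](x);
}

/* Peeled: the target assumed to be most common (op 0) is tested with a
 * conditional branch first; the remaining targets still go through the
 * indirect branch. */
int dispatch_peeled(size_t op, int x)
{
    assert(op < NUM_HANDLERS);
    if (op == 0)
        return handle_add(x);   /* direct call guarded by a conditional branch */
    return handlers[op](x);     /* less common targets */
}
```

Both functions return the same results; only the branch structure differs.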

User/Source Coding Rule 2. (H impact, M generality) Pad data structures
defined in the source code so that every data element is aligned to a natural
operand size address boundary. If the operands are packed in a SIMD
instruction, align to the packed element size (64- or 128-bit). 2-39
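
As a sketch of the padding described in Rule 2, the C11 structure below places each member on its natural boundary and puts a four-float packed operand on a 16-byte (128-bit) boundary. The `particle` structure and its field names are hypothetical. The explicit `alignas` on the `double` is included because some IA-32 ABIs align `double` members to only 4 bytes inside structures.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof */
#include <stddef.h>    /* offsetof */
#include <stdint.h>

/* Hypothetical record, padded so each member is naturally aligned. */
struct particle {
    uint8_t kind;
    uint8_t pad0[7];               /* explicit padding up to offset 8 */
    alignas(8)  double mass;       /* 64-bit operand on an 8-byte boundary */
    alignas(16) float position[4]; /* packed 128-bit SIMD operand */
};

/* Compile-time checks that the layout matches the intent. */
static_assert(offsetof(struct particle, mass) % 8 == 0,
              "mass must be 8-byte aligned");
static_assert(offsetof(struct particle, position) % 16 == 0,
              "position must be 16-byte aligned");
static_assert(alignof(struct particle) >= 16,
              "whole structure must keep 16-byte alignment");

/* Runtime helper: returns 1 if p is aligned to 'boundary' bytes. */
int is_aligned(const void *p, size_t boundary)
{
    return ((uintptr_t)p % boundary) == 0;
}
```

The static assertions make the build fail if a later edit to the structure breaks the alignment.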

User/Source Coding Rule 3. (M impact, L generality) Beware of false
sharing within a cache line (64 bytes) on Pentium 4, Intel Xeon, and
Pentium M processors, and within a sector of 128 bytes on Pentium 4 and Intel
Xeon processors. 2-42
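
One common way to avoid the false sharing named in Rule 3 is to give each thread's data its own 128-byte sector. The sketch below assumes C11 and uses hypothetical names (`padded_counter`, `worker_increment`, `NUM_WORKERS`). Thread creation is omitted; the point is only the memory layout.

```c
#include <assert.h>    /* assert, static_assert (C11) */
#include <stdalign.h>  /* alignas */
#include <stddef.h>

#define SECTOR_SIZE 128  /* sector size from Rule 3; 64 covers a single line */
#define NUM_WORKERS 4    /* hypothetical number of worker threads */

/* Each counter is aligned to, and therefore padded out to, a full sector,
 * so two workers never write into the same sector. */
struct padded_counter {
    alignas(SECTOR_SIZE) unsigned long value;
};

static_assert(sizeof(struct padded_counter) == SECTOR_SIZE,
              "each counter must occupy exactly one sector");

static struct padded_counter counters[NUM_WORKERS];

/* Intended to be called by worker 'worker_id' only. */
void worker_increment(size_t worker_id, unsigned long n)
{
    assert(worker_id < NUM_WORKERS);
    for (unsigned long i = 0; i < n; i++)
        counters[worker_id].value++;
}

/* Sums all per-worker counters once the workers have finished. */
unsigned long total_count(void)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < NUM_WORKERS; i++)
        sum += counters[i].value;
    return sum;
}
```

The cost is memory: each counter occupies 128 bytes instead of the size of one `unsigned long`.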

User/Source Coding Rule 4. (H impact, ML generality) Consider using a
special memory allocation library to avoid aliasing. 2-46

User/Source Coding Rule 5. (M impact, M generality) When padding
variable declarations to avoid aliasing, the greatest benefit comes from
avoiding aliasing on second-level cache lines, suggesting an offset of 128 bytes
or more. 2-46

User/Source Coding Rule 6. (H impact, H generality) Optimization
techniques such as blocking, loop interchange, loop skewing and packing are
best done by the compiler. Optimize data structures to either fit in one-half of
the first-level cache or in the second-level cache; turn on loop optimizations
in the compiler to enhance locality for nested loops. 2-52
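
To show what one of the transformations named in Rule 6 does, the sketch below writes out loop interchange by hand. It is for illustration only, since the rule recommends leaving this to the compiler. The matrix dimensions and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 64
#define COLS 64

/* Fills the matrix with small integers so that sums are exact. */
void fill_matrix(double a[ROWS][COLS])
{
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            a[i][j] = (double)(i + j);
}

/* Before interchange: the inner loop walks down a column, so consecutive
 * accesses are COLS * sizeof(double) bytes apart. */
double sum_column_order(double a[ROWS][COLS])
{
    double sum = 0.0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}

/* After interchange: the inner loop walks along a row, so consecutive
 * accesses are adjacent in memory and reuse each cache line. */
double sum_row_order(double a[ROWS][COLS])
{
    double sum = 0.0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}
```

Both functions compute the same sum; only the order of memory accesses changes.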

User/Source Coding Rule 7. (M impact, ML generality) If there is a blend
of reads and writes on the bus, changing the code to separate these bus
transactions into read phases and write phases can help performance. Note,
however, that the order of read and write operations on the bus is not the
same as the order in which they appear in the program. 2-52
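
One possible reading of Rule 7 at the source level is sketched below: a loop that alternates loads and stores is split into a read phase that fills a small temporary buffer and a write phase that drains it. The function names and the `CHUNK` size are assumptions, and the buffer is assumed to stay in cache. As the rule itself warns, the bus order can differ from program order, so any benefit has to be measured.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 64  /* hypothetical chunk size; tune by measurement */

/* Interleaved: each iteration issues one read and one write. */
void scale_interleaved(const int *src, int *dst, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Phased: read a chunk into a local buffer, then write the chunk out. */
void scale_phased(const int *src, int *dst, size_t n, int k)
{
    int tmp[CHUNK];

    for (size_t base = 0; base < n; base += CHUNK) {
        size_t len = (n - base < CHUNK) ? (n - base) : CHUNK;

        for (size_t i = 0; i < len; i++)   /* read phase */
            tmp[i] = src[base + i] * k;

        for (size_t i = 0; i < len; i++)   /* write phase */
            dst[base + i] = tmp[i];
    }
}
```

Both functions produce identical output; they differ only in how reads and writes are grouped in program order.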
