Optimize Branch Predictability, Optimize Memory Access – Intel Architecture IA-32 User Manual


General Optimization Guidelines


Optimize Branch Predictability

Improve branch predictability and optimize instruction prefetching
by arranging code to be consistent with the static branch prediction
assumption: backward taken and forward not taken.
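
As an illustrative sketch (not from the manual), the C fragment below arranges the common case on the fall-through path: the loop's backward branch is normally taken, and the forward branch to the error handler is normally not taken. The function and label names are hypothetical.

    #include <stddef.h>

    /* Arrange code so the rarely taken path is a forward branch
     * (statically predicted not-taken) and the hot loop closes with a
     * backward branch (statically predicted taken). */
    int process_items(const int *items, size_t n)
    {
        for (size_t i = 0; i < n; i++) {   /* backward branch: usually taken */
            if (items[i] < 0)              /* forward branch: rarely taken */
                goto error;                /* cold path placed after the loop */
            /* ... common-case work on items[i] ... */
        }
        return 0;

    error:                                 /* out-of-line error handling */
        return -1;
    }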

Avoid mixing near calls, far calls and returns.

Avoid implementing a call by pushing the return address and
jumping to the target. The hardware can pair up call and return
instructions to enhance predictability.

Use the pause instruction in spin-wait loops.
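
A minimal spin-wait sketch in C, assuming C11 atomics and the SSE2 pause intrinsic are available; the flag and function names are illustrative.

    #include <stdatomic.h>
    #include <emmintrin.h>   /* _mm_pause */

    /* Spin until another thread releases the flag. PAUSE reduces the
     * penalty when the loop finally exits and lowers power while spinning. */
    static void spin_wait(const atomic_int *flag)
    {
        while (atomic_load_explicit(flag, memory_order_acquire) == 0)
            _mm_pause();
    }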

Inline functions according to coding recommendations.

Whenever possible, eliminate branches.
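
One way to eliminate a data-dependent branch, shown here as a hedged example rather than manual text, is to replace the conditional with a mask so the loop body contains no unpredictable branch; the function names are hypothetical and an arithmetic right shift is assumed.

    #include <stddef.h>
    #include <stdint.h>

    /* Branchy: one hard-to-predict branch per element. */
    int64_t sum_negatives_branchy(const int32_t *a, size_t n)
    {
        int64_t sum = 0;
        for (size_t i = 0; i < n; i++)
            if (a[i] < 0)
                sum += a[i];
        return sum;
    }

    /* Branchless: derive an all-ones/all-zeros mask from the sign bit and
     * AND it with the value, removing the branch from the loop body. */
    int64_t sum_negatives_branchless(const int32_t *a, size_t n)
    {
        int64_t sum = 0;
        for (size_t i = 0; i < n; i++) {
            int32_t mask = a[i] >> 31;   /* 0 or -1; arithmetic shift assumed */
            sum += a[i] & mask;
        }
        return sum;
    }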

Avoid indirect calls.

Optimize Memory Access

Observe store-forwarding constraints.
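
As a hedged illustration of one such constraint: a wide load that reads data written by two narrower stores cannot be forwarded from the store buffer and stalls until the stores complete. The union and function names below are illustrative.

    #include <stdint.h>

    union Pair {
        uint64_t whole;
        uint32_t half[2];
    };

    /* Problematic: two 32-bit stores followed by a 64-bit load of the same
     * location. The load spans both stores, so store forwarding is blocked. */
    uint64_t combine_blocked(union Pair *p, uint32_t lo, uint32_t hi)
    {
        p->half[0] = lo;
        p->half[1] = hi;
        return p->whole;       /* waits for both stores to complete */
    }

    /* Preferred: combine in a register and store once, so any later load
     * matches the size of the store that produced the data. */
    uint64_t combine_forwarded(union Pair *p, uint32_t lo, uint32_t hi)
    {
        uint64_t v = ((uint64_t)hi << 32) | lo;
        p->whole = v;          /* single 64-bit store */
        return v;
    }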

Ensure proper data alignment to prevent data from being split across a cache-line boundary. This includes data on the stack and parameters passed to functions.
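
A minimal C11 sketch of enforcing alignment so a hot field never straddles a cache-line boundary; the structure, its field sizes, and the 64-byte figure (matching the cache-line size cited below) are illustrative assumptions.

    #include <stdalign.h>
    #include <stdint.h>

    /* Start the structure on a 64-byte cache-line boundary so the 16-byte
     * header field is always contained in a single line. */
    struct Packet {
        alignas(64) uint8_t header[16];
        uint8_t payload[48];
    };

    /* Stack objects can be aligned the same way. */
    void use_packet(void)
    {
        alignas(64) struct Packet p = {0};
        (void)p;   /* ... fill and process p ... */
    }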

Avoid mixing code and data (self-modifying code).

Choose data types carefully (see the next item) and avoid type casting.

Employ data structure layout optimization to ensure efficient use of 64-byte cache lines.
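
An illustrative (not manual-supplied) example of layout optimization: splitting rarely used fields out of a hot structure keeps the fields touched in the inner loop together within one 64-byte line. All type and field names are hypothetical.

    #include <stdint.h>

    /* Cold details kept in a separate structure. */
    struct OrderCold {
        char description[80];
    };

    /* Before: mixing the 80-byte description with the hot fields pushes
     * each element across multiple 64-byte cache lines. */
    struct OrderMixed {
        uint64_t id;
        char     description[80];
        double   price;
        double   quantity;
    };

    /* After: the hot fields total 32 bytes and share a single cache line;
     * the cold data is reached through a pointer only when needed. */
    struct OrderHot {
        uint64_t id;
        double   price;
        double   quantity;
        struct OrderCold *cold;
    };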

Favor parallel data accesses, which mask latency, over dependent data accesses, which expose latency.
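
A hedged illustration of the difference: in the first loop every load address depends on the previous load (pointer chasing), so latencies are serialized, while in the second the loads are independent and can overlap.

    #include <stddef.h>

    struct Node { int value; struct Node *next; };

    /* Dependent accesses: each address comes from the previous load, so the
     * full load latency is exposed on every iteration. */
    long sum_list(const struct Node *n)
    {
        long sum = 0;
        for (; n != NULL; n = n->next)
            sum += n->value;
        return sum;
    }

    /* Parallel accesses: addresses depend only on the index, so several
     * loads can be in flight at once and their latency overlaps. */
    long sum_array(const int *a, size_t count)
    {
        long sum = 0;
        for (size_t i = 0; i < count; i++)
            sum += a[i];
        return sum;
    }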

For data traffic that misses the cache, favor smaller strides between misses to avoid frequent DTLB misses.
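
For illustration (an assumption, not manual text): with a stride of 4096 bytes, a typical page size, nearly every access below maps to a different page and misses the DTLB, whereas a small stride reuses each translation many times.

    #include <stddef.h>

    /* Sum every stride-th byte of a large buffer. Small strides reuse the
     * current page translation; page-sized strides touch a new page per
     * access and thrash the DTLB. */
    long strided_sum(const char *buf, size_t len, size_t stride)
    {
        long sum = 0;
        for (size_t i = 0; i < len; i += stride)
            sum += buf[i];
        return sum;
    }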

Use prefetching appropriately.
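
A hedged sketch using the SSE software-prefetch intrinsic; the prefetch distance of 16 elements is an arbitrary assumption that would need tuning for a real loop.

    #include <stddef.h>
    #include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T0 */

    /* Request data a fixed distance ahead of the current element so the
     * cache line arrives before it is needed. */
    float sum_with_prefetch(const float *a, size_t n)
    {
        float sum = 0.0f;
        for (size_t i = 0; i < n; i++) {
            if (i + 16 < n)
                _mm_prefetch((const char *)&a[i + 16], _MM_HINT_T0);
            sum += a[i];
        }
        return sum;
    }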

Use the following techniques to enhance locality: blocking,
hardware-friendly tiling, loop interchange, loop skewing.
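
A minimal blocking (tiling) sketch for a matrix transpose, assuming square matrices whose dimension is a multiple of the tile size; N and BLOCK are illustrative values, not figures from the manual.

    #include <stddef.h>

    #define N     1024
    #define BLOCK 16    /* illustrative tile size */

    /* Blocked transpose: each BLOCK x BLOCK tile of source and destination
     * fits in cache, so every fetched line is reused before eviction. */
    void transpose_blocked(double dst[N][N], const double src[N][N])
    {
        for (size_t ii = 0; ii < N; ii += BLOCK)
            for (size_t jj = 0; jj < N; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK; i++)
                    for (size_t j = jj; j < jj + BLOCK; j++)
                        dst[j][i] = src[i][j];
    }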
