Branch Prediction, Eliminating Branches – Intel ARCHITECTURE IA-32 User Manual

General Optimization Guidelines

Branch Prediction

Branch optimizations have a significant impact on performance. By
understanding the flow of branches and improving the predictability of
branches, you can increase the speed of code significantly.

Optimizations that help branch prediction are:

Keep code and data on separate pages (a very important item, see
more details in the “Memory Accesses” section).

Whenever possible, eliminate branches.

Arrange code to be consistent with the static branch prediction
algorithm.

Use the pause instruction in spin-wait loops.

Inline functions and pair up calls and returns.

Unroll as necessary so that repeatedly-executed loops have sixteen
or fewer iterations, unless this causes an excessive code size
increase.

Separate branches so that they occur no more frequently than every
three μops where possible.
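As an illustration of the pause guideline, here is a minimal sketch of a spin-wait loop in C. It assumes a C11 compiler and uses the _mm_pause() intrinsic, which emits pause on x86 targets. The names cpu_relax and spin_until_set are hypothetical helpers introduced for this example and do not come from the manual. On non-x86 targets the sketch falls back to a no-op.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
/* _mm_pause() compiles to the pause instruction. */
#define cpu_relax() _mm_pause()
#else
/* Assumption: no equivalent hint is used on other targets in this sketch. */
#define cpu_relax() ((void)0)
#endif

/* Hypothetical helper: spin until *flag becomes nonzero.
   Returns how many times the loop body ran before the flag was seen. */
unsigned long spin_until_set(atomic_int *flag)
{
    unsigned long spins = 0;

    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        cpu_relax();   /* tell the processor this is a spin-wait loop */
        spins++;
    }
    return spins;
}
```

In real use another thread would store a nonzero value to the flag with release ordering. If the flag is already set, the function returns at once without executing the hint.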

Eliminating Branches

Eliminating branches improves performance because it:

reduces the possibility of mispredictions

reduces the number of required branch target buffer (BTB) entries;
conditional branches that are never taken do not consume BTB
resources

There are four principal ways of eliminating branches:

arrange code to make basic blocks contiguous

unroll loops, as discussed in the “Loop Unrolling” section

use the cmov instruction

use the setcc instruction
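The last two techniques can be sketched at the C level. The functions below are hypothetical illustrations, not code from the manual. They are written so that the comparison result is used as data rather than as a jump condition, which gives a compiler the opportunity to emit cmov or setcc. Whether it actually does so depends on the compiler, the optimization level, and the target processor, so inspect the generated assembly before relying on it.

```c
#include <stdint.h>

/* cmov-style select: a simple ternary with no side effects is a common
   candidate for cmp + cmovcc instead of a conditional jump. */
int32_t select_max(int32_t a, int32_t b)
{
    return (a > b) ? a : b;
}

/* setcc-style accumulation: the comparison yields 0 or 1, which is added
   directly, so the loop body needs no if statement. */
int32_t count_below(const int32_t *values, int n, int32_t limit)
{
    int32_t count = 0;

    for (int i = 0; i < n; i++) {
        count += (values[i] < limit);
    }
    return count;
}

/* Mask-based select: turn the 0/1 comparison result into an all-zero or
   all-one mask, then pick x when a < b and y otherwise. */
uint32_t select_if_less(int32_t a, int32_t b, uint32_t x, uint32_t y)
{
    uint32_t mask = 0u - (uint32_t)(a < b);   /* 0xFFFFFFFF or 0 */

    return (x & mask) | (y & ~mask);
}
```

For example, select_if_less(1, 2, 10, 20) picks the first value because 1 < 2, while swapping the first two arguments picks the second.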
