Assembly/Compiler Coding Rules – Intel ARCHITECTURE IA-32 User Manual


General Optimization Guidelines


look-up-table-based algorithm using interpolation techniques. It is possible to
improve transcendental performance with these techniques by choosing the
desired numeric precision, the size of the look-up table, and taking advantage
of the parallelism of the Streaming SIMD Extensions and the Streaming SIMD
Extensions 2 instructions. 2-59
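As an illustration of the look-up-table approach, the following is a minimal scalar sketch of a sine approximation over [0, pi/2] using linear interpolation between table entries. The table size (16 intervals), the input range, and the names sine_table and approx_sin are assumptions made for this example and do not come from the manual. A larger table or higher-order interpolation would raise precision, and a SIMD version would process several inputs per instruction.

```c
#include <stddef.h>

#define SINE_INTERVALS 16   /* hypothetical table size: 16 intervals over [0, pi/2] */

/* Samples of sin(k * pi / 32) for k = 0..16. */
static const float sine_table[SINE_INTERVALS + 1] = {
    0.00000000f, 0.09801714f, 0.19509032f, 0.29028468f,
    0.38268343f, 0.47139674f, 0.55557023f, 0.63439328f,
    0.70710678f, 0.77301045f, 0.83146961f, 0.88192126f,
    0.92387953f, 0.95694034f, 0.98078528f, 0.99518473f,
    1.00000000f
};

/* Approximate sin(x) for x in [0, pi/2]; inputs outside the range are clamped. */
float approx_sin(float x)
{
    const float half_pi = 1.5707964f;
    const float scale = (float)SINE_INTERVALS / half_pi;  /* table entries per radian */

    if (x < 0.0f) x = 0.0f;
    if (x > half_pi) x = half_pi;

    float pos = x * scale;                 /* fractional position in the table */
    size_t i = (size_t)pos;                /* index of the entry at or below x */
    if (i >= SINE_INTERVALS)
        i = SINE_INTERVALS - 1;            /* keep i + 1 inside the table */
    float frac = pos - (float)i;           /* distance past entry i, in [0, 1] */

    /* Linear interpolation between the two neighboring samples. */
    return sine_table[i] + frac * (sine_table[i + 1] - sine_table[i]);
}
```

With this table size the interpolation error stays around 1e-3 or below, which shows the trade-off the text describes: precision is bought with table size.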

User/Source Coding Rule 16. (H impact, ML generality) Denormalized
floating-point constants should be avoided as much as possible. 2-60
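One way to audit constants against this rule is a small check like the sketch below. The helper name is_denormal_f32 is hypothetical. It relies only on FLT_MIN from float.h, which is the smallest positive normalized single-precision value, so any nonzero value with a smaller magnitude is denormalized.

```c
#include <float.h>

/* Returns 1 if c is a nonzero single-precision value below the normalized range. */
int is_denormal_f32(float c)
{
    return c != 0.0f && c > -FLT_MIN && c < FLT_MIN;
}
```

For example, the constant 1e-40f would be flagged, while 1.0f and 0.0f would not.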

User/Source Coding Rule 17. (H impact, M generality) Use the smallest
possible floating-point or SIMD data type, to enable more parallelism with the
use of a (longer) SIMD vector. For example, use single precision instead of
double precision where possible. 2-85
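The effect of the data type on available parallelism can be seen in a simple loop. The sketch below uses hypothetical function names. A 128-bit SSE register holds four single-precision values but only two double-precision values, so a vectorized form of the first loop can process twice as many elements per instruction as the second.

```c
#include <stddef.h>

/* Single precision: four elements fit in one 128-bit SIMD register. */
void scale_add_f32(float *y, const float *x, float a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Double precision: only two elements fit in one 128-bit SIMD register. */
void scale_add_f64(double *y, const double *x, double a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Whether the single-precision version is acceptable depends on the accuracy the application needs.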

User/Source Coding Rule 18. (M impact, ML generality) Arrange the
nesting of loops so that the innermost nesting level is free of inter-iteration
dependencies. Especially avoid the case where the store of data in an earlier
iteration happens lexically after the load of that data in a future iteration,
something which is called a lexically backward dependence. 2-85
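The difference between a lexically backward and a lexically forward dependence can be sketched as follows. The function names and the arithmetic are made up for illustration. In the first loop, the value stored to a[i] is loaded by the next iteration in a statement that appears earlier in the loop body. In the second loop the two statements are swapped, which is legal here because they are independent within one iteration, so the store appears lexically before the later load. A vectorizing compiler may handle the second form more readily.

```c
#include <stddef.h>

/* Lexically backward dependence: the load of a[i - 1] is written above
   the store to a[i] that produces that value one iteration earlier. */
void backward_dep(float *a, float *b, const float *c, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        b[i] = a[i - 1] * 2.0f;   /* loads the value stored by the previous iteration */
        a[i] = c[i] + 1.0f;       /* store appears lexically after that load */
    }
}

/* Same computation with the statements reordered: the store now appears
   lexically before the load that consumes it in the next iteration. */
void forward_dep(float *a, float *b, const float *c, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        a[i] = c[i] + 1.0f;
        b[i] = a[i - 1] * 2.0f;
    }
}
```

Both functions produce the same results for non-overlapping arrays; only the order of statements in the loop body differs.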

User/Source Coding Rule 19. (M impact, ML generality) Avoid the use of
conditional branches inside loops and consider using SSE instructions to
eliminate branches. 2-86
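As a rough sketch of this rule, the loop below replaces negative elements with zero, first with a conditional branch and then with the SSE maximum instruction through the _mm_max_ps intrinsic from xmmintrin.h. The function names are hypothetical, and the SSE path is guarded by the __SSE__ macro so the code still compiles where SSE is not enabled. Unaligned loads and stores are used to avoid assuming any alignment of the input.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Branchy version: one conditional branch per element. */
void clamp_negative_branchy(float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (x[i] < 0.0f)
            x[i] = 0.0f;
    }
}

/* SSE version: max(x, 0) on four elements at a time, no per-element branch. */
void clamp_negative_sse(float *x, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 zero = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);             /* load four floats */
        _mm_storeu_ps(x + i, _mm_max_ps(v, zero));  /* store max(v, 0) */
    }
#endif
    /* Remaining elements (or all of them when SSE is unavailable). */
    for (; i < n; ++i)
        x[i] = (x[i] < 0.0f) ? 0.0f : x[i];
}
```

The two versions can differ for NaN inputs, so a substitution like this should be checked against the data the loop actually sees.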

User/Source Coding Rule 20. (M impact, ML generality) Keep loop
induction variable expressions simple. 2-86
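A contrived sketch of what "simple" can mean here: both functions below double every element of an array, but the first derives the array index from the loop counter through a division, while the second uses a single unit-stride counter directly as the index. The function names are hypothetical, and how much a given compiler is affected is an assumption that would need to be checked by inspecting its output.

```c
#include <stddef.h>

/* Harder to analyze: the index is computed from the counter with a division. */
void double_all_complex(float *a, size_t n)
{
    for (size_t j = 0; j < 2 * n; j += 2)
        a[j / 2] *= 2.0f;
}

/* Simpler: the induction variable is the index and advances by one. */
void double_all_simple(float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] *= 2.0f;
}
```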

Assembly/Compiler Coding Rules

Assembly/Compiler Coding Rule 1. (MH impact, H generality)
Arrange code to make basic blocks contiguous to eliminate unnecessary
branches. 2-15

Assembly/Compiler Coding Rule 2. (M impact, ML generality) Use
the setcc and cmov instructions to eliminate unpredictable conditional
branches where possible. Do not do this for predictable branches. Do not
use these instructions to eliminate all unpredictable conditional branches,
because using these instructions will incur execution overhead due to
executing both paths of a conditional branch. In addition, converting
conditional branches to cmovs or setcc trades control flow
dependence for data dependence and restricts the capability of the out of
