Tuning to Prevent Known Coding Pitfalls - Intel Architecture IA-32 User Manual

IA-32 Intel® Architecture Optimization


The following sections describe practices, tools, coding rules, and
recommendations associated with these factors that aid in optimizing
performance on IA-32 processors.

Tuning to Prevent Known Coding Pitfalls

To produce program code that takes advantage of the Intel NetBurst
microarchitecture and the Pentium M processor microarchitecture, you
must avoid the coding pitfalls that limit the performance of the target
processor family. This section lists several known pitfalls that can limit
performance of Pentium 4 and Intel Xeon processor implementations.
Some of these pitfalls (store-to-load-forwarding restrictions and
cache-line splits) also degrade Pentium M processor performance,
though to a lesser degree.

Table 2-1 lists coding pitfalls that cause performance degradation in
some Pentium 4 and Intel Xeon processor implementations. For every
issue, Table 2-1 references a section in this document. The section
describes in detail the causes of the penalty and presents a
recommended solution. Note that “aligned” here means that the address
of the load is aligned with respect to the address of the store.
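
To make the notion of a load being "aligned" with a store more concrete, the following C sketch models the store-forwarding cases in Table 2-1 as a simple predicate. It is a simplified illustration, not the processor's exact rule: the function name can_forward and its parameters are hypothetical, and the model assumes that a single load can be forwarded from a single earlier store only when the load starts at the same address as the store and reads no more bytes than the store wrote. The sections referenced in Table 2-1 remain the authoritative description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model (an assumption for illustration only):
 * a load can be forwarded from one earlier store when
 *   1. the load starts at the same address as the store, and
 *   2. the load reads no more bytes than the store wrote.
 * Sizes are in bytes. The name and signature are hypothetical.
 */
bool can_forward(uintptr_t store_addr, size_t store_size,
                 uintptr_t load_addr, size_t load_size)
{
    assert(store_size > 0 && load_size > 0);

    /* "Aligned" in the sense of Table 2-1: same starting address. */
    bool same_start = (load_addr == store_addr);

    /* The loaded data must fit inside the stored data. */
    bool contained = (load_size <= store_size);

    return same_start && contained;
}
```

Under this model, can_forward(0x1000, 4, 0x1001, 1) corresponds to a small, unaligned load after a large store and returns false, and can_forward(0x1000, 1, 0x1000, 4) corresponds to a large load after a small store and also returns false. A dword load from the address of an earlier dword store, can_forward(0x1000, 4, 0x1000, 4), returns true. The sketch covers only one store and one load; a load that needs data from several earlier stores (the "store dword, store byte" case) falls outside it.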

Table 2-1  Coding Pitfalls Affecting Performance

Factors Affecting           Symptom            Example           Section Reference
Performance                                    (if applicable)
--------------------------  -----------------  ----------------  ---------------------------
Small, unaligned load       Store-forwarding   Example 2-12      Store Forwarding,
after large store           blocked                              Store-to-Load-Forwarding
                                                                 Restriction on Size and
                                                                 Alignment

Large load after small      Store-forwarding   Example 2-13,     Store Forwarding,
store;                      blocked            Example 2-14      Store-to-Load-Forwarding
Load dword after store                                           Restriction on Size and
dword, store byte;                                               Alignment
Load dword, AND with 0xff
after store byte
