Non-Temporal Store Bus Traffic – Intel ARCHITECTURE IA-32 User Manual

General Optimization Guidelines

2

2-53

User/Source Coding Rule 8. (H impact, H generality) To achieve effective
amortization of bus latency, software should favor data access patterns
that result in a higher concentration of cache misses, with cache miss
strides that are significantly smaller than half of the hardware prefetch
trigger threshold.
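To make the rule concrete, the following C sketch checks whether a loop's access stride keeps consecutive cache misses close enough together for the hardware prefetcher. The 64-byte line size, the 256-byte trigger threshold, and the helper names `miss_stride`, `favors_hw_prefetch`, and `sum_row_major` are hypothetical, chosen only for illustration; the real threshold is processor-specific, and "significantly smaller" is simplified here to a strict comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL values for illustration only: the cache line size and the
   hardware prefetch trigger threshold are processor-specific. */
#define LINE_BYTES 64u
#define PREFETCH_TRIGGER_THRESHOLD 256u

/* Approximate byte distance between consecutive cache misses when a loop
   touches memory every `access_stride` bytes. Accesses closer together
   than one line share that line, so misses then occur once per line. */
static size_t miss_stride(size_t access_stride) {
    assert(access_stride > 0);
    return access_stride < LINE_BYTES ? LINE_BYTES : access_stride;
}

/* Rule 8 check, with "significantly smaller than half the threshold"
   simplified to "strictly smaller than half the threshold". */
static bool favors_hw_prefetch(size_t access_stride) {
    return miss_stride(access_stride) < PREFETCH_TRIGGER_THRESHOLD / 2;
}

/* Sum a row-major matrix in row order: consecutive accesses are
   sizeof(double) bytes apart. Swapping the loops (column order) would put
   cols * sizeof(double) bytes between consecutive accesses instead. */
static double sum_row_major(const double *m, size_t rows, size_t cols) {
    double s = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            s += m[r * cols + c];
    return s;
}
```

With these assumed values, row-order traversal of a `double` matrix misses once per 64-byte line, below the 128-byte half-threshold, while column-order traversal of a 1024-column matrix jumps 8192 bytes between accesses and would not satisfy the rule.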

Non-Temporal Store Bus Traffic

Peak system bus bandwidth is shared by several types of bus activity,
including reads from memory, reads for ownership of a cache line, and
writes. The data transfer rate for bus write transactions is higher when
64 bytes are written out to the bus at a time.

Typically, bus writes to Writeback (WB) type memory must share the
system bus bandwidth with read-for-ownership (RFO) traffic, because a
store that misses the cache first reads the whole line in for ownership.
Non-temporal stores do not require RFO traffic; however, their access
patterns must be managed carefully to ensure that 64 bytes are evicted
at once, rather than as several 8-byte chunks.
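A minimal sketch of this access pattern, assuming an x86 target with SSE2 and a C11 compiler: each 64-byte line is written completely with four 16-byte streaming stores (`_mm_stream_si128`), in order, before the next line is touched, so that a full line can be evicted at once. The helpers `alloc_lines` and `nt_fill_lines` are hypothetical names; on targets without SSE2 the sketch falls back to ordinary `memset` stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_set1_epi8, _mm_stream_si128, _mm_sfence */
#endif

#define LINE_BYTES 64u

/* Allocate a buffer that starts on a 64-byte boundary (C11 aligned_alloc;
   the size must be a multiple of the alignment). */
static void *alloc_lines(size_t nbytes) {
    assert(nbytes % LINE_BYTES == 0);
    return aligned_alloc(LINE_BYTES, nbytes);
}

/* Fill `nbytes` at `dst` with `value` using non-temporal stores, writing
   each 64-byte line completely before moving on, so the line can leave
   the write-combining buffer as one full write rather than partial chunks. */
static void nt_fill_lines(void *dst, uint8_t value, size_t nbytes) {
    assert((uintptr_t)dst % LINE_BYTES == 0);
    assert(nbytes % LINE_BYTES == 0);
#if defined(__SSE2__)
    __m128i v = _mm_set1_epi8((char)value);
    char *p = (char *)dst;
    for (size_t off = 0; off < nbytes; off += LINE_BYTES) {
        _mm_stream_si128((__m128i *)(p + off), v);
        _mm_stream_si128((__m128i *)(p + off + 16), v);
        _mm_stream_si128((__m128i *)(p + off + 32), v);
        _mm_stream_si128((__m128i *)(p + off + 48), v);
    }
    /* Streaming stores are weakly ordered; SFENCE makes them globally
       visible before any store that follows it. */
    _mm_sfence();
#else
    /* Fallback for non-SSE2 targets: ordinary (temporal) stores. */
    memset(dst, value, nbytes);
#endif
}
```

Writing only part of each line, or interleaving stores across more lines than there are write-combining buffers, is the kind of pattern that risks partial 8-byte evictions.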

Full 64-byte bus writes from non-temporal stores have twice the data
bandwidth of bus writes to WB memory. Transferring 8-byte chunks,
however, wastes bus request bandwidth and delivers significantly lower
data bandwidth.
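The comparison can be illustrated with a deliberately simplified, hypothetical counting model: each bus transaction carries either a full 64-byte line or a single 8-byte chunk, WB stores are assumed always to miss and therefore cost an RFO read plus a later writeback per line, and non-temporal stores skip the RFO. Real bus protocols and cache behavior are more complex, and the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL simplified bus model, for illustration only. */
#define LINE_BYTES 64u
#define CHUNK_BYTES 8u

typedef struct {
    size_t transactions; /* bus requests issued */
    size_t bus_bytes;    /* bytes moved across the bus */
} bus_cost;

/* WB memory: each line is first read for ownership (RFO), then written
   back later, so each line costs two full-line transactions. */
static bus_cost wb_store_cost(size_t nbytes) {
    assert(nbytes % LINE_BYTES == 0);
    size_t lines = nbytes / LINE_BYTES;
    bus_cost c = {2 * lines, 2 * lines * LINE_BYTES};
    return c;
}

/* Non-temporal stores that fill whole lines: one 64-byte write per line. */
static bus_cost nt_full_line_cost(size_t nbytes) {
    assert(nbytes % LINE_BYTES == 0);
    size_t lines = nbytes / LINE_BYTES;
    bus_cost c = {lines, lines * LINE_BYTES};
    return c;
}

/* Non-temporal stores evicted as partial lines: one request per 8-byte chunk. */
static bus_cost nt_partial_cost(size_t nbytes) {
    assert(nbytes % CHUNK_BYTES == 0);
    size_t chunks = nbytes / CHUNK_BYTES;
    bus_cost c = {chunks, chunks * CHUNK_BYTES};
    return c;
}
```

For a 4096-byte store, this model gives 128 transactions and 8192 bus bytes for WB memory, 64 transactions and 4096 bus bytes for full-line non-temporal stores, and 512 transactions for 8-byte non-temporal chunks: the partial case moves the same data as the full-line case but issues eight times as many bus requests.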