Optimizing for SIMD Integer Applications


Increasing Bandwidth of Memory Fills and Video Fills

It is beneficial to understand how memory is accessed and filled. A
memory-to-memory fill (for example, a memory-to-video fill) is defined
as a 64-byte (cache line) load from memory that is immediately stored
back to memory (such as to a video frame buffer). The following are
guidelines for obtaining higher bandwidth and shorter latencies for
sequential memory fills (video fills). These recommendations are
relevant for all Intel architecture processors with MMX technology and
refer to cases in which the loads and stores do not hit in the first- or
second-level cache.

Increasing Memory Bandwidth Using the MOVDQ Instruction

Loading any size data operand will cause an entire cache line to be
loaded into the cache hierarchy. Thus any size load looks more or less
the same from a memory bandwidth perspective. However, using many
smaller loads consumes more microarchitectural resources than fewer,
larger loads. Consuming too many of these resources can cause the
processor to stall and reduce the bandwidth that the processor can
request of the memory subsystem.

Using movdq to store the data back to UC memory (or WC memory in
some cases) instead of using 32-bit stores (for example, movd) will
reduce by three-quarters the number of stores per memory fill cycle. As
a result, using the movdq instruction in memory fill cycles can achieve
significantly higher effective bandwidth than using the movd
instruction.
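
As a rough illustration of this store-count difference (a hedged sketch, not
an example from this manual), the following C code performs a memory-to-memory
fill one 64-byte cache line at a time. fill_128bit uses the SSE2 intrinsics
_mm_load_si128/_mm_store_si128 (which compile to movdqa) and issues four
stores per cache line; fill_32bit uses plain 32-bit stores, standing in for
movd, and issues sixteen stores per line, four times as many for the same
data. The function names, buffer layout, and the 16-byte alignment assumption
are illustrative.

/*
 * Hedged sketch: a memory-to-memory fill copying 64-byte cache lines.
 * fill_128bit() issues four 16-byte stores per cache line (the MOVDQ
 * case); fill_32bit() issues sixteen 4-byte stores per cache line (the
 * MOVD case), i.e. four times as many store operations for the same
 * amount of data.  Both buffers are assumed to be 16-byte aligned.
 */
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 64

/* One 64-byte line per iteration, four 128-bit loads and stores per line. */
static void fill_128bit(uint8_t *dst, const uint8_t *src, size_t bytes)
{
    for (size_t i = 0; i < bytes; i += CACHE_LINE) {
        __m128i a = _mm_load_si128((const __m128i *)(src + i));
        __m128i b = _mm_load_si128((const __m128i *)(src + i + 16));
        __m128i c = _mm_load_si128((const __m128i *)(src + i + 32));
        __m128i d = _mm_load_si128((const __m128i *)(src + i + 48));
        _mm_store_si128((__m128i *)(dst + i),      a);
        _mm_store_si128((__m128i *)(dst + i + 16), b);
        _mm_store_si128((__m128i *)(dst + i + 32), c);
        _mm_store_si128((__m128i *)(dst + i + 48), d);
    }
}

/* Same fill with 32-bit stores: sixteen stores per 64-byte line. */
static void fill_32bit(uint8_t *dst, const uint8_t *src, size_t bytes)
{
    const uint32_t *s = (const uint32_t *)src;
    uint32_t *d = (uint32_t *)dst;
    for (size_t i = 0; i < bytes / 4; i++)
        d[i] = s[i];
}

When the destination is WC memory such as a video frame buffer, the
non-temporal store _mm_stream_si128 (movntdq) would typically replace
_mm_store_si128; ordinary stores are kept here only to keep the sketch
minimal.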

Increasing Memory Bandwidth by Loading and Storing to and from the Same DRAM Page

DRAM is divided into pages, which are not the same as operating
system (OS) pages. The size of a DRAM page is a function of the total
size of the DRAM and the organization of the DRAM. Page sizes of
several kilobytes are common. Like OS pages, DRAM pages are
constructed of sequential addresses. Sequential memory accesses to the
same DRAM page have shorter latencies than sequential accesses to
different DRAM pages.
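
As a minimal sketch of this idea (not taken from this manual), the following
C routine copies data in blocks no larger than an assumed DRAM page, staging
each block through a small cache-resident buffer so that a run of loads stays
within one source DRAM page and the following run of stores stays within one
destination DRAM page, rather than alternating loads and stores that ping-pong
between the two pages. DRAM_PAGE_BYTES and the function name are hypothetical;
the real page size depends on the DRAM size and organization.

/*
 * Hedged sketch: copy in blocks no larger than an assumed DRAM page.
 * The first memcpy reads a block from one source DRAM page into a
 * cache-resident staging buffer; the second writes that block out to
 * one destination DRAM page.
 */
#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define DRAM_PAGE_BYTES 4096   /* assumption for illustration only */

static void fill_by_dram_page(uint8_t *dst, const uint8_t *src, size_t bytes)
{
    uint8_t block[DRAM_PAGE_BYTES];

    for (size_t i = 0; i < bytes; i += DRAM_PAGE_BYTES) {
        size_t n = bytes - i < DRAM_PAGE_BYTES ? bytes - i : DRAM_PAGE_BYTES;
        memcpy(block, src + i, n);   /* loads stay within one source page */
        memcpy(dst + i, block, n);   /* stores stay within one destination page */
    }
}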
