Example 2-16, Large and Small Load Stalls – Intel ARCHITECTURE IA-32 User Manual


General Optimization Guidelines


When moving data that is smaller than 64 bits between memory
locations, 64-bit or 128-bit SIMD register moves are more efficient (if
aligned) and can be used to avoid unaligned loads. Although
floating-point registers allow the movement of 64 bits at a time,
floating-point instructions should not be used for this purpose, as data
may be inadvertently modified.

As an additional example, consider the cases in Example 2-16. In the
first case (A), there is a large load after a series of small stores to the
same area of memory (beginning at memory address mem). The large
load will stall.

The fld must wait for the stores to write to memory before it can
access all the data it requires. This stall can also occur with other data
types (for example, when bytes or words are stored and then words or
doublewords are read from the same area of memory).
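One possible way to sidestep the case (A) stall is to make the earlier write as wide as the later read. The C sketch below is an illustration under assumptions, not code from the manual: the helper names are hypothetical, and whether the compiler really emits a single 8-byte store depends on the target and flags (a 32-bit build without SIMD may still split the write into two doubleword stores), so the generated assembly should be checked.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: merge two dwords in a register and write them
   with one 8-byte copy, so that a later 8-byte load can be satisfied
   by a single store instead of two smaller ones. */
static void store_pair_as_qword(void *mem, uint32_t lo, uint32_t hi)
{
    uint64_t q = ((uint64_t)hi << 32) | lo;  /* combine in a register */
    memcpy(mem, &q, sizeof q);               /* one 8-byte write */
}

/* Hypothetical helper: read the same 8 bytes back as one qword. */
static uint64_t load_qword(const void *mem)
{
    uint64_t q;
    memcpy(&q, mem, sizeof q);
    return q;
}
```

The fixed-size memcpy calls are used instead of pointer casts so the sketch stays valid C regardless of the buffer's alignment; compilers commonly lower them to plain moves.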

In the second case (Example 2-16, B), there is a series of small loads
after a large store to the same area of memory (beginning at memory
address mem). The small loads will stall.

The word loads must wait for the quadword store to write to memory
before they can access the data they require. This stall can also occur
with other data types (for example, when doublewords or words are
stored and then words or bytes are read from the same area of memory).
This can be avoided by moving the store as far from the loads as
possible.

Example 2-16 Large and Small Load Stalls

;A. Large load stall
    mov   mem, eax        ; store dword to address "mem"
    mov   mem + 4, ebx    ; store dword to address "mem + 4"
    fld   mem             ; load qword at address "mem", stalls

;B. Small load stall
    fstp  mem             ; store qword to address "mem"
    mov   bx, mem + 2     ; load word at address "mem + 2", stalls
    mov   cx, mem + 4     ; load word at address "mem + 4", stalls
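Both cases in Example 2-16 can be checked against a small model of when a load can take its data directly from one pending store. The C sketch below is a simplification, not a description of any specific processor. It assumes a rule that this section does not state: a load is forwarded only if it starts at the same address as the store and fits inside the store's data. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One memory access: starting address and size in bytes. */
typedef struct {
    uintptr_t addr;
    unsigned  size;
} mem_access;

/* Assumed rule: the load must begin where the store begins and must
   not be larger than the store. */
static bool load_can_forward(mem_access store, mem_access load)
{
    return load.addr == store.addr && load.size <= store.size;
}

/* True if the two accesses share at least one byte. */
static bool overlaps(mem_access a, mem_access b)
{
    return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

/* In this model a load stalls when it touches bytes of a pending store
   but cannot be forwarded from that store. */
static bool load_stalls(mem_access store, mem_access load)
{
    return overlaps(store, load) && !load_can_forward(store, load);
}
```

Under this model, the qword load in case (A) stalls against each dword store because it is larger than either one, and the word loads at mem + 2 and mem + 4 in case (B) stall because they do not start at the address of the qword store.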
