The non-temporal store instructions, Fencing, Streaming non-temporal stores – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization

The Non-temporal Store Instructions

This section describes the behavior of streaming stores and reiterates
some of the information presented in the previous section. In Streaming
SIMD Extensions, the movntps, movntpd, movntq, movntdq, movnti, maskmovq
and maskmovdqu instructions are streaming, non-temporal stores. With
regard to memory characteristics and ordering, they are most similar to
the Write-Combining (WC) memory type:

Write combining – successive writes to the same cache line are
combined

Write collapsing – successive writes to the same byte(s) result in
only the last write being visible

Weakly ordered – no ordering is preserved between WC stores, or
between WC stores and other loads or stores

Uncacheable and not write-allocating – stored data is written around
the cache and will not generate a read-for-ownership bus request for
the corresponding cache line
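
As an illustration of how these instructions are usually reached from C, the sketch below uses the compiler intrinsics that are expected to map to them. The function name nt_store_demo and the stored values are hypothetical and chosen only for this example. The MMX-based movntq and maskmovq (intrinsics _mm_stream_pi and _mm_maskmove_si64) are left out for brevity.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */

/* Hypothetical demo: each call is expected to compile to the non-temporal
   store named in its comment.
   f, d and v must be 16-byte aligned; bytes may have any alignment. */
void nt_store_demo(float *f, double *d, int *i, __m128i *v, char *bytes)
{
    _mm_stream_ps(f, _mm_set1_ps(1.5f));     /* movntps: four floats   */
    _mm_stream_pd(d, _mm_set1_pd(2.5));      /* movntpd: two doubles   */
    _mm_stream_si32(i, 42);                  /* movnti:  one 32-bit int */
    _mm_stream_si128(v, _mm_set1_epi32(7));  /* movntdq: 16 bytes      */

    /* maskmovdqu: store only the bytes whose mask byte has its top bit
       set. This mask selects the low 8 of the 16 bytes. */
    __m128i data = _mm_set1_epi8(0x11);
    __m128i mask = _mm_set_epi32(0, 0, -1, -1);
    _mm_maskmoveu_si128(data, mask, bytes);

    /* The stores above are weakly ordered; fence before relying on them. */
    _mm_sfence();
}
```

Within a single thread the stored values read back as expected; the write-combining and weak-ordering properties listed above matter for what other processors or system agents observe, and when.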

Fencing

Because streaming stores are weakly ordered, a fencing operation is
required to ensure that the stored data is flushed from the processor to
memory. Without an appropriate fence, data may remain “trapped” within
the processor, where other processors or system agents cannot see it.
WC stores require software to ensure coherence of data by performing
the fencing operation; see “The fence Instructions” section for more
information.
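
A minimal sketch of this rule, assuming a producer that fills a buffer with streaming stores and then raises a flag for a consumer: the sfence sits between the data stores and the flag store. The names publish_block and ready are hypothetical, and the C11 atomic flag is only one possible signalling mechanism.

```c
#include <emmintrin.h>   /* _mm_stream_si128, _mm_load_si128, _mm_sfence */
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical producer: copies n_vec 16-byte vectors with movntdq, then
   signals completion. dst and src must be 16-byte aligned. */
void publish_block(__m128i *dst, const __m128i *src, size_t n_vec,
                   atomic_int *ready)
{
    for (size_t i = 0; i < n_vec; ++i)
        _mm_stream_si128(&dst[i], _mm_load_si128(&src[i]));

    /* Drain the weakly ordered streaming stores before the flag store. */
    _mm_sfence();

    /* Only now tell the consumer that the data may be read. */
    atomic_store_explicit(ready, 1, memory_order_release);
}
```

If the fence were omitted, a consumer on another processor could observe the flag set while some of the streamed data was still held in a write-combining buffer.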

Streaming Non-temporal Stores

Streaming stores can improve performance in the following ways:

Increase store bandwidth when the 64 bytes of a cache line are written
consecutively: the stores do not require read-for-ownership bus
requests, and the 64 bytes are combined into a single bus write
transaction.
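
The full-line pattern can be sketched as follows, assuming a 64-byte-aligned destination and the 64-byte line size stated above. The function name stream_fill_lines and the LINE_BYTES constant are hypothetical. Each loop iteration issues four consecutive 16-byte movntdq stores that together cover one cache line, which gives the hardware the opportunity to combine them.

```c
#include <emmintrin.h>   /* _mm_stream_si128, _mm_set1_epi32, _mm_sfence */
#include <stddef.h>

enum { LINE_BYTES = 64 };  /* cache-line size assumed from the text */

/* Hypothetical helper: fills n_lines cache lines with a repeated 32-bit
   value. dst must be 64-byte aligned. */
void stream_fill_lines(void *dst, size_t n_lines, int value)
{
    __m128i v = _mm_set1_epi32(value);
    __m128i *p = (__m128i *)dst;

    for (size_t i = 0; i < n_lines; ++i, p += LINE_BYTES / 16) {
        /* Four back-to-back 16-byte stores cover one full line. */
        _mm_stream_si128(p + 0, v);
        _mm_stream_si128(p + 1, v);
        _mm_stream_si128(p + 2, v);
        _mm_stream_si128(p + 3, v);
    }

    _mm_sfence();  /* order the streamed data before any later stores */
}
```

Whether the four stores actually leave the processor as a single bus transaction depends on the implementation; the code only arranges the writes so that combining is possible.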
