The clflush Instruction – Intel ARCHITECTURE IA-32 User Manual


Optimizing Cache Usage

The clflush Instruction

The cache line associated with the linear address specified by the value
of byte address is invalidated from all levels of the processor cache
hierarchy (data and instruction). The invalidation is broadcast
throughout the coherence domain. If, at any level of the cache hierarchy,
the line is inconsistent with memory (dirty) it is written to memory
before invalidation. Other characteristics include:

The data size affected is the cache coherency size, which is 64 bytes
on the Pentium 4 processor.

The memory attribute of the page containing the affected line has no
effect on the behavior of this instruction.

The clflush instruction can be used at all privilege levels and is
subject to all permission checking and faults associated with a byte
load.

clflush is an unordered operation with respect to other memory traffic,
including other clflush instructions. Software should use an mfence
(memory fence) instruction where ordering is a concern.
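The characteristics above can be sketched in C. This is a minimal illustration, not code from the manual: it assumes an x86 compiler that provides the SSE2 intrinsics _mm_clflush and _mm_mfence in <emmintrin.h>, and it assumes the 64-byte line size quoted for the Pentium 4 processor. The helper name flush_range and the constant CACHE_LINE_SIZE are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2 intrinsics) */
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte flush granularity, as stated for the Pentium 4. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: flush every cache line that overlaps
 * [addr, addr + len). Returns the number of clflush instructions issued. */
static size_t flush_range(const void *addr, size_t len)
{
    assert(addr != NULL);
    if (len == 0)
        return 0;

    /* Round the first and last byte addresses down to a line boundary. */
    uintptr_t mask  = ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t first = (uintptr_t)addr & mask;
    uintptr_t last  = ((uintptr_t)addr + len - 1) & mask;
    size_t count = 0;

    for (uintptr_t line = first; ; line += CACHE_LINE_SIZE) {
        _mm_clflush((const void *)line);  /* one byte address selects the whole line */
        count++;
        if (line == last)
            break;
    }

    /* clflush is unordered, so fence before relying on the flushes. */
    _mm_mfence();
    return count;
}
```

Because any byte address selects its whole line, a 2-byte range that straddles a 64-byte boundary costs two flushes, while a 64-byte range that starts on a boundary costs one.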

As an example, consider a video usage model, wherein a video capture
device is using non-coherent AGP accesses to write a capture stream
directly to system memory. Since these non-coherent writes are not
broadcast on the processor bus, they will not flush any copies of the
same locations that reside in the processor caches. As a result, before the
processor re-reads the capture buffer, it should use clflush to ensure
that any stale copies of the capture buffer are flushed from the processor
caches. Due to speculative reads that may be generated by the processor,
it is important to observe appropriate fencing, using mfence.
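The capture-buffer scenario might look roughly like the following in C. This is a sketch under stated assumptions and not the manual's own example: it assumes the SSE2 intrinsics _mm_clflush and _mm_mfence from <emmintrin.h>, a 64-byte line size, and a buffer aligned to that size. The function name read_capture_buffer and the byte-sum "consumer" are hypothetical stand-ins for real processing of the capture stream.

```c
#include <assert.h>
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2 intrinsics) */
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u   /* assumed flush granularity */

/* Hypothetical consumer: discard possibly stale cached copies of a
 * capture buffer, then re-read it. The byte sum stands in for real work. */
static uint32_t read_capture_buffer(const uint8_t *buf, size_t len)
{
    assert(buf != NULL && ((uintptr_t)buf % LINE_SIZE) == 0);

    _mm_mfence();                          /* order earlier traffic before the flushes */
    for (size_t off = 0; off < len; off += LINE_SIZE)
        _mm_clflush(buf + off);            /* invalidate one line per iteration */
    _mm_mfence();                          /* flushes are ordered before the re-read */

    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];                     /* these loads must miss and refetch from memory */
    return sum;
}
```

The second fence is the important one here: without it, a speculative read could pull a line back into the cache before its flush has taken effect.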

Example 6-1 illustrates the pseudo-code for the recommended usage of
clflush.
