Loads and stores, Loads and stores -24 – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization


Thus, software optimization of a data access pattern should first
emphasize tuning for hardware prefetch, favoring a greater proportion
of smaller-stride data accesses in the workload, before attempting to
provide hints to the processor by employing software prefetch
instructions.
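
As an illustration of what a smaller-stride access pattern means, the
following minimal sketch walks the same matrix in two orders. The names
ROWS, COLS, sum_row_major, and sum_col_major are hypothetical and chosen
only for this example. Both functions return the same sum; they differ
only in the distance between consecutive accesses.

```c
#include <stddef.h>

enum { ROWS = 64, COLS = 64 };  /* illustrative sizes, not from the manual */

/* Row-major walk: consecutive accesses are sizeof(int) bytes apart,
   which is the small-stride pattern the text recommends favoring. */
long sum_row_major(int m[ROWS][COLS])
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += m[r][c];
    return total;
}

/* Column-major walk: consecutive accesses are COLS * sizeof(int) bytes
   apart, a much larger stride over the same data. */
long sum_col_major(int m[ROWS][COLS])
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += m[r][c];
    return total;
}
```

When the algorithm allows either order, the row-major form is the one
that keeps the stride small. Whether it is measurably faster depends on
the data size and the processor, so it should be confirmed by
measurement.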

Loads and Stores

The Pentium 4 processor employs the following techniques to speed up
the execution of memory operations:

• speculative execution of loads

• reordering of loads with respect to loads and stores

• multiple outstanding misses

• buffering of writes

• forwarding of data from stores to dependent loads
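
To make the benefit of multiple outstanding misses more concrete, the
sketch below contrasts two loops. It is an illustration only: struct
node, sum_array, and sum_list are hypothetical names, and the comments
describe general out-of-order behavior rather than measured Pentium 4
results.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Independent loads: each address follows from the index alone, so the
   processor is free to have several loads, and several cache misses,
   in flight at the same time. */
long sum_array(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Dependent loads: the address of each node is produced by the previous
   load, so the next load cannot start until the previous one returns. */
long sum_list(const struct node *p)
{
    long total = 0;
    for (; p != NULL; p = p->next)
        total += p->value;
    return total;
}
```

Both functions compute a sum over the same kind of data, but only the
array version gives the hardware independent loads to overlap.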

Performance may be enhanced by not exceeding the memory issue
bandwidth and buffer resources provided by the processor. Up to one
load and one store may be issued each cycle from a memory port
reservation station. A memory operation can be dispatched to a
reservation station only if a buffer entry is available for it. There
are 48 load buffers and 24 store buffers (see footnote 3). These
buffers hold the µop and address information until the operation is
completed, retired, and deallocated.
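
The issue and buffer limits above can be turned into a rough
back-of-the-envelope check. The sketch below is an assumption-laden
model, not a description of the hardware scheduler: the function names
are hypothetical, and the store buffer count is the figure quoted in the
text, which does not apply to processors with CPUID model encoding 3.

```c
/* Buffer counts quoted in the text; the store figure differs on
   processors with CPUID model encoding equal to 3. */
enum { LOAD_BUFFERS = 48, STORE_BUFFERS = 24 };

/* Lower bound on issue cycles for a block of code: at most one load and
   one store can issue per cycle, so the larger count sets the floor. */
unsigned min_issue_cycles(unsigned loads, unsigned stores)
{
    return loads > stores ? loads : stores;
}

/* Returns 1 if the given numbers of in-flight loads and stores fit in
   the load and store buffers, 0 otherwise. */
int fits_in_buffers(unsigned loads_in_flight, unsigned stores_in_flight)
{
    return loads_in_flight <= LOAD_BUFFERS &&
           stores_in_flight <= STORE_BUFFERS;
}
```

For example, a loop body with six loads and two stores cannot issue its
memory operations in fewer than six cycles under this model, however
many other execution resources are free.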

The Pentium 4 processor is designed to enable the execution of memory
operations out of order with respect to other instructions and with
respect to each other. Loads can be carried out speculatively, that is,
before all preceding branches are resolved. However, speculative loads
cannot cause page faults.

3. Pentium 4 processors with CPUID model encoding equal to 3 have more
   than 24 store buffers.
