Deterministic Cache Parameters – Intel ARCHITECTURE IA-32 User Manual


Optimizing Cache Usage


The baseline for performance comparison is the throughput (bytes/sec)
of an 8-MByte memory copy on a first-generation Pentium M processor
(CPUID signature 0x69n) with a 400-MHz system bus, using a
byte-sequential technique similar to that shown in Example 6-10. The
degree of improvement relative to this baseline is compared for newer
IA-32 processors and platforms with higher system bus speeds, using
different coding techniques.
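The baseline measurement described above can be sketched in C as follows. This is a minimal illustration, not the code of Example 6-10: the function names byte_copy, throughput, relative_improvement, and measure_byte_copy are hypothetical, the use of clock() as the timer is an assumption, and expressing the improvement as a simple ratio of throughputs is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Byte-sequential copy: one byte per iteration, ascending addresses.
 * (Hypothetical stand-in for the technique of Example 6-10.) */
static void byte_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}

/* Throughput in bytes per second for `bytes` moved in `seconds`. */
static double throughput(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds;
}

/* Improvement of a measured throughput over the baseline, as a ratio
 * (assumption: 2.0 means twice the baseline throughput). */
static double relative_improvement(double measured, double baseline)
{
    assert(baseline > 0.0);
    return measured / baseline;
}

/* Times `reps` byte-sequential copies of `len` bytes and returns
 * bytes/sec. clock() reports CPU time at coarse resolution, so 0.0 is
 * returned when the elapsed time is too small to measure. */
static double measure_byte_copy(unsigned char *dst, const unsigned char *src,
                                size_t len, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        byte_copy(dst, src, len);
    clock_t end = clock();

    double seconds = (double)(end - start) / (double)CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return throughput(len * (size_t)reps, seconds);
}
```

For a comparison in the spirit of the text, measure_byte_copy would be run on an 8-MByte buffer to obtain the baseline, and relative_improvement would then be applied to the throughput of each alternative routine. An optimizing compiler may replace the byte loop with a library copy, so the build settings need to be checked before the numbers are trusted.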

The second coding technique moves data at 4-Byte granularity using the
REP string instruction. The third column compares the performance of
the coding technique listed in Example 6-11. The fourth column compares
the throughput of fetching 4 KBytes of data at a time (using hardware
prefetch to aggregate bus read transactions) and writing to memory via
16-Byte streaming stores.
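As an illustration of the second technique, the sketch below moves data 4 bytes at a time with a REP-prefixed string move. It is an assumption-laden example rather than the code measured for the table: the function name rep_copy_dwords is hypothetical, the inline assembly uses GCC/Clang extended-asm syntax, and a plain loop is substituted on other compilers or non-x86 targets.

```c
#include <stddef.h>

/* Copies len bytes from src to dst, moving 4 bytes per REP iteration.
 * The regions are assumed not to overlap. */
static void rep_copy_dwords(void *dst, const void *src, size_t len)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t dwords = len / 4;   /* number of 4-byte moves */
    size_t tail = len % 4;     /* leftover bytes, copied one at a time */

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    /* REP MOVSD (AT&T mnemonic: rep movsl): the count is taken from
     * (E/R)CX, the source from (E/R)SI, the destination from (E/R)DI.
     * The direction flag is assumed clear, as the calling conventions
     * require. On exit d and s point just past the copied dwords. */
    __asm__ volatile("rep movsl"
                     : "+D"(d), "+S"(s), "+c"(dwords)
                     :
                     : "memory");
#else
    /* Portable fallback with the same effect on d and s. */
    for (size_t i = 0; i < dwords * 4; i++)
        d[i] = s[i];
    d += dwords * 4;
    s += dwords * 4;
#endif

    while (tail--)
        *d++ = *s++;
}
```

The tail loop exists only because the buffer length need not be a multiple of 4 bytes.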

Increases in bus speed are the primary contributor to throughput
improvements. The technique shown in Example 6-12 is likely to take
advantage of a faster platform bus more efficiently. Additionally,
increasing the block size to multiples of 4 KBytes while keeping the
total working set within the second-level cache can improve the
throughput slightly.
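The block-oriented approach (fetch a block into the cache, then write it out with 16-byte streaming stores) can be sketched with SSE2 intrinsics as shown below. This is not Example 6-12 itself: the function name block_stream_copy, the BLOCK_SIZE and LINE_SIZE constants, and the 64-byte cache-line size are assumptions made for illustration, and a memcpy fallback is used when SSE2 is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

#define BLOCK_SIZE 4096 /* bytes fetched per pass; may be raised to a
                           larger multiple of 4 KBytes */
#define LINE_SIZE  64   /* assumed cache-line size in bytes */

/* Copies len bytes from src to dst. dst must be 16-byte aligned because
 * streaming stores require an aligned destination. */
static void block_stream_copy(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst & 15u) == 0);
#if defined(__SSE2__)
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    volatile unsigned char sink = 0;

    while (len >= BLOCK_SIZE) {
        /* Pass 1: read one byte per cache line so the whole block is
         * brought into the cache with sequential reads that a hardware
         * prefetcher can follow. */
        for (size_t i = 0; i < BLOCK_SIZE; i += LINE_SIZE)
            sink = s[i];

        /* Pass 2: load 16 bytes at a time from the cached block and
         * write them with non-temporal (streaming) stores. */
        for (size_t i = 0; i < BLOCK_SIZE; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
            _mm_stream_si128((__m128i *)(d + i), v);
        }
        s += BLOCK_SIZE;
        d += BLOCK_SIZE;
        len -= BLOCK_SIZE;
    }
    (void)sink;
    _mm_sfence();       /* order the streaming stores before later stores */
    memcpy(d, s, len);  /* remainder smaller than one block */
#else
    memcpy(dst, src, len);
#endif
}
```

Whether a larger BLOCK_SIZE helps depends on the total working set staying within the second-level cache, so the value is best chosen by measurement on the target platform.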

The relative performance figures shown in Table 6-2 are representative
of clean microarchitectural conditions within a processor (e.g. looping
a simple sequence of code many times). The net benefit of integrating a
specific memory copy routine into an application will vary for each
application, since full-featured applications tend to create many
complicated microarchitectural conditions.

Deterministic Cache Parameters

If CPUID supports the function leaf with input EAX = 4, that leaf is
referred to as the deterministic cache parameter leaf of CPUID (see the
CPUID instruction in IA-32 Intel® Architecture Software Developer’s
Manual, Volume 2A). Software can use the deterministic cache parameter
leaf to
