Performance comparisons of memory copy routines – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization

Performance Comparisons of Memory Copy Routines

The throughput of a large-region memory copy routine depends on
several factors:

• the coding technique that implements the memory copy task

• characteristics of the system bus (speed, peak bandwidth, and overhead in read/write transaction protocols)

• the microarchitecture of the processor

Table 6-2 compares the two coding techniques discussed above with two
unoptimized techniques.

    add esi,ecx
    add edi,ecx
    sub edx,ecx
    jnz main_loop
    sfence
  }
}
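The loop tail above ends with sfence, which orders the preceding streaming (non-temporal) stores before later stores. As a rough illustration of the 16-byte streaming-store idea only, here is a minimal C sketch using SSE2 intrinsics. The function name copy_stream16 and the HAVE_SSE2_STREAM macro are hypothetical, the sketch omits the prefetching and 4KB blocking of the manual's routines, and it assumes the buffers do not overlap. When SSE2 is unavailable or a pointer is not 16-byte aligned, it simply falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>          /* SSE2 intrinsics; also pulls in _mm_sfence */
#define HAVE_SSE2_STREAM 1      /* hypothetical feature macro for this sketch */
#else
#define HAVE_SSE2_STREAM 0
#endif

/* Hypothetical helper: copy n bytes from src to dst (non-overlapping).
 * Uses 16-byte streaming stores when both pointers are 16-byte aligned,
 * otherwise falls back to memcpy. */
void copy_stream16(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if HAVE_SSE2_STREAM
    if ((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0) {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        size_t blocks = n / 16;

        for (size_t i = 0; i < blocks; i++) {
            __m128i v = _mm_load_si128(s + i);  /* aligned 16-byte load */
            _mm_stream_si128(d + i, v);         /* non-temporal 16-byte store */
        }
        _mm_sfence();  /* order the streaming stores before later stores */

        /* Copy the remaining 0..15 bytes with ordinary stores. */
        size_t done = blocks * 16;
        memcpy((unsigned char *)dst + done,
               (const unsigned char *)src + done, n - done);
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

Streaming stores bypass the cache hierarchy on the write side, which tends to help only when the destination is large and will not be read again soon. For small copies an ordinary memcpy is usually the better choice.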

Table 6-2  Relative Performance of Memory Copy Routines

Processor, CPUID Signature    Byte        DWORD       SW prefetch +      4KB-Block HW prefetch
and FSB Speed (MHz)           Sequential  Sequential  8 byte streaming   + 16 byte streaming
                                                      store              stores
----------------------------  ----------  ----------  -----------------  ---------------------
Pentium M processor,          1.3X        1.2X        1.6X               2.5X
0x6Dn, 400

Intel Core Solo and Intel     3.3X        3.5X        2.1X               4.7X
Core Duo processors,
0x6En, 667

Pentium D processor,          3.4X        3.3X        4.9X               5.7X
0xF4n, 800
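For orientation, the two unoptimized columns (Byte Sequential and DWORD Sequential) can be pictured with a short C sketch. This is an assumption about what such baselines look like, not the code the manual measured, and the names copy_bytes and copy_dwords are hypothetical. Note that an optimizing compiler may vectorize either loop or replace it with a library call, so measured results depend heavily on build flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-sequential baseline: moves one byte per iteration. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Hypothetical DWORD-sequential baseline: moves 4 bytes per iteration.
 * The fixed-size memcpy calls express an unaligned 32-bit load and store
 * without violating C aliasing or alignment rules. */
void copy_dwords(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, s + i, sizeof w);   /* 32-bit load */
        memcpy(d + i, &w, sizeof w);   /* 32-bit store */
    }
    for (; i < n; i++)                 /* remaining 0..3 bytes */
        d[i] = s[i];
}
```

Both routines use ordinary cached loads and stores, so every destination line is first read into the cache before being overwritten. The streaming-store variants in the table avoid that extra traffic.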
