CPU Binding Considerations, Single TCP Connection Performance Settings – Dell Emulex Family of Adapters User Manual

Emulex Drivers for Windows User Manual

P010077-01A Rev. A

3. Configuration

NIC Driver Configuration

Some applications run slower with interrupt coalescing enabled, such as applications that depend on the completion of the current network transfer before they post additional work. If an application sends and receives one network message before posting the next message, it is considered latency bound. For latency bound applications, an interrupt is required to proceed to the next work item, so reducing the number of interrupts directly reduces the network throughput. The Microsoft iSCSI Initiator is generally considered a latency bound application unless the I/O sizes are very large.

When tuning the system, you must balance the extra CPU usage caused by interrupts against the potential decrease in total throughput for latency bound applications.
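
As a rough illustration of this trade-off, the following Python sketch models a latency bound application that sends one message, waits for its completion interrupt, and only then posts the next message. The function name, the message size, the round-trip time, and the coalescing delays are all hypothetical values chosen for illustration; they are not measurements of any adapter or driver setting.

```python
def latency_bound_throughput(message_bytes, round_trip_s, coalesce_delay_s):
    """Bytes per second when each message must complete before the next is posted."""
    # Each work item costs one network round trip plus however long the
    # completion interrupt is held back while interrupts are coalesced.
    time_per_message_s = round_trip_s + coalesce_delay_s
    return message_bytes / time_per_message_s


if __name__ == "__main__":
    MESSAGE_BYTES = 4096     # hypothetical I/O size
    ROUND_TRIP_S = 100e-6    # hypothetical 100 microsecond round trip
    for delay_us in (0, 50, 100, 200):
        rate = latency_bound_throughput(MESSAGE_BYTES, ROUND_TRIP_S, delay_us * 1e-6)
        print(f"coalescing delay {delay_us:>3} us: {rate / 1e6:.1f} MB/s")
```

In this simple model, a coalescing delay equal to the round-trip time doubles the time per message and therefore halves the throughput.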

CPU Binding Considerations

Windows applications may set a processor affinity, which binds a program to a particular CPU in a multiprocessor computer. However, with the recent additions to the Windows networking stack, manually configuring CPU affinity is not recommended.

The advantage of application affinity for network applications comes from choosing the ideal relationship between the processor that runs the deferred procedure call (DPC) and the processor that runs the application, in order to reduce processor-cache coherency cycles. The ideal mapping may require that the DPC and the application run on the same processor, on different processors, or on different cores of a dual-core processor that share a common memory cache. Even when the best affinity relationship is determined, it is impossible to enforce this relationship because RSS or TCP offloading chooses the DPC processor.

The driver uses multiple parallel DPCs that are explicitly assigned to particular CPUs for processing both RSS and TCP offloading tasks. Each TCP connection is assigned to a particular CPU for processing. This delivers the main benefit of assigning CPU affinities, fewer CPU cache misses, without any user configuration.
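
The idea that each TCP connection is always processed on the same CPU can be sketched as a hash of the connection's addresses and ports that indexes a table of CPU numbers. The sketch below is illustrative only: actual RSS implementations use a Toeplitz hash with a secret key, whereas this example substitutes CRC-32 as a stand-in, and the function name, table contents, and addresses are hypothetical.

```python
import ipaddress
import struct
import zlib


def connection_cpu(src_ip, src_port, dst_ip, dst_port, cpu_table):
    """Pick the CPU that processes a TCP connection, in the style of RSS."""
    # Pack the connection 4-tuple into bytes to use as the hash input.
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!HH", src_port, dst_port))
    # Stand-in hash (assumption): real RSS uses a Toeplitz hash, not CRC-32.
    hash_value = zlib.crc32(key)
    # The hash selects an entry in an indirection table of CPU numbers.
    return cpu_table[hash_value % len(cpu_table)]


if __name__ == "__main__":
    table = [0, 2, 4, 6]  # hypothetical set of CPUs used for receive processing
    cpu = connection_cpu("192.0.2.10", 49152, "192.0.2.20", 3260, table)
    print(f"connection is processed on CPU {cpu}")
```

Because the hash depends only on the connection's addresses and ports, every packet of a given connection lands on the same CPU, which keeps that connection's state warm in one processor cache.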
Explicit processor affinity assignments are not necessary for the driver because the advantages of assigning processor affinities are realized by using RSS. The only reason to experiment with application and interrupt CPU affinity is to run isolated networking benchmarks.
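
For benchmark experiments of this kind, an application affinity is expressed on Windows as a bitmask in which bit n stands for CPU n. The following Python sketch builds such a mask and, on Windows only, applies it to the current process through the Win32 `SetProcessAffinityMask` function. The helper names `affinity_mask` and `bind_current_process` are hypothetical, and the sketch assumes a system with a single processor group (no more than 64 logical processors).

```python
import sys


def affinity_mask(cpu_indices):
    """Build an affinity bitmask: bit n set means CPU n is allowed."""
    mask = 0
    for cpu in cpu_indices:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def bind_current_process(cpu_indices):
    """Bind this process to the given CPUs (Windows only); return the mask used."""
    mask = affinity_mask(cpu_indices)
    if sys.platform != "win32":
        raise OSError("SetProcessAffinityMask is only available on Windows")
    import ctypes
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Declare the signatures so that handles and masks are not truncated
    # on 64-bit Windows.
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.SetProcessAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
    kernel32.SetProcessAffinityMask.restype = wintypes.BOOL
    if not kernel32.SetProcessAffinityMask(kernel32.GetCurrentProcess(), mask):
        raise ctypes.WinError(ctypes.get_last_error())
    return mask


if __name__ == "__main__":
    print(hex(affinity_mask([0, 2])))  # CPUs 0 and 2
```

This binds only the application. As described above, the DPC processor is still chosen by RSS or TCP offloading, so the sketch is useful only for controlled benchmark runs.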

Single TCP Connection Performance Settings

One common benchmark is to run a single TCP connection between two computers as fast as possible. The following suggestions help deliver the best possible performance:

• Use TCP window scaling with a 256 KB or 512 KB window. This may be controlled with some socket applications, such as ntttcp from Microsoft.

• Use send and receive buffers that are larger than 128 KB with an efficient application such as ntttcp.

• Disable RSS and use an interrupt filter driver. Experiment with all relative CPU affinities to find the best combination.

• Disable timestamps and SACK, because the test should run without dropping any packets.
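
The window size matters because a single TCP connection can have at most one window of unacknowledged data in flight per round trip, and windows larger than 64 KB require window scaling. The following Python sketch works through that arithmetic. The function names and the 200 microsecond round-trip time are hypothetical values chosen for illustration, not recommendations from this manual.

```python
def max_single_connection_rate(window_bytes, round_trip_s):
    """Upper bound in bits per second: one window in flight per round trip."""
    return window_bytes * 8 / round_trip_s


def window_for_rate(rate_bits_per_s, round_trip_s):
    """Bandwidth-delay product in bytes needed to sustain the given rate."""
    return rate_bits_per_s * round_trip_s / 8


if __name__ == "__main__":
    ROUND_TRIP_S = 200e-6  # hypothetical round-trip time between the two hosts
    for window_kb in (64, 256, 512):
        rate = max_single_connection_rate(window_kb * 1024, ROUND_TRIP_S)
        print(f"{window_kb:>3} KB window: at most {rate / 1e9:.2f} Gb/s")
    needed = window_for_rate(10e9, ROUND_TRIP_S)
    print(f"10 Gb/s needs a window of about {needed / 1024:.0f} KB")
```

The bound covers only the window limit. The achieved rate also depends on the application buffers, CPU affinities, and the other settings listed above.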
