
www.ti.com

6.11 Submitting DMA Transfer Requests

6.12 Device Independent DMA Optimization Guideline


The specification of the ACPY2 interface strives to strike a delicate balance between allowing high
performance and requiring error checking at run time. Optimized algorithms require high-speed transfer
mechanisms and invariably use aligned addresses and 32-bit or 16-bit element sizes as their fundamental
type of data transfer. At the other end of the spectrum are algorithms that need a DMA library to transfer
the required number of bytes from any source address to any destination address, through an interface no
more complicated than the memcpy() function in the C standard library.

The ACPY2 interface provides algorithm developers with two functions for submitting DMA transfer
requests: ACPY2_start() and ACPY2_startAligned(). The only operational difference between
ACPY2_startAligned() and ACPY2_start() is the additional requirement by ACPY2_startAligned() for its
source and destination addresses to be properly aligned with respect to the configured element size.
When using 32-bit transfer mode, these addresses must be at least 32-bit aligned; for 16-bit transfers,
16-bit alignment is required. When called with properly aligned addresses, both functions behave
identically. However, on architectures such as C6000, which permit DMA transfers using 8-bit or 16-bit
alignment of source or destination addresses irrespective of the actual transfer element size,
ACPY2_startAligned() can be optimized to operate more efficiently, because it can skip the alignment
handling that ACPY2_start() must perform at run time. On the other hand, certain architectures, such as
C55x, may impose device-dependent DMA rules that require stricter alignment of the source and destination
addresses for all transfers, and may therefore provide the same implementation for both APIs.
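The alignment contract can be made concrete with a small check. The sketch below is a hypothetical helper, not part of the ACPY2 API. It assumes byte addressing and element sizes given in bytes (2 for 16-bit, 4 for 32-bit), and it tests the precondition an algorithm must satisfy before calling ACPY2_startAligned().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ACPY2 function): returns true when addr is a
 * multiple of elemSize bytes. Assumes byte addressing and a power-of-two
 * element size such as 2 (16-bit) or 4 (32-bit). */
bool is_aligned(uintptr_t addr, size_t elemSize)
{
    assert(elemSize != 0 && (elemSize & (elemSize - 1)) == 0);
    return (addr & (uintptr_t)(elemSize - 1)) == 0;
}

/* Hypothetical check of the ACPY2_startAligned() precondition: both the
 * source and destination must be aligned to the channel's element size.
 * If this returns false, only ACPY2_start() may be used safely. */
bool start_aligned_ok(uintptr_t src, uintptr_t dst, size_t elemSize)
{
    return is_aligned(src, elemSize) && is_aligned(dst, elemSize);
}
```

For example, start_aligned_ok(0x1002, 0x2004, 4) is false because 0x1002 is only 16-bit aligned, while the same address pair passes with a 16-bit (2-byte) element size.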

ACPY2_start() makes no assumptions about the alignment of the source and destination addresses. It
accepts addresses at any alignment and, when the architecture allows, adjusts the transfer parameters
(including element size, number of elements, and transfer type) to transparently perform the desired
transfer with the given alignment. It is intended to simplify algorithm development in the initial
stages.

ACPY2_start() thus aims for simplicity while maintaining reasonable performance. The
ACPY2_startAligned() API, on the other hand, performs no run-time checks on alignment and carries out the
transfer using the configured transfer settings of the channel. Passing source or destination addresses
that are misaligned with respect to the configured element size of the DMA handle results in unspecified
behavior. In this respect, the sole aim of ACPY2_startAligned() is to guarantee performance by
eliminating run-time checks through a pre-negotiated contract with the algorithm developer.
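To illustrate the kind of parameter adjustment ACPY2_start() performs, the following sketch picks the widest element size permitted by the source address, the destination address, and the byte count, and derives the element count from that size. This is one plausible policy under stated assumptions: byte addressing, element sizes of 4, 2, or 1 bytes, and a single contiguous transfer. The names TransferPlan and plan_transfer() are hypothetical, and this is not TI's implementation. ACPY2_startAligned() skips this step entirely and trusts the channel's configured element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transfer plan (not an ACPY2/IDMA2 type). */
typedef struct {
    size_t elemSize;  /* bytes per element: 4, 2 or 1 */
    size_t numElems;  /* number of elements to move */
} TransferPlan;

/* Choose the widest element size (4, then 2, then 1 byte) that evenly divides
 * the source address, the destination address and the byte count, so the
 * whole copy can be issued as one transfer of uniform elements. */
TransferPlan plan_transfer(uintptr_t src, uintptr_t dst, size_t byteCnt)
{
    static const size_t sizes[] = {4, 2, 1};
    TransferPlan plan = {1, byteCnt};  /* byte transfers always work */

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; ++i) {
        size_t e = sizes[i];
        if (src % e == 0 && dst % e == 0 && byteCnt % e == 0) {
            plan.elemSize = e;
            plan.numElems = byteCnt / e;
            break;
        }
    }
    assert(plan.elemSize * plan.numElems == byteCnt);
    return plan;
}
```

For example, plan_transfer(0x1002, 0x2000, 64) cannot use 32-bit elements because the source is only 16-bit aligned, so it yields 32 elements of 2 bytes each.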

In this section, we outline a general guideline, applicable to all architectures, that can yield significant
performance gains. The basic premise is that configuring a logical channel is an expensive operation in
terms of cycles, even when compared to the standard ACPY2 scheduling and synchronization APIs. Therein
lies the motivation for the following new guideline:

DMA Guideline 2

All algorithms should minimize channel (re)configuration overhead by requesting a dedicated logical
DMA channel for each distinct type of DMA transfer they issue, and should avoid calling
ACPY2_configure() in favor of the new fast configuration APIs where possible.

DMA Guideline 2 is useful when different types of DMA transfers are needed in a critical loop of an
algorithm. By defining a separate IDMA2 logical channel for each transfer type, ACPY2_configure() can be
called once on each channel at the beginning of the algorithm code. Transfer requests can then be
submitted rapidly on these preconfigured channels in the critical loop using the new ACPY2_start() or
ACPY2_startAligned() function.
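The cost argument behind DMA Guideline 2 can be illustrated with a simple counting model. In the sketch below, Channel, ChanConfig, channel_configure(), submit_transfer(), and the two counting functions are hypothetical stand-ins, not the IDMA2/ACPY2 API. channel_configure() plays the role of ACPY2_configure(), and submit_transfer() models an algorithm that reconfigures a channel only when the channel's current settings differ from the requested ones. With two transfer types interleaved in a loop, a shared channel must be reconfigured on every submission, whereas dedicated channels are configured once up front.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel settings; a real IDMA2 configuration has more fields. */
typedef struct {
    int xferType;   /* illustrative codes, e.g. 1 = 1-D, 2 = 2-D */
    int elemSize;   /* bytes per element */
} ChanConfig;

/* Hypothetical logical channel that counts its (expensive) configurations. */
typedef struct {
    ChanConfig cfg;
    int configured;
    unsigned configureCalls;
} Channel;

/* Stand-in for the expensive configuration step (ACPY2_configure()). */
void channel_configure(Channel *ch, ChanConfig cfg)
{
    ch->cfg = cfg;
    ch->configured = 1;
    ch->configureCalls++;
}

/* Reconfigure only if the channel is not already set up for cfg. The actual
 * submission (ACPY2_start() in real code) is omitted from this model. */
void submit_transfer(Channel *ch, ChanConfig cfg)
{
    if (!ch->configured || ch->cfg.xferType != cfg.xferType ||
        ch->cfg.elemSize != cfg.elemSize) {
        channel_configure(ch, cfg);
    }
}

/* One shared channel, two transfer types interleaved in the critical loop. */
unsigned configures_shared(size_t iterations)
{
    const ChanConfig a = {1, 4}, b = {2, 2};
    Channel ch = {{0, 0}, 0, 0};
    for (size_t i = 0; i < iterations; ++i) {
        submit_transfer(&ch, a);  /* type changes every call: reconfigure */
        submit_transfer(&ch, b);
    }
    return ch.configureCalls;
}

/* Guideline 2: one dedicated channel per transfer type, configured up front. */
unsigned configures_dedicated(size_t iterations)
{
    const ChanConfig a = {1, 4}, b = {2, 2};
    Channel chA = {{0, 0}, 0, 0}, chB = {{0, 0}, 0, 0};
    channel_configure(&chA, a);   /* once, at the start of the algorithm */
    channel_configure(&chB, b);
    for (size_t i = 0; i < iterations; ++i) {
        submit_transfer(&chA, a); /* already configured: no reconfiguration */
        submit_transfer(&chB, b);
    }
    assert(chA.configureCalls == 1 && chB.configureCalls == 1);
    return chA.configureCalls + chB.configureCalls;
}
```

For 100 loop iterations, the shared channel incurs 200 configurations, versus 2 for the dedicated channels. The model counts only configuration calls and does not account for the cost of holding additional logical channels.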

In the next two sections, we present additional DMA rules and guidelines specific to C5000 or C6000
architectures.

Use of the DMA Resource

SPRU352G – June 2005 – Revised February 2007
