Cirrus Logic AN253

Figure 1. MaverickCrunch Pipelines

3. Code Optimization for MaverickCrunch

This section describes guidelines for writing optimized code for the MaverickCrunch coprocessor. These guidelines are divided into algorithm, compiler, and architecture sections. It is assumed that the correct algorithm has been chosen and that all non-hardware-specific optimizations have been completed. Note, however, that optimization should not begin until all of the code has been written and tested for functionality.

3.1 Algorithms

This section focuses on methods to reduce algorithm execution time. After the code's functionality is verified, profile the code and disassemble the object files. Look for and optimize the following:

- Sections of code that are executed most frequently
- Sections of code that take the most CPU cycles to execute
- Inefficiencies in the compiler-generated assembly code

When optimizing these sections, keep in mind the following general concepts of code optimization:

- Avoid Redundancy - store the results of computations rather than recomputing them.
- Serialize Code - design code with a minimum amount of branching. Branching is expensive, and the ARM920T does not support branch prediction.
- Code Locality - place code that executes closely together in time closely together in memory. This increases spatial locality of reference and reduces expensive cache misses.
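The first two concepts can be sketched in C. The function names, the gain/divisor scaling, and the clamp to zero are hypothetical, chosen only for illustration. The naive version recomputes a loop-invariant ratio on every iteration and clamps with a data-dependent `if`; the second computes the ratio once and writes the clamp as a conditional expression, which a compiler targeting ARM may be able to turn into conditionally executed instructions instead of a taken branch. An optimizing compiler may already hoist the division on its own, so compare the disassembly before and after.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: scale each sample by gain/divisor and clamp
 * negative results to zero. */

/* Naive version: the loop-invariant ratio is recomputed on every
 * iteration, and the clamp uses a data-dependent branch. */
static void scale_naive(float *out, const float *in, size_t n,
                        float gain, float divisor)
{
    for (size_t i = 0; i < n; i++) {
        float v = in[i] * (gain / divisor); /* redundant division */
        if (v < 0.0f)                       /* branch per sample */
            v = 0.0f;
        out[i] = v;
    }
}

/* Optimized version: the ratio is computed once (avoid redundancy), and the
 * clamp is a conditional expression that the compiler may map to
 * conditional execution rather than a branch (serialize code). */
static void scale_opt(float *out, const float *in, size_t n,
                      float gain, float divisor)
{
    assert(divisor != 0.0f);
    const float ratio = gain / divisor;     /* computed once, outside the loop */
    for (size_t i = 0; i < n; i++) {
        float v = in[i] * ratio;
        out[i] = (v < 0.0f) ? 0.0f : v;     /* select instead of if/else */
    }
}
```

Both functions produce identical results; only the amount of work per iteration and the branching structure differ.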

Unless the goal is small code size, code density is not a reliable indicator of how well code is optimized. Loop Unrolling is an optimization technique that generally increases code size but also increases code speed, because an unrolled loop iterates fewer times than the original version, resulting in fewer index calculations, comparisons, and taken branches.
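A minimal sketch of loop unrolling in C, using a hypothetical integer array sum. The unroll factor of 4 is an arbitrary illustrative choice; the second loop in `sum_unrolled4` handles the leftover `n % 4` elements so the result matches the simple version for any `n`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simple loop: one index update, one compare, and one taken branch
 * per element. */
static int32_t sum_simple(const int32_t *a, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled by 4: the index update, compare, and branch are paid once
 * per four elements. The condition i + 4 <= n avoids unsigned
 * underflow when n < 4. */
static int32_t sum_unrolled4(const int32_t *a, size_t n)
{
    assert(n == 0 || a != NULL);
    int32_t sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; i++)  /* remaining n % 4 elements */
        sum += a[i];
    return sum;
}
```

Larger unroll factors remove more loop overhead but grow the code further, which can work against the code-locality guideline above.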

Note: Taken branches are expensive operations, as they take three cycles to complete and cause the pipeline to be flushed. (There is no branch prediction in the ARM920T.)
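As a rough worked example, assume a counted loop of N iterations whose loop-closing branch sits at the bottom of the loop, so it is taken N - 1 times (the final iteration falls through). Counting only the three-cycle cost of each taken branch, and assuming N is a multiple of 4 for the version unrolled by 4:

$$
\begin{aligned}
C_{\text{rolled}} &= 3\,(N-1) \\
C_{\text{unroll4}} &= 3\left(\tfrac{N}{4}-1\right) \\
N = 64:\quad C_{\text{rolled}} &= 3 \cdot 63 = 189 \text{ cycles}, \qquad C_{\text{unroll4}} = 3 \cdot 15 = 45 \text{ cycles}
\end{aligned}
$$

This estimate ignores the compare and index-update instructions, which unrolling also reduces, as well as any extra cache misses caused by the larger code.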

Induction Variable Analysis is another speed optimization technique, used when a variable in a loop is a function of the loop index. Rather than recomputing this variable from the index on every iteration, it can be updated incrementally each time the index is updated, reducing the number of calculations in the loop.
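The sketch below applies this idea, often called strength reduction, to a hypothetical strided copy. In `gather_mul` the offset `i * stride` is recomputed with a multiply on every iteration; in `gather_ind`, `offset` is an induction variable that advances by `stride` each time `i` is incremented, so the multiply becomes an add. Many compilers perform this transformation automatically at higher optimization levels, so inspect the disassembly to see whether doing it by hand is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: copy every stride-th element of src into dst. */

/* Before: the offset is recomputed from the loop index with a
 * multiply on every iteration. */
static void gather_mul(int *dst, const int *src, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i * stride];
}

/* After: offset is an induction variable that is updated alongside
 * the loop index, replacing the multiply with an add. */
static void gather_ind(int *dst, const int *src, size_t n, size_t stride)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t offset = 0;              /* always equals i * stride */
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[offset];
        offset += stride;           /* advance with i */
    }
}
```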

(Figure 1 stage labels: CDP: F, D, E, E1, E2, E3, W. LDC/STC, MCR/MRC: F, D, E, M, W. Clock: ARM MCLK.)
