Table 4. Instruction Stall Time, AN253 - Cirrus Logic AN253 User Manual
Avoid single- or double-precision floating-point division. These operations are not implemented in the
MaverickCrunch and are carried out by a soft-float library routine instead. When possible, calculate the
reciprocal at compile time and multiply by it. Be wary of compounding rounding errors, both from calculating
the reciprocal and from repeated multiplication by it.
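The reciprocal technique can be sketched in C as follows. This is a minimal illustration, not code from the application note: the divisor value and the names GAIN_DIVISOR, GAIN_RECIPROCAL, scale_by_division, and scale_by_reciprocal are hypothetical.

```c
/* Hypothetical divisor, chosen only for illustration. */
#define GAIN_DIVISOR 8.0f

/* The initializer is a constant expression, so the reciprocal is
 * evaluated by the compiler and no division runs on the target. */
static const float GAIN_RECIPROCAL = 1.0f / GAIN_DIVISOR;

/* Division form: per the text above, on MaverickCrunch this would be
 * carried out by a soft-float library routine. */
float scale_by_division(float x)
{
    return x / GAIN_DIVISOR;
}

/* Reciprocal form: a single floating-point multiply. */
float scale_by_reciprocal(float x)
{
    return x * GAIN_RECIPROCAL;
}
```

With a power-of-two divisor such as 8.0f the reciprocal is exactly representable, so both functions return the same value. For a divisor such as 3.0f the reciprocal is rounded, and the two forms can differ in the last bit, which is the rounding error the paragraph above warns about.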
Avoid creating data dependencies in algorithms when performing MaverickCrunch operations. A
data dependency occurs when an instruction takes the output of a previous instruction as its operand. This
dependency will stall the pipeline if the output of the previous instruction is not available for the current
instruction. Table 4 lists the MaverickCrunch stall time, in cycles, for each type of coprocessor
instruction.
Table 4. Instruction Stall Time
Consider the following MaverickCrunch code:
fmuls c3, c1, c2 // CDP instruction
fadds c0, c0, c3 // CDP instruction - stalls on c3
In this example, the addition operation stalls for 5 cycles on the product (c3) of the multiplication.
However, if the first instruction had been a double-precision multiplication, the addition operation would
have stalled on the product (c3) for 11 cycles.
Considering the pipeline stall cycles, the above source code looks like:
fmuls c3, c1, c2 // CDP instruction
<stall cycle>
<stall cycle>
<stall cycle>
<stall cycle>
<stall cycle>
fadds c0, c0, c3 // CDP instruction - stalls on c3
Reduce data-dependent pipeline stalls by interleaving the dependent instructions with independent
instructions. This increases pipeline throughput. Again, it is especially important after a double-precision
multiply, and still significant after adds, compares, or other data-path operations. The interleaved
instructions may be either ARM or MaverickCrunch operations and should be placed immediately after the
stalling instruction. However, choose them carefully so that no new data dependencies are introduced into
the source code.
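The effect of interleaving can be estimated with a small cycle-count model in C. This is a simplified sketch under stated assumptions, not a description of the actual MaverickCrunch pipeline: it assumes one instruction issues per cycle and that a dependent instruction waits the Table 4 stall time after its producer. The types insn_type and insn, and the functions stall_cycles and total_cycles, are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Instruction classes from Table 4. ARM_OTHER stands for any
 * independent ARM instruction used as filler (an assumption). */
enum insn_type { CDP, CDP_MUL64, LDC_MCR, STR_MRC, ARM_OTHER };

/* Stall time, in cycles, seen by a dependent instruction that
 * immediately follows the producer (values from Table 4). */
static int stall_cycles(enum insn_type t)
{
    switch (t) {
    case CDP:       return 5;
    case CDP_MUL64: return 11;
    case LDC_MCR:   return 2;
    default:        return 0; /* STR/MRC per Table 4; ARM_OTHER assumed 0 */
    }
}

struct insn {
    enum insn_type type;
    int dst;    /* coprocessor register written (0-15), or -1 */
    int src[2]; /* coprocessor registers read (0-15), -1 if unused */
};

#define NUM_REGS 16

/* Assumed model: each instruction issues in one cycle, and a result
 * produced at cycle t is readable at cycle t + 1 + stall. */
int total_cycles(const struct insn *code, size_t n)
{
    int ready[NUM_REGS] = {0}; /* earliest cycle each register is readable */
    int cycle = 0;

    for (size_t i = 0; i < n; i++) {
        for (int s = 0; s < 2; s++) {
            int r = code[i].src[s];
            if (r >= 0 && ready[r] > cycle)
                cycle = ready[r]; /* stall until the operand is ready */
        }
        if (code[i].dst >= 0)
            ready[code[i].dst] = cycle + 1 + stall_cycles(code[i].type);
        cycle++; /* the instruction itself issues */
    }
    return cycle;
}
```

Under this model, the fmuls/fadds pair from the example takes 7 cycles (2 issue cycles plus 5 stall cycles). Placing five independent instructions between the two also takes 7 cycles, so the interleaved work fills the stall slots instead of adding to the total.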
Finally, the optimized example source code looks like:
INSTRUCTION TYPE              CYCLE COUNT
CDP                           5
CDP (Multiply Double & 64)    11
LDC/MCR                       2
STR/MRC                       0