1–12    Altera Corporation    Nios II C2H Compiler User Guide    November 2009

C2H Compiler Concepts

Subfunctions called within an accelerated function are also
converted to hardware using the same C-to-hardware mapping
rules. The C2H Compiler creates only one hardware instance of the
subfunction, regardless of how many times the subfunction is called
within the top-level function. Isolating accelerated C code into a
subfunction provides a method of creating a shared hardware
resource within an accelerator.
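As a sketch of this resource-sharing behavior (the function names below are illustrative, not taken from the manual), a subfunction called from two sites in an accelerated top-level function still maps to a single hardware instance:

```c
#include <assert.h>

/* Hypothetical subfunction: the C2H Compiler would build only ONE
 * hardware instance of saturate(), even though scale_and_clip()
 * below calls it twice. */
static int saturate(int x, int max)
{
    return (x > max) ? max : x;
}

/* Hypothetical top-level function marked for acceleration.
 * Both call sites share the same saturate() hardware resource. */
int scale_and_clip(int a, int b, int max)
{
    int ra = saturate(a * 2, max);
    int rb = saturate(b * 3, max);
    return ra + rb;
}
```

Because both calls contend for the one shared instance, they execute through it sequentially; this is the trade-off between logic utilization and parallelism that the sharing optimization implies.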

The C2H Compiler also performs certain optimizations that reduce logic utilization through resource sharing.

Refer to Chapter 3, C-to-Hardware Mapping Reference, for complete details of the C2H Compiler mappings.

Performance Depends on Memory Access Time

Applications that run on a processor are typically compute-bound, which
means the performance bottleneck depends on the rate at which the
processor executes instructions. Memory access time affects the execution
time, but instruction and data caches minimize the time the processor
waits for memory accesses.

With C2H hardware accelerators, the performance bottleneck undergoes
a profound change: Applications typically become memory bound,
which means the performance bottleneck depends on the memory
latency and bandwidth. When multiple operations do not have data
dependencies that require them to execute sequentially, the
C2H Compiler schedules them in parallel. The resulting accelerator logic
often must access memory to feed data to each parallel operation. If the
hardware does not have fast access to memory, the hardware stalls
waiting for data, reducing the performance and efficiency.
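A minimal sketch of why accelerators become memory-bound (the function and array names are hypothetical): the two multiplies below have no data dependency on each other, so the C2H Compiler can schedule them in parallel, but each parallel operation must still be fed from memory.

```c
/* Illustrative only: p0 and p1 have no data dependency, so their
 * multiplies can be scheduled in the same cycle. The accumulation
 * depends on both, so it must follow them. */
int sum_of_products(const int *a, const int *b, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++) {
        int p0 = a[i] * 2;   /* independent of p1          */
        int p1 = b[i] * 3;   /* independent of p0          */
        acc += p0 + p1;      /* depends on both products   */
    }
    return acc;
}
```

If a[] and b[] reside in the same single-port memory, the two loads serialize and the parallel multipliers stall waiting for operands; this is the memory latency and bandwidth bottleneck described above.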

Achieving maximum performance from a hardware accelerator often
involves examining your system's memory topology and data flow, and
making modifications to reduce or eliminate memory bottlenecks. For
example, if your C code randomly accesses a large buffer of data stored in
slow SDRAM, performance suffers due to constant bank switching in
SDRAM. You can alleviate this bottleneck by first copying blocks of data
to an on-chip RAM, and allowing the accelerator to access this fast,
low-latency RAM. Note that you can also accelerate the copy operation,
which creates a direct memory access (DMA) hardware accelerator.
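The block-copy step above can be sketched as follows. This is a hedged illustration: the buffer names and sizes are hypothetical, and in a real system the linker or HAL would place `big_sdram_buf` in SDRAM and `fast_buf` in on-chip RAM.

```c
#define BLOCK_WORDS 256

/* Hypothetical buffers: big_sdram_buf stands in for a large buffer
 * in slow SDRAM; fast_buf stands in for an on-chip RAM buffer. */
static int big_sdram_buf[4 * BLOCK_WORDS];
static int fast_buf[BLOCK_WORDS];

/* Sequentially copy one block from SDRAM into on-chip RAM.
 * Accelerating this loop with the C2H Compiler yields a simple
 * DMA-style copy engine; subsequent random accesses then hit only
 * the fast, low-latency on-chip buffer. */
void fetch_block(int block)
{
    for (int i = 0; i < BLOCK_WORDS; i++)
        fast_buf[i] = big_sdram_buf[block * BLOCK_WORDS + i];
}
```

The sequential copy avoids the constant SDRAM bank switching that random accesses cause, which is why staging data this way recovers performance.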
