Altera Floating-Point User Manual


Figure 2-2: Cholesky Decomposition Function Top-level Diagram

Although the Cholesky decomposition algorithm operates only on the lower triangular matrix, the core requires the entire matrix to be loaded. The processing (or vector) memory is initialized during this load.

The FPC datapath is split into two sections. The first section, also known as the vector section, takes the inner product of two vectors and subtracts it from the input matrix element, a_ij. The second section, also known as the root section, calculates square roots and performs division by the square root. The first element is loaded into both inputs of the root section, and the outcome is its own square root. The first element stays latched in the left input field of the root section while all the other elements of the first column are loaded into the right input field. The resulting output is the value of the respective column element divided by the value of the first element of the Cholesky decomposition matrix.

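The first-column behavior described above can be sketched in a few lines of Python. This is only a software model under stated assumptions, not the core's implementation: the names `root_section` and `first_column` are hypothetical, and the root section is modeled as a single function that divides its right input by the square root of its left input. When both inputs carry the same value x, the result x / sqrt(x) equals sqrt(x), which matches the behavior described for the first element.

```python
import math

def root_section(left, right):
    """Hypothetical model of the root section: right / sqrt(left).

    When left == right, the result is the square root of that value.
    """
    return right / math.sqrt(left)

def first_column(a):
    """Return the first column of the Cholesky factor of matrix a.

    The first element a[0][0] stays latched in the left input while each
    element of column 0 is fed to the right input in turn.
    """
    latched = a[0][0]
    return [root_section(latched, a[i][0]) for i in range(len(a))]

# Example symmetric positive definite matrix (chosen for illustration only).
A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]

col0 = first_column(A)
```

For the example matrix, the first entry of `col0` is the square root of 4, and the remaining entries are the column elements divided by that root.
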
During processing, two rows from the processing matrix are loaded. For the first element in each new column, both rows have the same index and hence contain the same values. The first row is latched into the input register of the vector section. For the rest of the column, the row index is increased, and a new a_ij element and triangular matrix vector, L_j, are loaded. The first result out of the vector section is latched into the left register of the root section. All results from the column, including the first result, are loaded into the right register of the root section. The root section generates the square root of the first vector result; each of the other results coming from the vector section is divided by the square root of the first result.

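As a rough illustration of the column-by-column flow described above, the following Python sketch models the vector section and the root section for a real matrix. It is a behavioral sketch only: the function name `cholesky_columns` and the variable names are assumptions made for this example, and the hardware's memory organization, pipelining, and burst behavior are not modeled.

```python
import math

def cholesky_columns(a):
    """Column-wise Cholesky decomposition of a real symmetric positive
    definite matrix a, returning the lower triangular factor L."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Row j of the processing matrix is latched in the vector section.
        latched_row = L[j][:j]
        left = None  # left register of the root section
        for i in range(j, n):
            # Vector section: inner product of the two rows, subtracted
            # from the input matrix element a_ij.
            row = L[i][:j]
            inner = sum(x * y for x, y in zip(row, latched_row))
            v = a[i][j] - inner
            if i == j:
                # The first result of the column is latched on the left.
                left = v
            # Root section: every result, including the first, goes to the
            # right register and is divided by sqrt(left).
            L[i][j] = v / math.sqrt(left)
    return L

# Example matrix chosen for illustration only.
A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]

L = cholesky_columns(A)
```

For the diagonal element of each column both rows are the same, so the vector section output divided by its own square root yields the square root, as in the first-column case.
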
All calculated values are written to another memory block for further processing. The first column values are output singly during preprocessing, while the values of the other columns are burst out during processing.

There are only minor differences between the architectures for real and complex matrices. For the complex matrix, both the input and processing memory blocks contain complex values. Similarly, all values going into the vector section are complex numbers. The complex conjugate of the latched register is obtained by simply inverting the sign bit of the imaginary part. As for the root section, the structure is simplified by the nature of the positive definite matrix. The diagonal value, which is the first value at the top of each column in the decomposition, is always a real number, so the result from the inverse square root calculation is always a real number. The multiplier in the root section therefore multiplies a complex value by a real scalar, so only two real multipliers are required.

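The two simplifications for complex matrices can be illustrated with a short Python sketch. The helper names `conjugate` and `scale_by_real` are hypothetical and exist only for this example; in the sketch the sign inversion is modeled by negating the imaginary part, and the real-scalar multiply is written out as its two real multiplications.

```python
def conjugate(z):
    """Complex conjugate: only the sign of the imaginary part changes."""
    return complex(z.real, -z.imag)

def scale_by_real(z, s):
    """Multiply a complex value by a real scalar using two real multiplies.

    A general complex-by-complex product needs four real multiplies; a
    real scalar needs only one per component.
    """
    return complex(z.real * s, z.imag * s)

z = complex(3.0, 4.0)
z_conj = conjugate(z)

# A value times its own conjugate has a zero imaginary part, which is why
# the diagonal results of the decomposition are real.
diag = z * z_conj

# Scaling by a real inverse square root (here 1 / sqrt(4) = 0.5).
scaled = scale_by_real(z, 0.5)
```
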

2-4    Cholesky Decomposition Function    UG-01058    2014.12.19
Altera Corporation    ALTERA_FP_MATRIX_INV IP Core
