gko::factorization::ParIc#

Parallel IC(0). Computes the incomplete Cholesky factor by iterating an asynchronous fixed-point update on \(L\) — the algorithm of Chow & Patel — and converges to a factorisation that matches \(A\) on its sparsity pattern. Designed for fine-grained parallelism on GPUs and many-core CPUs.

template<typename ValueType = default_precision, typename IndexType = int32>
class ParIc #

Inherits from

ParIC is an incomplete Cholesky factorization which is computed in parallel.

\(L\) is a lower triangular matrix, which approximates a given matrix \(A\) with \(A \approx LL^H\). Here, \(L + L^H\) has the same sparsity pattern as \(A\), which is also called IC(0).

The ParIC algorithm generates the incomplete factor iteratively, using a fixed-point iteration of the form

\[\begin{split} F(L)_{ij} = \begin{cases} \sqrt{a_{ii} - \sum_{k=1}^{i-1} |l_{ik}|^2}, & i = j, \\ a_{ij} - \sum_{k=1}^{i-1} l_{ik}\, \overline{l_{jk}}, & i < j. \end{cases} \end{split}\]

In general, the entries of \(L\) can be iterated in parallel and in asynchronous fashion; the algorithm asymptotically converges to incomplete factors \(L\) and \(L^H\) fulfilling \( (R = A - L L^H)\vert_\mathcal{S} = 0\vert_\mathcal{S} \) where \(\mathcal{S}\) is the pre-defined sparsity pattern (in case of IC(0), the sparsity pattern of the system matrix \(A\)). The number of ParIC sweeps needed for convergence depends on the parallelism level: for sequential execution, a single sweep is sufficient; for fine-grained parallelism, the number of sweeps necessary to get a good approximation of the incomplete factors depends heavily on the problem. On the OpenMP executor, 3 sweeps usually give a decent approximation in our experiments, while GPU executors can take 10 or more iterations.

References

  • Chow, E., Patel, A. Fine-Grained Parallel Incomplete LU Factorization. SIAM Journal on Scientific Computing, 37 (2), C169–C193, 2015. https://doi.org/10.1137/140968896

Template Parameters:
  • ValueType – Type of the values of all matrices used in this class

  • IndexType – Type of the indices of all matrices used in this class

Public Static Functions

static parameters_type parse(
const config::pnode &config,
const config::registry &context,
const config::type_descriptor &td_for_child = config::make_type_descriptor<ValueType, IndexType>(),
)#

Create the parameters from the property_tree. Because this is directly tied to the specific type, the value/index type settings within config are ignored and type_descriptor is only used for children configs.

Parameters:
  • config – the property tree for setting

  • context – the registry

  • td_for_child – the type descriptor for children configs. The default uses the value/index type of this class.

Returns:

parameters

struct parameters_type#

Public Members

size_type iterations#

The number of iterations the compute kernel will use when doing the factorization. The default value 0 means Auto, so the implementation decides on the actual value depending on the resources that are available.

bool skip_sorting#

The system_matrix, which will be given to this factory, must be sorted (first by row, then by column) in order for the algorithm to work. If it is known that the matrix will be sorted, this parameter can be set to true to skip the sorting (therefore, shortening the runtime). However, if it is unknown or if the matrix is known to be not sorted, it must remain false, otherwise, the factorization might be incorrect.

std::shared_ptr<typename matrix_type::strategy_type> l_strategy#

Strategy which will be used by the L matrix. The default value nullptr will result in the strategy classical.

bool both_factors#

true will generate both L and L^H, false will only generate the L factor, resulting in a Composition of only a single LinOp. This can be used to avoid the transposition operation.