gko::factorization::ParIlut#

Parallel threshold-based incomplete LU. The LU counterpart of ParIct: each sweep refines both the sparsity patterns of \(L\) and \(U\) and their numerical entries, alternating add-fill-in, fixed-point iteration, and drop-smallest steps. Algorithm of Anzt et al.

template<typename ValueType = default_precision, typename IndexType = int32>
class ParIlut #

Inherits from

ParILUT is an incomplete threshold-based LU factorization which is computed in parallel.

\(L\) is a lower unitriangular, while \(U\) is an upper triangular matrix, which approximate a given matrix \(A\) with \(A \approx LU\). Here, \(L\) and \(U\) have a sparsity pattern that is improved iteratively based on their element-wise magnitude. The initial sparsity pattern is chosen based on the \(ILU(0)\) factorization of \(A\).

One iteration of the ParILUT algorithm consists of the following steps:

  1. Calculate the residual \(R = A - LU\).

  2. Add new non-zero locations from \(R\) to \(L\) and \(U\). The new non-zero locations are initialised from the corresponding residual entries.

  3. Execute a fixed-point iteration on \(L\) and \(U\) according to

    \[\begin{split} F(L, U)_{ij} = \begin{cases} \frac{1}{u_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} l_{ik}\, u_{kj} \right), & i > j, \\ a_{ij} - \sum_{k=1}^{i-1} l_{ik}\, u_{kj}, & i \leq j. \end{cases} \end{split}\]

    For a more detailed description of the fixed-point iteration, see ParIlu.

  4. Remove the smallest entries (by magnitude) from \(L\) and \(U\).

  5. Execute a fixed-point iteration on the (now sparser) \(L\) and \(U\).

This ParILUT algorithm thus improves the sparsity pattern and the approximation of \(L\) and \(U\) simultaneously.

References

  • Anzt, H., Chow, E., Dongarra, J. ParILUT — A Parallel Threshold ILU for GPUs. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 231–241. https://doi.org/10.1109/IPDPS.2019.00033

Template Parameters:
  • ValueType – Type of the values of all matrices used in this class

  • IndexType – Type of the indices of all matrices used in this class

Public Static Functions

static parameters_type parse(
const config::pnode &config,
const config::registry &context,
const config::type_descriptor &td_for_child = config::make_type_descriptor<ValueType, IndexType>(),
)#

Create the parameters from the property_tree. Because this is directly tied to the specific type, the value/index type settings within config are ignored and type_descriptor is only used for children configs.

Parameters:
  • config – the property tree for setting

  • context – the registry

  • td_for_child – the type descriptor for children configs. The default uses the value/index type of this class.

Returns:

parameters

struct parameters_type#

Public Members

size_type iterations#

The number of total iterations of ParILUT that will be executed. The default value is 5.

bool skip_sorting#

true means it is known that the matrix given to this factory will be sorted first by row, then by column index, false means it is unknown or not sorted, so an additional sorting step will be performed during the factorization (it will not change the matrix given). The matrix must be sorted for this factorization to work.

The system_matrix, which will be given to this factory, must be sorted (first by row, then by column) in order for the algorithm to work. If it is known that the matrix will be sorted, this parameter can be set to true to skip the sorting (therefore, shortening the runtime). However, if it is unknown or if the matrix is known to be not sorted, it must remain false, otherwise, the factorization might be incorrect.

bool approximate_select#

true means the candidate selection will use an inexact selection algorithm. false means an exact selection algorithm will be used.

Using the approximate selection algorithm can give a significant speed-up, but may in the worst case cause the algorithm to vastly exceed its fill_in_limit. The exact selection needs more time, but more closely fulfills the fill_in_limit except for pathological cases (many candidates with equal magnitude).

The default behavior is to use approximate selection.

bool deterministic_sample#

true means the sample used for the selection algorithm will be chosen deterministically. This is only relevant when using approximate_select. It is mostly used for testing.

The selection algorithm used for approximate_select uses a small sample of the input data to determine an approximate threshold. The choice of elements can either be randomized, i.e., we may use different elements during each execution, or deterministic, i.e., the element choices are always the same.

Note that even though the threshold selection step may be made deterministic this way, the calculation of the ILU factors can still be non-deterministic due to its asynchronous iterations.

The default behavior is to use a random sample.

double fill_in_limit#

the amount of fill-in that is allowed in L and U compared to the ILU(0) factorization.

The threshold for removing candidates from the intermediate L and U is set such that the resulting sparsity pattern has at most fill_in_limit times the number of non-zeros of the ILU(0) factorization. This selection is executed separately for both factors L and U.

The default value 2.0 allows twice the number of non-zeros in L and U compared to ILU(0).

std::shared_ptr<typename matrix_type::strategy_type> l_strategy#

Strategy which will be used by the L matrix. The default value nullptr will result in the strategy classical.

std::shared_ptr<typename matrix_type::strategy_type> u_strategy#

Strategy which will be used by the U matrix. The default value nullptr will result in the strategy classical.