gko::factorization::ParIlut#
Parallel threshold-based incomplete LU. The LU counterpart of
ParIct: each sweep refines both the sparsity patterns
of \(L\) and \(U\) and their numerical entries, alternating add-fill-in,
fixed-point iteration, and drop-smallest steps. Algorithm of Anzt
et al.
-
template<typename ValueType = default_precision, typename IndexType = int32>
class ParIlut # Inherits from
public gko::Composition<default_precision>
ParILUT is an incomplete threshold-based LU factorization which is computed in parallel.
\(L\) is a lower unitriangular, while \(U\) is an upper triangular matrix, which approximate a given matrix \(A\) with \(A \approx LU\). Here, \(L\) and \(U\) have a sparsity pattern that is improved iteratively based on their element-wise magnitude. The initial sparsity pattern is chosen based on the \(ILU(0)\) factorization of \(A\).
One iteration of the ParILUT algorithm consists of the following steps:
Calculate the residual \(R = A - LU\).
Add new non-zero locations from \(R\) to \(L\) and \(U\). The new non-zero locations are initialised from the corresponding residual entries.
Execute a fixed-point iteration on \(L\) and \(U\) according to
\[\begin{split} F(L, U)_{ij} = \begin{cases} \frac{1}{u_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} l_{ik}\, u_{kj} \right), & i > j, \\ a_{ij} - \sum_{k=1}^{i-1} l_{ik}\, u_{kj}, & i \leq j. \end{cases} \end{split}\]For a more detailed description of the fixed-point iteration, see ParIlu.
Remove the smallest entries (by magnitude) from \(L\) and \(U\).
Execute a fixed-point iteration on the (now sparser) \(L\) and \(U\).
This ParILUT algorithm thus improves the sparsity pattern and the approximation of \(L\) and \(U\) simultaneously.
- References
Anzt, H., Chow, E., Dongarra, J. ParILUT — A Parallel Threshold ILU for GPUs. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 231–241. https://doi.org/10.1109/IPDPS.2019.00033
- Template Parameters:
ValueType – Type of the values of all matrices used in this class
IndexType – Type of the indices of all matrices used in this class
Public Static Functions
- static parameters_type parse(
- const config::pnode &config,
- const config::registry &context,
- const config::type_descriptor &td_for_child = config::make_type_descriptor<ValueType, IndexType>(),
Create the parameters from the property_tree. Because this is directly tied to the specific type, the value/index type settings within config are ignored and type_descriptor is only used for children configs.
- Parameters:
config – the property tree for setting
context – the registry
td_for_child – the type descriptor for children configs. The default uses the value/index type of this class.
- Returns:
parameters
-
struct parameters_type#
Public Members
-
size_type iterations#
The number of total iterations of ParILUT that will be executed. The default value is 5.
-
bool skip_sorting#
truemeans it is known that the matrix given to this factory will be sorted first by row, then by column index,falsemeans it is unknown or not sorted, so an additional sorting step will be performed during the factorization (it will not change the matrix given). The matrix must be sorted for this factorization to work.The
system_matrix, which will be given to this factory, must be sorted (first by row, then by column) in order for the algorithm to work. If it is known that the matrix will be sorted, this parameter can be set totrueto skip the sorting (therefore, shortening the runtime). However, if it is unknown or if the matrix is known to be not sorted, it must remainfalse, otherwise, the factorization might be incorrect.
-
bool approximate_select#
truemeans the candidate selection will use an inexact selection algorithm.falsemeans an exact selection algorithm will be used.Using the approximate selection algorithm can give a significant speed-up, but may in the worst case cause the algorithm to vastly exceed its
fill_in_limit. The exact selection needs more time, but more closely fulfills thefill_in_limitexcept for pathological cases (many candidates with equal magnitude).The default behavior is to use approximate selection.
-
bool deterministic_sample#
truemeans the sample used for the selection algorithm will be chosen deterministically. This is only relevant when usingapproximate_select. It is mostly used for testing.The selection algorithm used for
approximate_selectuses a small sample of the input data to determine an approximate threshold. The choice of elements can either be randomized, i.e., we may use different elements during each execution, or deterministic, i.e., the element choices are always the same.Note that even though the threshold selection step may be made deterministic this way, the calculation of the ILU factors can still be non-deterministic due to its asynchronous iterations.
The default behavior is to use a random sample.
-
double fill_in_limit#
the amount of fill-in that is allowed in L and U compared to the ILU(0) factorization.
The threshold for removing candidates from the intermediate L and U is set such that the resulting sparsity pattern has at most
fill_in_limittimes the number of non-zeros of the ILU(0) factorization. This selection is executed separately for both factors L and U.The default value
2.0allows twice the number of non-zeros in L and U compared to ILU(0).
-
std::shared_ptr<typename matrix_type::strategy_type> l_strategy#
Strategy which will be used by the L matrix. The default value
nullptrwill result in the strategyclassical.
-
std::shared_ptr<typename matrix_type::strategy_type> u_strategy#
Strategy which will be used by the U matrix. The default value
nullptrwill result in the strategyclassical.
-
size_type iterations#