gko::factorization::ParIlu#
Parallel ILU(0). Computes the incomplete LU factors \(L\) and \(U\) by iterating an asynchronous fixed-point update — the algorithm of Chow & Patel — and converges to a factorisation that matches \(A\) on its sparsity pattern. Designed for fine-grained parallelism on GPUs and many-core CPUs.
-
template<typename ValueType = default_precision, typename IndexType = int32>
class ParIlu # Inherits from
public gko::Composition<default_precision>
ParILU is an incomplete LU factorization which is computed in parallel.
\(L\) is a lower unitriangular, while \(U\) is an upper triangular matrix, which approximate a given matrix \(A\) with \(A \approx LU\). Here, \(L\) and \(U\) have the same sparsity pattern as \(A\), which is also called ILU(0).
The ParILU algorithm generates the incomplete factors iteratively, using a fixed-point iteration of the form
\[\begin{split} F(L, U)_{ij} = \begin{cases} \frac{1}{u_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} l_{ik}\, u_{kj} \right), & i > j, \\ a_{ij} - \sum_{k=1}^{i-1} l_{ik}\, u_{kj}, & i \leq j. \end{cases} \end{split}\]In general, the entries of \(L\) and \(U\) can be iterated in parallel and in asynchronous fashion; the algorithm asymptotically converges to incomplete factors \(L\) and \(U\) fulfilling \( (R = A - L U)\vert_\mathcal{S} = 0\vert_\mathcal{S} \) where \(\mathcal{S}\) is the pre-defined sparsity pattern (in case of ILU(0), the sparsity pattern of the system matrix \(A\)). The number of ParILU sweeps needed for convergence depends on the parallelism level: for sequential execution, a single sweep is sufficient; for fine-grained parallelism, the number of sweeps necessary to get a good approximation of the incomplete factors depends heavily on the problem. On the OpenMP executor, 3 sweeps usually give a decent approximation in our experiments, while GPU executors can take 10 or more iterations.
- References
Chow, E., Patel, A. Fine-Grained Parallel Incomplete LU Factorization. SIAM Journal on Scientific Computing, 37 (2), C169–C193, 2015. https://doi.org/10.1137/140968896
- Template Parameters:
ValueType – Type of the values of all matrices used in this class
IndexType – Type of the indices of all matrices used in this class
Public Static Functions
- static parameters_type parse(
- const config::pnode &config,
- const config::registry &context,
- const config::type_descriptor &td_for_child = config::make_type_descriptor<ValueType, IndexType>(),
Create the parameters from the property_tree. Because this is directly tied to the specific type, the value/index type settings within config are ignored and type_descriptor is only used for children configs.
- Parameters:
config – the property tree for setting
context – the registry
td_for_child – the type descriptor for children configs. The default uses the value/index type of this class.
- Returns:
parameters
-
struct parameters_type#
Public Members
-
size_type iterations#
The number of iterations the
computekernel will use when doing the factorization. The default value0meansAuto, so the implementation decides on the actual value depending on the resources that are available.
-
bool skip_sorting#
The
system_matrix, which will be given to this factory, must be sorted (first by row, then by column) in order for the algorithm to work. If it is known that the matrix will be sorted, this parameter can be set totrueto skip the sorting (therefore, shortening the runtime). However, if it is unknown or if the matrix is known to be not sorted, it must remainfalse, otherwise, the factorization might be incorrect.
-
std::shared_ptr<typename matrix_type::strategy_type> l_strategy#
Strategy which will be used by the L matrix. The default value
nullptrwill result in the strategyclassical.
-
std::shared_ptr<typename matrix_type::strategy_type> u_strategy#
Strategy which will be used by the U matrix. The default value
nullptrwill result in the strategyclassical.
-
size_type iterations#