gko::preconditioner::Jacobi#
Block-Jacobi preconditioner. Builds a block-diagonal approximation of
the system matrix and applies its explicit inverse on apply. When
max_block_size is 1 it degenerates to pointwise diagonal scaling; for
larger blocks the inverse is computed by Gauss-Jordan elimination with
column pivoting and stored in a custom interleaved layout. An adaptive
variant stores some blocks in lower precision to reduce bandwidth
during apply.
-
template<typename ValueType = default_precision, typename IndexType = int32>
class Jacobi # Inherits from
public gko::EnableLinOp<Jacobi<default_precision, int32>>
public ConvertibleTo<matrix::Dense<default_precision>>
public gko::WritableToMatrixData<default_precision, int32>
public gko::Transposable
A block-Jacobi preconditioner is a block-diagonal linear operator, obtained by inverting the diagonal blocks of the source operator.
The Jacobi class implements the inversion of the diagonal blocks using Gauss-Jordan elimination with column pivoting, and stores the inverse explicitly in a customized format.
If the diagonal blocks of the matrix are not explicitly set by the user, the implementation will try to automatically detect the blocks by first finding the natural blocks of the matrix, and then applying the supervariable agglomeration procedure on them. However, if problem-specific knowledge regarding the block diagonal structure is available, it is usually beneficial to explicitly pass the starting rows of the diagonal blocks, as the block detection is merely a heuristic and cannot perfectly detect the diagonal block structure. The current implementation supports blocks of up to 32 rows / columns.
The implementation also includes an improved, adaptive version of the block-Jacobi preconditioner, which can store some of the blocks in lower precision and thus improve the performance of preconditioner application by reducing the amount of memory transfers. This variant can be enabled by setting the Jacobi::Factory’s
storage_optimizationparameter. Refer to the documentation of the parameter for more details.Note
The current implementation supports blocks of up to 32 rows / columns.
Note
When using the adaptive variant, there may be a trade-off in terms of slightly longer preconditioner generation due to extra work required to detect the optimal precision of the blocks.
Note
When the max_block_size is set to 1, specialized kernels are used, both for generation (inverting the diagonals) and application (diagonal scaling) to reduce the overhead involved in the usual (adaptive) block case.
- Template Parameters:
ValueType – precision of matrix elements
IndexType – integral type used to store pointers to the start of each block
Public Functions
-
inline size_type get_num_blocks() const noexcept#
Returns the number of blocks of the operator.
- Returns:
the number of blocks of the operator
- inline const block_interleaved_storage_scheme<index_type> &get_storage_scheme(
Returns the storage scheme used for storing Jacobi blocks.
- Returns:
the storage scheme used for storing Jacobi blocks
-
inline const value_type *get_blocks() const noexcept#
Returns the pointer to the memory used for storing the block data.
Element (
i,j) of blockbis stored in position(get_block_pointers()[b] + i) * stride + jof the array.- Returns:
the pointer to the memory used for storing the block data
- inline const remove_complex<value_type> *get_conditioning(
Returns an array of 1-norm condition numbers of the blocks.
Note
This value is valid only if adaptive precision variant is used, and implementations of the standard non-adaptive variant are allowed to omit the calculation of condition numbers.
- Returns:
an array of 1-norm condition numbers of the blocks
-
inline size_type get_num_stored_elements() const noexcept#
Returns the number of elements explicitly stored in the matrix.
- Returns:
the number of elements explicitly stored in the matrix
-
virtual std::unique_ptr<LinOp> transpose() const override#
Returns a LinOp representing the transpose of the Transposable object.
- Returns:
a pointer to the new transposed object
-
virtual std::unique_ptr<LinOp> conj_transpose() const override#
Returns a LinOp representing the conjugate transpose of the Transposable object.
- Returns:
a pointer to the new conjugate transposed object
-
Jacobi &operator=(const Jacobi &other)#
Copy-assigns a Jacobi preconditioner. Preserves executor, copies all data and parameters.
-
Jacobi &operator=(Jacobi &&other)#
Move-assigns a Jacobi preconditioner. Preserves executor, moves all data and parameters. The moved-from object will be empty (0x0 and default parameters).
Public Static Functions
- static parameters_type parse(
- const config::pnode &config,
- const config::registry &context,
- const config::type_descriptor &td_for_child = config::make_type_descriptor<ValueType, IndexType>(),
Create the parameters from the property_tree. Because this is directly tied to the specific type, the value/index type settings within config are ignored and type_descriptor is only used for children configs.
Note
Jacobi does not support block_pointers and storage_optimization array.
- Parameters:
config – the property tree for setting
context – the registry
td_for_child – the type descriptor for children configs. The default uses the value/index type of this class.
- Returns:
parameters
-
struct parameters_type#
Public Members
-
uint32 max_block_size#
Maximal size of diagonal blocks.
Note
This value has to be between 1 and 32 (NVIDIA)/64 (AMD). For efficiency, when the max_block_size is set to 1, specialized kernels are used and the additional objects (block_ptrs etc) are set to null values.
-
uint32 max_block_stride#
Stride between two columns of a block (as number of elements).
Should be a multiple of cache line size for best performance.
Note
If this value is 0, it uses 64 in hip AMD but 32 in NVIDIA or reference executor. The allowed value: 0, 64 for AMD and 0, 32 for NVIDIA
-
bool skip_sorting#
truemeans it is known that the matrix given to this factory will be sorted first by row, then by column index,falsemeans it is unknown or not sorted, so an additional sorting step will be performed during the preconditioner generation (it will not change the matrix given). The matrix must be sorted for this preconditioner to work.The
system_matrix, which will be given to this factory, must be sorted (first by row, then by column) in order for the algorithm to work. If it is known that the matrix will be sorted, this parameter can be set totrueto skip the sorting (therefore, shortening the runtime). However, if it is unknown or if the matrix is known to be not sorted, it must remainfalse, otherwise, this preconditioner might be incorrect.
-
gko::array<index_type> block_pointers#
Starting (row / column) indexes of individual blocks.
An index past the last block has to be supplied as the last value. I.e. the size of the array has to be the number of blocks plus 1, where the first value is 0, and the last value is the number of rows / columns of the matrix.
Note
Even if not set explicitly, this parameter will be set to automatically detected values once the preconditioner is generated.
Note
If the parameter is set automatically, the size of the array does not correlate to the number of blocks, and is implementation defined. To obtain the number of blocks
nuse Jacobi::get_num_blocks(). The starting indexes of the blocks are stored in the firstn+1values of this array.Note
If the block-diagonal structure can be determined from the problem characteristics, it may be beneficial to pass this information specifically via this parameter, as the autodetection procedure is only a rough approximation of the true block structure.
Note
The maximum block size set by the max_block_size parameter has to be respected when setting this parameter. Failure to do so will lead to undefined behavior.
-
bool aggregate_l1#
Use L1 Jacobi, introduced in Baker et al., Multigrid smoothers for ultraparallel computing. The smoother applies whenever the matrix satisfies the row-dominance condition
\[ A_{ii} \geq \theta \sum_{j \in \text{off-diagonal block}} |A_{ij}|, \qquad \theta \geq 0. \]If set, the preconditioner is generated on \( A + \mathrm{diag}\!\left( \sum_{k \in \text{off-diagonal block of } i} |A_{ik}| \right) \) instead of \(A\) — i.e. the absolute values of off-diagonal-block entries on each row are aggregated into the diagonal entry before block inversion.Reference: Baker, A. H., Falgout, R. D., Kolev, T. V., Yang, U. M. Multigrid Smoothers for Ultraparallel Computing. SIAM Journal on Scientific Computing, 33 (5), 2864–2887, 2011. https://doi.org/10.1137/100798806
-
storage_optimization_type storage_optimization#
The precisions to use for the blocks of the matrix.
This parameter can either be a single instance of precision_reduction or an array of precision_reduction values. If set to
precision_reduction(0, 0)(this is the default), a regular full-precision block-Jacobi will be used. Any other value (or an array of values) will map to the adaptive variant.The best starting point when evaluating the potential of the adaptive version is to set this parameter to
precision_reduction::autodetect(). This option will cause the preconditioner to reduce the memory transfer volume as much as possible, while trying to maintain the quality of the preconditioner similar to that of the full precision block-Jacobi.For finer control, specific instances of precision_reduction can be used. Supported values are
precision_reduction(0, 0),precision_reduction(0, 1)andprecision_reduction(0, 2). Any other value will have the same effect asprecision_reduction(0, 0).If the ValueType template parameter is set to
double(or the complex variantstd::complex<double>),precision_reduction(0, 0)will use IEEE double precision for preconditioner storage,precision_reduction(0, 1)will use IEEE single precision, andprecision_reduction(0, 2)will use IEEE half precision.It ValueType is set to
float(orstd::complex<float>),precision_reduction(0, 0)will use IEEE single precision for preconditioner storage, and bothprecision_reduction(0, 1)andprecision_reduction(0, 2)will use IEEE half precision.Instead of specifying the same precision for all blocks, the precision of the elements can be specified on per-block basis by passing an array of precision_reduction objects. All values discussed above are supported, with the same meaning. It is worth mentioning that a value of
precision_reduction::autodetect()will cause autodetection on the per-block basis, so blocks whose precisions are autodetected can end up having different precisions once the preconditioner is generated. The detected precision generally depends on the conditioning of the block.If the number of diagonal blocks is larger than the number of elements in the passed array, the entire array will be replicated until enough values are available. For example, if the original array contained two precisions
(x, y)and the preconditioner contains 5 blocks, the array will be transformed into(x, y, x, y, x)before generating the preconditioner. As a consequence, specifying a single value for this property is exactly equivalent to specifying an array with a single element set to that value.Once an instance of the Jacobi linear operator is generated, the precisions used for the blocks can be obtained by reading this property. Whether the parameter was set to a single value or to an array of values can be queried by reading the
storage_optimization.is_block_wiseboolean sub-property. If it is set tofalse, the precision used for all blocks can be obtained usingstorage_optimization.of_all_blocksor by castingstorage_optimizationtoprecision_reduction. Independently of the value ofstorage_optimization.is_block_wise, thestorage_optimization.block_wiseproperty will return an array of precisions used for each block. All values set toprecision_reduction::autodetect()will be replaced with the value representing the precision used for the corresponding block. If the non-adaptive version of Jacobi is used, thestorage_optimization.block_wisearray will be empty.
-
remove_complex<value_type> accuracy#
The relative accuracy of the adaptive Jacobi variant.
This parameter is only used if the adaptive version of the algorithm is selected (see storage_optimization parameter for more details). The parameter is used when detecting the optimal precisions of blocks whose precision has been set to precision_reduction::autodetect().
The parameter represents the number of correct digits in the result of Jacobi::apply() operation of the adaptive variant, compared to the non-adaptive variant. In other words, the total preconditioning error will be:
|| inv(A)x - inv(M)x|| / || inv(A)x || <= c * (dropout + accuracy)
where
cis some constant depending on the problem size and roundoff error anddropoutthe error introduced by disregarding off-diagonal elements.Larger values reduce the volume of memory transfer, but increase the error compared to using full precision storage. Thus, tuning the accuracy to a value as close as possible to
dropoutwill result in optimal memory savings, while not degrading the quality of solution.
-
uint32 max_block_size#