`gko::preconditioner::Jacobi`#

Block-Jacobi preconditioner. Builds a block-diagonal approximation of the system matrix and applies its explicit inverse on apply. When max_block_size is 1 it degenerates to pointwise diagonal scaling; for larger blocks the inverse is computed by Gauss-Jordan elimination with column pivoting and stored in a custom interleaved layout. An adaptive variant stores some blocks in lower precision to reduce bandwidth during apply.

template<typename ValueType = default_precision, typename IndexType = int32> class Jacobi #

Inherits from

public gko::EnableLinOp<Jacobi<default_precision, int32>>
public ConvertibleTo<matrix::Dense<default_precision>>
public gko::WritableToMatrixData<default_precision, int32>
public gko::Transposable

A block-Jacobi preconditioner is a block-diagonal linear operator, obtained by inverting the diagonal blocks of the source operator.

The Jacobi class implements the inversion of the diagonal blocks using Gauss-Jordan elimination with column pivoting, and stores the inverse explicitly in a customized format.

If the diagonal blocks of the matrix are not explicitly set by the user, the implementation will try to automatically detect the blocks by first finding the natural blocks of the matrix, and then applying the supervariable agglomeration procedure on them. However, if problem-specific knowledge regarding the block diagonal structure is available, it is usually beneficial to explicitly pass the starting rows of the diagonal blocks, as the block detection is merely a heuristic and cannot perfectly detect the diagonal block structure. The current implementation supports blocks of up to 32 rows / columns.

The implementation also includes an improved, adaptive version of the block-Jacobi preconditioner, which can store some of the blocks in lower precision and thus improve the performance of preconditioner application by reducing the amount of memory transfers. This variant can be enabled by setting the Jacobi::Factory’s storage_optimization parameter. Refer to the documentation of the parameter for more details.

Note

The current implementation supports blocks of up to 32 rows / columns.

Note

When using the adaptive variant, there may be a trade-off in terms of slightly longer preconditioner generation due to extra work required to detect the optimal precision of the blocks.

Note

When the max_block_size is set to 1, specialized kernels are used, both for generation (inverting the diagonals) and application (diagonal scaling) to reduce the overhead involved in the usual (adaptive) block case.

Template Parameters:

ValueType – precision of matrix elements
IndexType – integral type used to store pointers to the start of each block

Public Functions

inline size_type get_num_blocks() const noexcept#

Returns the number of blocks of the operator.

Returns:: the number of blocks of the operator

inline const block_interleaved_storage_scheme<index_type> &get_storage_scheme( ) const noexcept#

Returns the storage scheme used for storing Jacobi blocks.

Returns:: the storage scheme used for storing Jacobi blocks

inline const value_type *get_blocks() const noexcept#

Returns the pointer to the memory used for storing the block data.

Element (i, j) of block b is stored in position (get_block_pointers()[b] + i) * stride + j of the array.

Returns:: the pointer to the memory used for storing the block data

inline const remove_complex<value_type> *get_conditioning( ) const noexcept#

Returns an array of 1-norm condition numbers of the blocks.

Note

This value is valid only if adaptive precision variant is used, and implementations of the standard non-adaptive variant are allowed to omit the calculation of condition numbers.

Returns:: an array of 1-norm condition numbers of the blocks

inline size_type get_num_stored_elements() const noexcept#

Returns the number of elements explicitly stored in the matrix.

Returns:: the number of elements explicitly stored in the matrix

virtual std::unique_ptr<LinOp> transpose() const override#

Returns a LinOp representing the transpose of the Transposable object.

Returns:: a pointer to the new transposed object

virtual std::unique_ptr<LinOp> conj_transpose() const override#

Returns a LinOp representing the conjugate transpose of the Transposable object.

Returns:: a pointer to the new conjugate transposed object

Jacobi &operator=(const Jacobi &other)#: Copy-assigns a Jacobi preconditioner. Preserves executor, copies all data and parameters.

Jacobi &operator=(Jacobi &&other)#: Move-assigns a Jacobi preconditioner. Preserves executor, moves all data and parameters. The moved-from object will be empty (0x0 and default parameters).

Jacobi(const Jacobi &other)#: Copy-constructs a Jacobi preconditioner. Inherits executor, copies all data and parameters.

Jacobi(Jacobi &&other)#: Move-assigns a Jacobi preconditioner. Inherits executor, moves all data and parameters. The moved-from object will be empty (0x0 and default parameters).

Public Static Functions

static parameters_type parse( const config::pnode &config, const config::registry &context, const config::type_descriptor &td_for_child = config::make_type_descriptor<ValueType, IndexType>(), )#

Create the parameters from the property_tree. Because this is directly tied to the specific type, the value/index type settings within config are ignored and type_descriptor is only used for children configs.

Note

Jacobi does not support block_pointers and storage_optimization array.

Parameters:

config – the property tree for setting
context – the registry
td_for_child – the type descriptor for children configs. The default uses the value/index type of this class.

Returns:

parameters

struct parameters_type#

Public Members

uint32 max_block_size#: Maximal size of diagonal blocks.

Note

This value has to be between 1 and 32 (NVIDIA)/64 (AMD). For efficiency, when the max_block_size is set to 1, specialized kernels are used and the additional objects (block_ptrs etc) are set to null values.

uint32 max_block_stride#

Stride between two columns of a block (as number of elements).

Should be a multiple of cache line size for best performance.

Note

If this value is 0, it uses 64 in hip AMD but 32 in NVIDIA or reference executor. The allowed value: 0, 64 for AMD and 0, 32 for NVIDIA

bool skip_sorting#

true means it is known that the matrix given to this factory will be sorted first by row, then by column index, false means it is unknown or not sorted, so an additional sorting step will be performed during the preconditioner generation (it will not change the matrix given). The matrix must be sorted for this preconditioner to work.

The system_matrix, which will be given to this factory, must be sorted (first by row, then by column) in order for the algorithm to work. If it is known that the matrix will be sorted, this parameter can be set to true to skip the sorting (therefore, shortening the runtime). However, if it is unknown or if the matrix is known to be not sorted, it must remain false, otherwise, this preconditioner might be incorrect.

gko::array<index_type> block_pointers#

Starting (row / column) indexes of individual blocks.

An index past the last block has to be supplied as the last value. I.e. the size of the array has to be the number of blocks plus 1, where the first value is 0, and the last value is the number of rows / columns of the matrix.

Note

Even if not set explicitly, this parameter will be set to automatically detected values once the preconditioner is generated.

Note

If the parameter is set automatically, the size of the array does not correlate to the number of blocks, and is implementation defined. To obtain the number of blocks n use Jacobi::get_num_blocks(). The starting indexes of the blocks are stored in the first n+1 values of this array.

Note

If the block-diagonal structure can be determined from the problem characteristics, it may be beneficial to pass this information specifically via this parameter, as the autodetection procedure is only a rough approximation of the true block structure.

Note

The maximum block size set by the max_block_size parameter has to be respected when setting this parameter. Failure to do so will lead to undefined behavior.

bool aggregate_l1#

Use L1 Jacobi, introduced in Baker et al., Multigrid smoothers for ultraparallel computing. The smoother applies whenever the matrix satisfies the row-dominance condition

\[ A_{ii} \geq \theta \sum_{j \in \text{off-diagonal block}} |A_{ij}|, \qquad \theta \geq 0. \]

If set, the preconditioner is generated on \( A + \mathrm{diag}\!\left( \sum_{k \in \text{off-diagonal block of } i} |A_{ik}| \right) \) instead of \(A\) — i.e. the absolute values of off-diagonal-block entries on each row are aggregated into the diagonal entry before block inversion.

Reference: Baker, A. H., Falgout, R. D., Kolev, T. V., Yang, U. M. Multigrid Smoothers for Ultraparallel Computing. SIAM Journal on Scientific Computing, 33 (5), 2864–2887, 2011. https://doi.org/10.1137/100798806

storage_optimization_type storage_optimization#

The precisions to use for the blocks of the matrix.

This parameter can either be a single instance of precision_reduction or an array of precision_reduction values. If set to precision_reduction(0, 0) (this is the default), a regular full-precision block-Jacobi will be used. Any other value (or an array of values) will map to the adaptive variant.

The best starting point when evaluating the potential of the adaptive version is to set this parameter to precision_reduction::autodetect(). This option will cause the preconditioner to reduce the memory transfer volume as much as possible, while trying to maintain the quality of the preconditioner similar to that of the full precision block-Jacobi.

For finer control, specific instances of precision_reduction can be used. Supported values are precision_reduction(0, 0), precision_reduction(0, 1) and precision_reduction(0, 2). Any other value will have the same effect as precision_reduction(0, 0).

If the ValueType template parameter is set to double (or the complex variant std::complex<double>), precision_reduction(0, 0) will use IEEE double precision for preconditioner storage, precision_reduction(0, 1) will use IEEE single precision, and precision_reduction(0, 2) will use IEEE half precision.

It ValueType is set to float (or std::complex<float>), precision_reduction(0, 0) will use IEEE single precision for preconditioner storage, and both precision_reduction(0, 1) and precision_reduction(0, 2) will use IEEE half precision.

Instead of specifying the same precision for all blocks, the precision of the elements can be specified on per-block basis by passing an array of precision_reduction objects. All values discussed above are supported, with the same meaning. It is worth mentioning that a value of precision_reduction::autodetect() will cause autodetection on the per-block basis, so blocks whose precisions are autodetected can end up having different precisions once the preconditioner is generated. The detected precision generally depends on the conditioning of the block.

If the number of diagonal blocks is larger than the number of elements in the passed array, the entire array will be replicated until enough values are available. For example, if the original array contained two precisions (x, y) and the preconditioner contains 5 blocks, the array will be transformed into (x, y, x, y, x) before generating the preconditioner. As a consequence, specifying a single value for this property is exactly equivalent to specifying an array with a single element set to that value.

Once an instance of the Jacobi linear operator is generated, the precisions used for the blocks can be obtained by reading this property. Whether the parameter was set to a single value or to an array of values can be queried by reading the storage_optimization.is_block_wise boolean sub-property. If it is set to false, the precision used for all blocks can be obtained using storage_optimization.of_all_blocks or by casting storage_optimization to precision_reduction. Independently of the value of storage_optimization.is_block_wise, the storage_optimization.block_wise property will return an array of precisions used for each block. All values set to precision_reduction::autodetect() will be replaced with the value representing the precision used for the corresponding block. If the non-adaptive version of Jacobi is used, the storage_optimization.block_wise array will be empty.

remove_complex<value_type> accuracy#

The relative accuracy of the adaptive Jacobi variant.

This parameter is only used if the adaptive version of the algorithm is selected (see storage_optimization parameter for more details). The parameter is used when detecting the optimal precisions of blocks whose precision has been set to precision_reduction::autodetect().

The parameter represents the number of correct digits in the result of Jacobi::apply() operation of the adaptive variant, compared to the non-adaptive variant. In other words, the total preconditioning error will be:

|| inv(A)x - inv(M)x|| / || inv(A)x || <= c * (dropout + accuracy)

where c is some constant depending on the problem size and roundoff error and dropout the error introduced by disregarding off-diagonal elements.

Larger values reduce the volume of memory transfer, but increase the error compared to using full precision storage. Thus, tuning the accuracy to a value as close as possible to dropout will result in optimal memory savings, while not degrading the quality of solution.

gko::preconditioner::Jacobi#

`gko::preconditioner::Jacobi`#