Add a new iterative solver#
Ginkgo iterative solvers follow a strict CRTP skeleton: a class
template Solver<ValueType> that inherits from EnableLinOp<Solver>,
plumbs a workspace through workspace_traits, and runs its iteration
loop in apply_dense_impl. This page walks through the pattern using
gko::solver::Cg (include/ginkgo/core/solver/cg.hpp,
core/solver/cg.cpp) as the worked example.
What you commit to implement#
File |
What it holds |
|---|---|
|
Class template, |
|
|
|
Per-step kernel declarations (e.g. CG has |
One backend implementation per kernel |
|
|
Add an enum entry to |
|
Add |
|
Add the |
|
Add |
Kernel files follow the same rules as in Add a new kernel.
The class skeleton#
template <typename ValueType = default_precision>
class MySolver
: public EnableLinOp<MySolver<ValueType>>,
public EnablePreconditionedIterativeSolver<ValueType, MySolver<ValueType>>,
public Transposable {
friend class EnableLinOp<MySolver>;
friend class EnablePolymorphicObject<MySolver, LinOp>;
GKO_ASSERT_SUPPORTED_VALUE_TYPE;
public:
using value_type = ValueType;
using transposed_type = MySolver<ValueType>; // self if symmetric
std::unique_ptr<LinOp> transpose() const override;
std::unique_ptr<LinOp> conj_transpose() const override;
bool apply_uses_initial_guess() const override { return true; }
class Factory;
struct parameters_type
: enable_preconditioned_iterative_solver_factory_parameters<
parameters_type, Factory> {
// Add extra solver-specific parameters here, e.g.:
// int GKO_FACTORY_PARAMETER_SCALAR(krylov_dim, 30);
};
GKO_ENABLE_LIN_OP_FACTORY(MySolver, parameters, Factory);
GKO_ENABLE_BUILD_METHOD(Factory);
static parameters_type parse(
const config::pnode& config,
const config::registry& context,
const config::type_descriptor& td_for_child =
config::make_type_descriptor<ValueType>());
protected:
void apply_impl(const LinOp* b, LinOp* x) const override;
template <typename VectorType>
void apply_dense_impl(const VectorType* b, VectorType* x) const;
void apply_impl(const LinOp* alpha, const LinOp* b,
const LinOp* beta, LinOp* x) const override;
explicit MySolver(std::shared_ptr<const Executor> exec)
: EnableLinOp<MySolver>(std::move(exec)) {}
explicit MySolver(const Factory* factory,
std::shared_ptr<const LinOp> system_matrix)
: EnableLinOp<MySolver>(factory->get_executor(),
gko::transpose(system_matrix->get_size())),
EnablePreconditionedIterativeSolver<ValueType, MySolver>{
std::move(system_matrix), factory->get_parameters()},
parameters_{factory->get_parameters()} {}
};
Key pieces:
EnableLinOp<MySolver>injects the publicapplyoverloads — you only implementapply_impl.EnablePreconditionedIterativeSolveris a mixin that gives every preconditioned iterative solverwith_criteria(...),with_preconditioner(...),with_generated_preconditioner(...), plus the corresponding getters. Subclassing it pulls the factory parameter trait in viaenable_preconditioned_iterative_solver_factory_parameters.apply_uses_initial_guess() { return true; }tells the framework thatxon entry toapplyis the initial guess and must not be zeroed.GKO_ENABLE_LIN_OP_FACTORYandGKO_ENABLE_BUILD_METHODtogether define theFactoryclass,parameters_, thebuild()method, and thewith_*setters. They’re macro-heavy boilerplate; theparameters_typestruct above is the one place you decide what new knobs the solver exposes.
workspace_traits — the typed scratch pool#
Iterative solvers allocate the same temporary vectors and scalars every
iteration. To avoid reallocating on every apply, the base
SolverBaseLinOp holds a mutable workspace that caches them. The
solver tells the workspace how many slots it needs, and what each slot
is for, through a workspace_traits<MySolver<V>> specialisation:
template <typename ValueType>
struct workspace_traits<MySolver<ValueType>> {
using Solver = MySolver<ValueType>;
static int num_vectors(const Solver&);
static int num_arrays(const Solver&);
static std::vector<std::string> op_names(const Solver&);
static std::vector<std::string> array_names(const Solver&);
static std::vector<int> scalars(const Solver&);
static std::vector<int> vectors(const Solver&);
// Slot IDs for operators (Dense* vectors / scalars):
constexpr static int r = 0; // residual
constexpr static int z = 1; // preconditioned residual
constexpr static int p = 2;
constexpr static int q = 3; // A*p
constexpr static int beta = 4;
constexpr static int prev_rho = 5;
constexpr static int rho = 6;
constexpr static int one = 7; // constant 1.0
constexpr static int minus_one = 8; // constant -1.0
// Slot IDs for typed arrays:
constexpr static int stop = 0; // stopping_status array
constexpr static int tmp = 1; // reduction scratch (char)
};
The trait method bodies in .cpp just return constants and string
lists; e.g. CG returns 9 vectors and 2 arrays. The names appear in
log output, so make them descriptive.
scalars() returns the IDs of the slots that are 1×nrhs (independent
of problem size), vectors() returns the IDs of the slots that are
n×nrhs (allocated per problem). The base class uses this to decide
when to reallocate and when to reuse.
apply_impl and the precision dispatch#
The two apply_impl overrides dispatch precision and then delegate to
the templated apply_dense_impl. Use the distributed-aware variant —
it accepts both dist::Vector and matrix::Dense, so the same code
path works in serial and MPI builds:
template <typename ValueType>
void MySolver<ValueType>::apply_impl(const LinOp* b, LinOp* x) const
{
if (!this->get_system_matrix()) {
return;
}
experimental::precision_dispatch_real_complex_distributed<ValueType>(
[this](auto dense_b, auto dense_x) {
this->apply_dense_impl(dense_b, dense_x);
},
b, x);
}
template <typename ValueType>
void MySolver<ValueType>::apply_impl(const LinOp* alpha, const LinOp* b,
const LinOp* beta, LinOp* x) const
{
experimental::precision_dispatch_real_complex_distributed<ValueType>(
[this](auto dense_alpha, auto dense_b, auto dense_beta, auto dense_x) {
auto x_clone = dense_x->clone();
this->apply_dense_impl(dense_b, x_clone.get());
dense_x->scale(dense_beta);
dense_x->add_scaled(dense_alpha, x_clone);
},
alpha, b, beta, x);
}
If your solver is not distributed-ready, use
precision_dispatch_real_complex<ValueType> instead — it accepts only
matrix::Dense.
apply_dense_impl — the iteration loop#
This is the actual algorithm. The boilerplate macros from
core/solver/solver_boilerplate.hpp allocate the workspace slots into
named local variables of type Dense<V>*:
template <typename ValueType>
template <typename VectorType>
void MySolver<ValueType>::apply_dense_impl(const VectorType* dense_b,
VectorType* dense_x) const
{
using std::swap;
constexpr uint8 RelativeStoppingId{1};
auto exec = this->get_executor();
this->setup_workspace();
GKO_SOLVER_VECTOR(r, dense_b);
GKO_SOLVER_VECTOR(z, dense_b);
GKO_SOLVER_VECTOR(p, dense_b);
GKO_SOLVER_VECTOR(q, dense_b);
GKO_SOLVER_SCALAR(beta, dense_b);
GKO_SOLVER_SCALAR(prev_rho, dense_b);
GKO_SOLVER_SCALAR(rho, dense_b);
GKO_SOLVER_ONE_MINUS_ONE();
GKO_SOLVER_STOP_REDUCTION_ARRAYS();
bool one_changed{};
// r = b - A*x; rho = 0; prev_rho = 1; z, p, q = 0
exec->run(my_solver::make_initialize(
gko::detail::get_local(dense_b)->get_const_device_view(),
gko::detail::get_local(r)->get_device_view(),
/* ... */ stop_status));
this->get_system_matrix()->apply(neg_one_op, dense_x, one_op, r);
auto stop_criterion = this->get_stop_criterion_factory()->generate(
this->get_system_matrix(),
std::shared_ptr<const LinOp>(dense_b, [](const LinOp*) {}),
dense_x, r);
int iter = -1;
while (true) {
this->get_preconditioner()->apply(r, z);
r->compute_conj_dot(z, rho, reduction_tmp);
++iter;
bool all_stopped =
stop_criterion->update()
.num_iterations(iter)
.residual(r)
.implicit_sq_residual_norm(rho)
.solution(dense_x)
.check(RelativeStoppingId, true,
&stop_status, &one_changed);
this->template log<log::Logger::iteration_complete>(
this, dense_b, dense_x, iter, r, nullptr, rho,
&stop_status, all_stopped);
if (all_stopped) {
break;
}
// ... rest of the iteration steps as kernels ...
swap(prev_rho, rho);
}
}
Boilerplate macros (from core/solver/solver_boilerplate.hpp):
Macro |
Effect |
|---|---|
|
Allocates a |
|
Allocates a |
|
Creates the constants |
|
Allocates the |
gko::detail::get_local(v) extracts the rank-local matrix::Dense<V>*
from a vector pointer — works for both experimental::distributed::Vector
and ordinary matrix::Dense, so the same iteration body covers both
cases.
transpose, conj_transpose#
When the solver is symmetric (CG, MINRES, Chebyshev), transposed_type = MySolver<ValueType> and transpose() rebuilds the solver around the
transposed system matrix and transposed preconditioner. Copy the body
from Cg::transpose / Cg::conj_transpose in core/solver/cg.cpp
and adjust the operator factory. If your solver is not transposable,
do not include public Transposable in the base list.
Parse and register for JSON config#
To make the solver constructible from a pnode / JSON tree, three
edits are needed:
core/config/config_helper.hpp— appendMySolverto theLinOpFactoryTypeenum.core/config/registry.cpp— add an entry togenerate_config_map():{"solver::MySolver", parse<LinOpFactoryType::MySolver>},
core/config/solver_config.cpp— add the parse specialisation:GKO_PARSE_VALUE_TYPE(MySolver, gko::solver::MySolver);
This expands into the explicit
template <> parse<LinOpFactoryType::MySolver>specialisation. For a value-and-index-templated solver, useGKO_PARSE_VALUE_AND_INDEX_TYPE. TheDirectsolver and the triangular solvers are the existing precedents.
The corresponding MySolver::parse(...) body inside core/solver/my_solver.cpp
is one-liner — let common_solver_parse do the work:
template <typename ValueType>
typename MySolver<ValueType>::parameters_type MySolver<ValueType>::parse(
const config::pnode& config, const config::registry& context,
const config::type_descriptor& td_for_child)
{
auto params = MySolver<ValueType>::build();
config::config_check_decorator config_check(config);
config::common_solver_parse(params, config_check, context, td_for_child);
return params;
}
If your solver has parameters beyond the preconditioned-iterative-solver
defaults, parse those out of config_check between the
common_solver_parse call and the return.
Instantiation#
End the .cpp with the value-type-explicit instantiation block:
#define GKO_DECLARE_MY_SOLVER(ValueType) class MySolver<ValueType>
#define GKO_DECLARE_MY_SOLVER_TRAITS(ValueType) \
struct workspace_traits<MySolver<ValueType>>
GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE(GKO_DECLARE_MY_SOLVER);
GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE(GKO_DECLARE_MY_SOLVER_TRAITS);
Both lines are required — the second is what makes
workspace_traits<MySolver<double>>::num_vectors(...) actually link.
Reference-parity test#
The full test plan is covered in Write tests. For a new solver you write three files:
core/test/solver/my_solver.cpp— pure-C++ tests on the factory:parameters_typedefaults,with_*setters,build()accepting generated preconditioners,transpose()returning a valid object. These don’t need a system matrix or iteration. Registered withginkgo_create_test(my_solver)incore/test/solver/CMakeLists.txt.reference/test/solver/my_solver_kernels.cpp— Reference-only convergence and correctness tests on a small SPD or general system (depending on the solver class). Registered withginkgo_create_test(my_solver_kernels)inreference/test/solver/CMakeLists.txt.test/solver/my_solver_kernels.cpp— the cross-backend parity test. Same source file compiled once per enabled backend (omp,cuda,hip,dpcpp) byginkgo_create_common_test, usingCommonTestFixture(fromtest/utils/common_fixture.hpp) which provides bothref(Reference) andexec(the target backend). Each test sets up the same system on both, runs the solver, and asserts the solutions agree withGKO_ASSERT_MTX_NEAR(...). Registered withginkgo_create_common_test(my_solver_kernels)intest/solver/CMakeLists.txt.
cg_kernels.cpp in each of those three trees is the existing template
to copy from.
See also
Add a new kernel — for the iteration-step kernels.
Add a new preconditioner — the simpler no-workspace variant of this pattern.
Configure via JSON — what the registry edits above unlock for the end user.