Mixed and multi-precision solvers and preconditioners#

Ginkgo lets you pick the precision of the matrix, the solver, and the preconditioner independently, and combine them in a single solve. This page is a task recipe: it shows what to build and what happens at each boundary when the three precisions disagree. For the why — the performance argument and the numerics — see Mixed-precision design.

Two terms are used loosely in the literature; this page keeps them apart:

Mixed precision — two precisions cooperate in one solve, e.g. a double matrix driven by a float solver.
Multi precision — three or more precisions in the same solve, e.g. a double matrix, a float Krylov iteration, and a half preconditioner. This is the scenario the worked example below builds.

Tip

A runnable bundle of this page’s example — main.cpp, a standalone CMakeLists.txt, and a README.md with build / run instructions — is available as a single archive: mixed-multi-precision-solvers.zip. It solves one double system with all three precision combinations and prints the iteration count and true residual for each.

The mental model: each `LinOp` owns its own precision#

There is no single “solve precision” in Ginkgo. Every object — the matrix, the solver, and the preconditioner — is templated on its own ValueType, and each keeps its own value type for its whole lifetime:

A solver does not convert the matrix it is generated from. The matrix is stored by base pointer and keeps its original precision.
The preconditioner factory is generated from the original matrix, and converts that matrix down to its own precision internally.
Precision mismatches are reconciled at the boundaries — at every apply() and at generate() — by inserting an implicit conversion of the vectors (or, with GINKGO_MIXED_PRECISION, by dispatching a native cross-precision kernel; see below).

So the three precisions are three independent knobs. Nothing requires them to agree, and the construction below compiles and runs as written.

The worked example: three precisions in one CG solve#

The system matrix is double, the CG iteration runs in float, and the Jacobi preconditioner stores and applies in half:

#include <ginkgo/ginkgo.hpp>

auto exec = gko::ReferenceExecutor::create();  // or Cuda/Hip/Dpcpp/Omp

// (1) Matrix in double precision.
auto A = gko::share(gko::matrix::Csr<double, int>::create(exec));
// ... read A from a file or matrix_data ...

// (2) Preconditioner factory in half precision.
auto precond = gko::preconditioner::Jacobi<gko::half, int>::build()
                   .with_max_block_size(1u)        // scalar Jacobi
                   .on(exec);

// (3) Solver in float precision, using the half preconditioner,
//     generated from the double matrix.
auto solver = gko::solver::Cg<float>::build()
                  .with_preconditioner(precond)
                  .with_criteria(
                      gko::stop::Iteration::build().with_max_iters(1000u),
                      gko::stop::ResidualNorm<float>::build()
                          .with_reduction_factor(1e-6f))
                  .on(exec)
                  ->generate(A);                   // A stays double

solver->apply(b, x);   // b, x can be Dense<double>, Dense<float>, ...

What happens at generate time#

Step	Precision behaviour
`Cg<float>::generate(A)`	The `Csr<double>` is stored unchanged. Only its dimensions are checked; precision is irrelevant. The solver never holds a `float` copy of the matrix.
Preconditioner generation	The factory receives the original `double` matrix and runs `Jacobi<half>::generate(A)`, which converts `A` down to `half` to build the inverse diagonal/blocks. The `half` data lives inside the preconditioner.

What happens at apply time#

Cg<float> runs its entire Krylov iteration in float: all workspace vectors (r, z, p, q, the scalars) are Dense<float>. Three independent conversion boundaries surround that float core:

Boundary	Call	What is converted
Outer `b` / `x`	`solver->apply(b, x)`	`b` and `x` are converted to `Dense<float>` on entry; the result is converted back to `x`’s type on exit.
Matrix `A·p`	`A->apply(p, q)` inside CG	`Csr<double>` applied to `float` vectors. Without `GINKGO_MIXED_PRECISION` the vectors are converted up to `double`, the SpMV runs in `double`, and the result is converted back to `float` — every iteration. With the option on, a native `double`-matrix/`float`-vector kernel runs and no temporaries are allocated.
Preconditioner `M⁻¹·r`	`precond->apply(r, z)` inside CG	`Jacobi<half>` applied to `float` vectors: `r` is converted to `half`, the preconditioner applies in `half`, and `z` is converted back to `float` — every iteration, regardless of the build option.

The data flow, end to end:

 b, x  (double)
   │  convert in / convert x back out
   ▼
 CG iteration runs in float
   │  A->apply (double matrix)        │  M->apply (half preconditioner)
   ▼                                  ▼
 float vecs → double (or native)   float vecs → half temporaries
 SpMV in double                    Jacobi applied in half

The takeaway: the solver’s precision (float) sets the arithmetic of the iteration and therefore the achievable convergence; the matrix’s double and the preconditioner’s half are only ever seen through their boundary conversions, so each component’s accuracy floor is its own value type — a half preconditioner can never be more accurate than half, even inside a float solve.

What the example shows#

Running the downloadable sample on a well-conditioned 1D-Poisson system (n = 100) produces output of this shape:

(1) Cg<double> + Jacobi<double>         iters = 50     ||b - A x|| = 5.188e-14
(2) Cg<float>  + Jacobi<double>         iters = 51     ||b - A x|| = 2.417e-06
(3) Cg<float>  + Jacobi<half>           iters = 200    ||b - A x|| = 3.720e-06

(1) reaches double-level accuracy — every component is double.
(2) plateaus near the float solver’s accuracy floor (~1e-6), even though the matrix and preconditioner are double: the Krylov iteration runs in float, so that caps the achievable accuracy.
(3) does not converge — it runs to the 200-iteration cap. The half preconditioner rounds the residual to roughly three digits on every apply, which stops CG from driving the residual below the float tolerance, even on this well-conditioned system. Lowering a component’s precision can stall — or, on harder problems, diverge — a solve that worked at higher precision. This is exactly the failure the warning below is about.

Where the conversion cost goes — and how to remove it#

By default (GINKGO_MIXED_PRECISION=OFF) every boundary above takes the conversion path: a temporary vector at the operator’s precision is allocated, the apply runs, and the result is converted back. This is always correct but pays per-call allocation and two casts.

Building Ginkgo with -DGINKGO_MIXED_PRECISION=ON instantiates native cross-precision kernels for Csr, Ell, and SparsityCsr, so the matrix-apply boundary above dispatches directly on the runtime (matrix, vector) value types with no temporary. The trade-off is longer build times and a larger binary. The preconditioner boundary still converts, because Jacobi (like most preconditioners) dispatches on its own value type rather than a cross-precision pair. See Native vs conversion-based dispatch for the full table of which formats opt in.

Two precisions arranged as an algorithm#

The example above mixes precisions passively — through boundary conversions. Ginkgo also offers two patterns where mixed precision is part of the algorithm:

Iterative refinement (solver::Ir) — wrap a low-precision inner solver in a high-precision outer correction loop. The bulk of the work runs in float; the outer residual and accumulation stay in double. This is the classic, numerically principled route to mixed precision.
Adaptive-precision block Jacobi — preconditioner::Jacobi with with_storage_optimization(...) stores each diagonal block at the lowest precision its conditioning allows, while the apply still decompresses to the outer arithmetic precision.

Both are covered with runnable snippets in Mixed-precision design.

A note on compressed storage inside a solver#

A different flavour of mixed precision keeps the arithmetic fixed but stores an intermediate buffer in a narrower type. solver::CbGmres (compressed-basis GMRES) does exactly this for its Krylov basis, selected with one parameter:

auto solver = gko::solver::CbGmres<double>::build()
                  .with_storage_precision(
                      gko::solver::cb_gmres::storage_precision::reduce1)  // store float
                  .with_criteria(/* ... */)
                  .on(exec)
                  ->generate(A);

Under the hood this rides on Ginkgo’s accessors, which decouple the stored type from the arithmetic type. You do not need to touch accessors to use CB-GMRES — the parameter is the whole interface. If you want to build your own storage-compressed kernels, that is its own topic: Build a mixed-precision kernel with accessors.

Supported precisions#

The value types you can combine are double, float, gko::float16 (half), and gko::bfloat16, plus their std::complex<…> forms. Any pair you mix must lie within this set — conversions are only generated for these Dense instantiations, and a type outside it fails at the dispatch with GKO_NOT_SUPPORTED. Half and bfloat16 are additionally gated at build time and per backend; see Supported value types.

Warning

Mixing precisions can make an otherwise convergent solve unstable or divergent. Whether a given matrix/solver/preconditioner precision combination converges depends entirely on the properties of your linear system — its conditioning, scaling, and spectrum. A combination that works on one problem may stagnate, lose orthogonality, or diverge on another. Ginkgo does not — and cannot — verify that a chosen precision mix is numerically sound for your system. It is your responsibility to validate convergence and accuracy for your own problems, ideally by benchmarking each candidate combination against a full-precision reference run before relying on it.