Logging and observability

Ginkgo’s Logger interface is the primary observability hook — it lets you watch what’s happening inside a solver, profile kernels, or capture convergence data without touching solver code. This page covers the built-in loggers, where they hook in, and how to attach them.

The Logger interface

A Logger is a callback receiver. The LinOp and Executor runtimes call into the logger at well-defined events:

LinOp lifecycle: on_linop_apply_started and on_linop_advanced_apply_started, plus their _completed counterparts (on_linop_apply_completed, on_linop_advanced_apply_completed).
Iterative solver: on_iteration_complete — fires once per iteration with iteration count, residual, and current solution.
Executor: kernel launches, memory allocations, copies, and synchronization points.

You attach a logger to a LinOp (typically a solver) before calling apply:

auto solver = factory->generate(matrix);
solver->add_logger(my_logger);
solver->apply(b, x);

Or attach it to an executor to observe every kernel and allocation on that device:

exec->add_logger(my_logger);

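A single logger instance can be attached to multiple objects simultaneously. Loggers are held as shared_ptr, so each object you attach to shares ownership with your own handle rather than taking it over; the logger stays alive for as long as any of them still references it.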
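The shared-ownership behaviour described above can be sketched without Ginkgo. The following is a minimal standalone illustration, not Ginkgo code: CountingLogger, Loggable, and attach_to_two are hypothetical stand-ins, and the real add_logger implementation may differ in detail.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Ginkgo types.
struct CountingLogger {
    // The callback is const, so state that changes must be mutable.
    mutable int events = 0;
    void on_event() const { ++events; }
};

class Loggable {
public:
    // Stores a copy of the shared_ptr: ownership is shared, not transferred.
    void add_logger(std::shared_ptr<const CountingLogger> logger)
    {
        loggers_.push_back(std::move(logger));
    }

    // Notifies every attached logger, as an apply() call would.
    void fire() const
    {
        for (const auto& logger : loggers_) {
            logger->on_event();
        }
    }

private:
    std::vector<std::shared_ptr<const CountingLogger>> loggers_;
};

// Attaches one logger to two objects, fires one event on each, and returns
// the use count while both objects are still alive.
inline long attach_to_two(const std::shared_ptr<CountingLogger>& logger)
{
    Loggable solver;
    Loggable executor;
    solver.add_logger(logger);
    executor.add_logger(logger);
    solver.fire();
    executor.fire();
    return logger.use_count();  // the caller's handle plus two attached copies
}
```

While both objects hold the logger the use count is three; once they go out of scope, only the caller's handle remains and the recorded events are still readable through it.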

Note

Logging is currently not MPI-aware. Each rank logs its own events independently. If you need cross-rank timing, coordinate outside of Ginkgo.

Built-in loggers

Logger	What it captures	Use case
`Convergence`	Iteration count and final residual norm	After solve: did it converge well?
`Stream`	Logs every event to a stream (`cout`, `ofstream`)	Tracing and debugging
`Performance`	Operation timings and allocation totals	Lightweight profiling
`ProfilerHook`	NVTX / ROCTX / VTune named regions	GPU profiler integration
`BatchConvergence`	Per-batch-item iteration count and residual	Batched solvers

All built-in loggers live in the gko::log namespace, with one exception: in recent Ginkgo releases BatchConvergence is declared in gko::batch::log.

Convergence: the most common use case

Convergence captures the final state of an iterative solve: the total iteration count and the residual norm at termination. It is the right first logger to add when you want to know whether a solve succeeded.

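// Loggers are created through their static create() factory; gko::share
// converts the returned unique_ptr into a shared_ptr we can keep a handle to.
auto convergence = gko::share(gko::log::Convergence<double>::create());
solver->add_logger(convergence);
solver->apply(b, x);

std::cout << "Converged in " << convergence->get_num_iterations() << " iterations\n";
auto resnorm = convergence->get_residual_norm();   // const LinOp* to a 1×nrhs Dense<double> on the solver's executor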

Convergence is templated on the value type. The residual norm is a 1×nrhs Dense living on the solver’s executor.

Attention

get_residual_norm() returns a const LinOp* pointing to a Dense that lives on the solver’s executor. If the solver runs on a CudaExecutor, HipExecutor, or DpcppExecutor, the pointer returned by that Dense’s get_const_values() is a device pointer. Dereferencing it on the host (e.g., reading data[0] directly to print a value) is undefined behaviour and typically segfaults.

To read the values on the host, clone the residual-norm vector to a host-resident executor first:

auto host = gko::ReferenceExecutor::create();
auto resnorm_host = gko::clone(host,
    gko::as<gko::matrix::Dense<double>>(convergence->get_residual_norm()));
std::cout << "norm = " << resnorm_host->at(0, 0) << "\n";

The same warning applies to get_implicit_sq_resnorm().

The logger reads the outcome from the stopping criteria that triggered termination, so what counts as “converged” is fully controlled by the criteria you configure on the solver.

Stream logger: tracing

Stream logs every observable event to a std::ostream. It is the easiest way to see exactly what a solver is doing:

// Arguments: events to log, output stream, verbose flag.
auto stream_logger = gko::share(gko::log::Stream<>::create(
    gko::log::Logger::all_events_mask, std::cout, true));
solver->add_logger(stream_logger);
solver->apply(b, x);

The verbose flag (the last argument) controls verbosity: set it to true to include matrix and vector values in the output, or false for event names only. Stream logging is most useful during the first integration of a new solver or when tracking down an unexpected convergence failure.

Attention

Stream logging is synchronous and can be noisy. Disable it in production paths — every event incurs a format-and-write call.

ProfilerHook: integrating with external profilers

ProfilerHook emits named regions that profiling tools understand. The convenience factory create_for_executor picks the appropriate backend for the active executor:

Factory	Backend	Used by
`create_nvtx(color_argb = …)`	NVTX	Nsight Systems on CUDA
`create_roctx()`	ROCTX	rocprof on HIP
`create_vtune()`	Intel VTune ITT	VTune on CPU
`create_tau(initialize = true)`	TAU via PerfStubs	TAU and other PerfStubs targets
`create_for_executor(exec)`	Picks the right one of the above for the executor’s backend (NVTX for CUDA, ROCTX for HIP, TAU otherwise)	General-purpose default
`create_custom(begin, end, …)`	User-supplied callbacks	Any profiler with a C entry point — including Caliper

auto profiler = gko::log::ProfilerHook::create_for_executor(exec);
solver->add_logger(profiler);

In the profiler UI you will see solver phases grouped by region name, which makes it straightforward to identify which part of a solve is expensive.

Caliper

Ginkgo does not ship a dedicated Caliper factory; the supported integration path is create_custom. Pass two function objects that call Caliper’s C API directly (cali_begin_region / cali_end_region) as the begin/end callbacks, and Ginkgo will invoke them at every annotated region boundary.
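The begin/end callback pairing can be sketched in isolation. The following is a standalone illustration under simplifying assumptions: RegionHook, RecordingBackend, and ScopedRegion are hypothetical names, the hook takes only a region name (the hook type accepted by create_custom may carry additional arguments), and a recording stub stands in for the cali_begin_region / cali_end_region calls so that the sketch runs without Caliper installed.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical hook type: a callback that receives a region name.
using RegionHook = std::function<void(const char*)>;

// Stand-in for a profiler backend that records markers in call order.
// With Caliper, the two lambdas would instead forward the name to
// cali_begin_region(name) and cali_end_region(name).
struct RecordingBackend {
    std::vector<std::string> trace;

    RegionHook begin_hook()
    {
        return [this](const char* name) {
            trace.push_back(std::string("begin ") + name);
        };
    }

    RegionHook end_hook()
    {
        return [this](const char* name) {
            trace.push_back(std::string("end ") + name);
        };
    }
};

// RAII guard that calls the begin hook on construction and the end hook on
// destruction, which is how an annotated region boundary would be reported.
class ScopedRegion {
public:
    ScopedRegion(const RegionHook& begin, RegionHook end, const char* name)
        : end_(std::move(end)), name_(name)
    {
        begin(name_);
    }
    ~ScopedRegion() { end_(name_); }
    ScopedRegion(const ScopedRegion&) = delete;
    ScopedRegion& operator=(const ScopedRegion&) = delete;

private:
    RegionHook end_;
    const char* name_;
};

// Opens two nested regions and returns the markers the backend received.
inline std::vector<std::string> trace_nested_regions()
{
    RecordingBackend backend;
    const auto begin = backend.begin_hook();
    const auto end = backend.end_hook();
    {
        ScopedRegion apply(begin, end, "apply");
        ScopedRegion iteration(begin, end, "iteration");
    }  // destructors run in reverse order: "iteration" ends before "apply"
    return backend.trace;
}
```

The point of the sketch is that the two callbacks are independent function objects sharing one backend, and that every begin is matched by exactly one end in properly nested order.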

A worked example of Caliper integration in an application is planned — see How-To: ProfilerHook with Caliper.

Performance logger

Performance is lighter-weight than ProfilerHook. It accumulates wall-clock durations per operation type and per executor, without emitting external profiling markers:

auto perf = std::make_shared<gko::log::Performance>();
exec->add_logger(perf);
// ... run workload ...
auto stats = perf->get_total_time(/* operation */);

Because it attaches to an executor rather than a solver, it captures all operations on that device, not just the ones inside a single apply call. This is useful for characterising the total GPU time a benchmark spends in Ginkgo.
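To make "accumulates wall-clock durations per operation type" concrete, here is a minimal standalone sketch of that bookkeeping. It is not the logger's implementation: DurationAccumulator, OperationStats, and their methods are hypothetical names, and the sketch only shows the accumulate-by-key idea using std::chrono.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Accumulated statistics for one operation type (hypothetical type).
struct OperationStats {
    std::chrono::nanoseconds total{0};
    std::size_t calls = 0;
};

// Hypothetical accumulator: sums elapsed time per operation name.
class DurationAccumulator {
public:
    // Adds one measured duration to the running total for `operation`.
    void record(const std::string& operation, std::chrono::nanoseconds elapsed)
    {
        auto& stats = entries_[operation];
        stats.total += elapsed;
        ++stats.calls;
    }

    // Runs `work`, measures it with a monotonic clock, and records the result.
    template <typename Callable>
    void time_call(const std::string& operation, Callable&& work)
    {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        record(operation,
               std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start));
    }

    // Returns the statistics for `operation`, or zeros if it was never seen.
    OperationStats lookup(const std::string& operation) const
    {
        const auto it = entries_.find(operation);
        return it == entries_.end() ? OperationStats{} : it->second;
    }

private:
    std::map<std::string, OperationStats> entries_;
};
```

Keeping only a running total and a call count per key is what makes this approach cheap: no per-event record is stored, so memory use does not grow with the length of the run.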

Batched: BatchConvergence

Batched solvers solve many independent systems simultaneously. Each batch item has its own iteration count and residual, so the scalar Convergence logger does not apply. Use BatchConvergence instead:

auto bcv = gko::share(gko::batch::log::BatchConvergence<double>::create());
batched_solver->add_logger(bcv);
batched_solver->apply(rhs, sol);

auto iters = bcv->get_num_iterations();   // array<int>: per-item iteration count
auto norms = bcv->get_residual_norm();    // array<double>: per-item final residual norm

Both returned arrays have one entry per batch item and live on the solver’s executor. Copy to host to inspect individual values.
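Once the per-item data is on the host, a small reduction is often easier to read than the raw arrays. The sketch below assumes the values have already been copied into std::vector; BatchSummary and summarize_batch are hypothetical helpers, and treating "reached the iteration limit" as a sign of a problematic item is an assumption that depends on how the stopping criteria are configured.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical summary of a batched solve, computed from host-side copies.
struct BatchSummary {
    int max_iterations = 0;          // largest per-item iteration count
    std::size_t items_at_limit = 0;  // items that used the full iteration budget
    double worst_residual = 0.0;     // largest per-item final residual norm
};

// Reduces per-item iteration counts and residual norms to a short summary.
// Only the first min(iterations.size(), residual_norms.size()) items are read.
inline BatchSummary summarize_batch(const std::vector<int>& iterations,
                                    const std::vector<double>& residual_norms,
                                    int iteration_limit)
{
    BatchSummary summary;
    const std::size_t n = std::min(iterations.size(), residual_norms.size());
    for (std::size_t i = 0; i < n; ++i) {
        summary.max_iterations = std::max(summary.max_iterations, iterations[i]);
        if (iterations[i] >= iteration_limit) {
            ++summary.items_at_limit;
        }
        summary.worst_residual =
            std::max(summary.worst_residual, residual_norms[i]);
    }
    return summary;
}
```

A non-zero items_at_limit is a cue to look at those batch items individually rather than at the batch as a whole.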

Custom loggers

If the built-in loggers do not fit your use case, subclass gko::log::Logger and override only the events you care about:

class MyLogger : public gko::log::Logger {
public:
    void on_iteration_complete(
        const gko::LinOp* solver,
        const gko::LinOp* b,
        const gko::LinOp* x,
        const gko::size_type& num_iterations,
        const gko::LinOp* residual,
        const gko::LinOp* residual_norm,
        const gko::LinOp* implicit_sq_residual_norm,
        const gko::array<gko::stopping_status>* status,
        bool all_stopped) const override
    {
        // capture what you need; the callback is const, so any member you
        // write to must be declared mutable
    }
};

The key rule: do not block inside a logger. Loggers run in the hot path of apply — heavy I/O or synchronisation here will dominate your solve time.
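One way to follow this rule is to buffer inside the callback and do all formatting after the solve. The following standalone sketch shows the pattern with plain host values; BufferingLogger and IterationRecord are hypothetical and do not derive from gko::log::Logger. In a real logger the residual norm arrives as a LinOp that may live on a device, so extracting a scalar from it has a cost of its own that this sketch does not model.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One buffered data point (hypothetical type).
struct IterationRecord {
    std::size_t iteration;
    double residual_norm;
};

// Hypothetical logger that only appends to memory in the hot path.
class BufferingLogger {
public:
    // Reserving up front avoids reallocations during the solve.
    explicit BufferingLogger(std::size_t expected_iterations)
    {
        history_.reserve(expected_iterations);
    }

    // Hot path: a single push_back, no I/O and no synchronisation.
    // The method is const, so the buffer is declared mutable.
    void on_iteration_complete(std::size_t iteration, double residual_norm) const
    {
        history_.push_back({iteration, residual_norm});
    }

    const std::vector<IterationRecord>& history() const { return history_; }

    // Cold path: format everything once, after the solve has finished.
    std::string format_history() const
    {
        std::ostringstream out;
        for (const auto& record : history_) {
            out << record.iteration << ' ' << record.residual_norm << '\n';
        }
        return out.str();
    }

private:
    mutable std::vector<IterationRecord> history_;
};
```

Calling format_history() only after apply returns keeps the per-iteration cost close to a single vector append.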

Publications

Towards Continuous Benchmarking: An Automated Performance Evaluation Framework for High Performance Software