Run the benchmark suite
Ginkgo ships a set of benchmark drivers under benchmark/:
- Coverage: SpMV, BLAS, conversions, solvers, preconditioners, sparse BLAS, matrix statistics; single-device and distributed.
- I/O: each driver reads a JSON case list on stdin and emits a JSON result list on stdout.
- Composable: SpMV’s “fastest format per matrix” output feeds straight into the solver benchmark as its input.
Build the suite
```shell
cmake .. \
  -DGINKGO_BUILD_BENCHMARKS=ON \
  -DCMAKE_BUILD_TYPE=Release   # always benchmark in Release
cmake --build . -j
```
Release matters: performance numbers from a RelWithDebInfo build
underreport throughput by 10–30% depending on the backend. Distributed
benchmarks additionally need -DGINKGO_BUILD_MPI=ON.
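A non-Release build skews the numbers, so it can help to guard benchmark scripts against running from the wrong build tree. The bash function below is a minimal sketch and is not part of Ginkgo. The name require_release_build is hypothetical. The function assumes a single-config generator that records the build type in CMakeCache.txt as a line like CMAKE_BUILD_TYPE:STRING=Release.

```shell
# Hypothetical guard: succeed only if the build tree was configured as Release.
# Assumes a single-config generator, which records the build type in
# CMakeCache.txt as a line such as CMAKE_BUILD_TYPE:STRING=Release.
require_release_build() {
  local cache="${1:-.}/CMakeCache.txt"
  # [A-Z]* accepts whatever cache type CMake recorded (STRING, UNINITIALIZED, ...).
  if ! grep -q '^CMAKE_BUILD_TYPE:[A-Z]*=Release$' "$cache" 2>/dev/null; then
    echo "refusing to benchmark: $cache does not record a Release build" >&2
    return 1
  fi
}
```

Here is an example. The guard runs first, and the driver is started only when it succeeds:

require_release_build build && ./build/benchmark/spmv/spmv < cases.json > spmv_results.json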
Two optional helpers are worth installing alongside:
- ssget: fetches matrices from the SuiteSparse collection by ID or name. Required by run_all_benchmarks.sh. Either install it to a directory on PATH or invoke it inline with -a <archive-dir>.
- gflags: the benchmark drivers use it for command-line parsing. If the system version is too old, CMake fetches it for you.
Drivers
After the build, each benchmark area produces an executable in the build
tree. Always use --help for the authoritative option list. It also
documents the expected JSON shape, in addition to the flags:
| Build path | What it benchmarks |
|---|---|
| benchmark/spmv/spmv | SpMV across every requested matrix format |
| benchmark/solver/solver | Krylov + IR solvers (non-distributed) |
| benchmark/preconditioner/preconditioner | Preconditioner generate + apply |
| | Dense BLAS (axpy, dot, copy, …) |
| | SpGEMM, SpGEAM, transpose |
| | Matrix format conversion |
| | Size / load-imbalance / variance |
| | Synthesise block-diagonal matrices |
| | Distributed SpMV (needs MPI build) |
| | Distributed solvers (needs MPI build) |
| | Distributed BLAS on multi-vectors |
Each driver supports at least one of three value-type variants:
--double (the default), --single, and --complex. The complex variants are
named dcomplex (double precision) and scomplex (single precision).
Input JSON
All drivers read a single JSON array from stdin. The minimum shape for SpMV is:
```json
[
  { "filename": "path/to/matrix.mtx", "rhs": "path/to/rhs.mtx" },
  { "filename": "path/to/another.mtx" }
]
```
The matrices and right-hand sides are in Matrix Market format. For the
solver benchmark, the cases also need an "optimal" field naming the
matrix format to use:
```json
[
  {
    "filename": "Matrix.mtx",
    "optimal": { "spmv": "csr" }
  }
]
```
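Writing the case list by hand gets tedious for more than a few matrices. The bash function below emits either of the two shapes above from a list of Matrix Market files. make_cases and the OPTIMAL_SPMV variable are hypothetical names and are not part of Ginkgo. The function does no JSON escaping, so it assumes file names without double quotes or backslashes. It also leaves out the optional "rhs" field.

```shell
# Hypothetical helper: print a JSON case list for the given Matrix Market files.
# With OPTIMAL_SPMV set (e.g. OPTIMAL_SPMV=csr), each case also carries the
# "optimal": { "spmv": ... } field the solver benchmark expects.
# No JSON escaping is done: file names must not contain '"' or '\'.
make_cases() {
  local sep="" f
  printf '[\n'
  for f in "$@"; do
    if [ -n "${OPTIMAL_SPMV:-}" ]; then
      printf '%s  { "filename": "%s", "optimal": { "spmv": "%s" } }' \
        "$sep" "$f" "$OPTIMAL_SPMV"
    else
      printf '%s  { "filename": "%s" }' "$sep" "$f"
    fi
    sep=$',\n'   # every case after the first is preceded by a comma and newline
  done
  printf '\n]\n'
}
```

Usage: make_cases matrices/*.mtx > cases.json produces the SpMV shape. OPTIMAL_SPMV=csr make_cases Matrix.mtx produces the solver shape with a hand-picked format.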
When you chain the benchmarks, you don’t author this field yourself —
the SpMV benchmark finds the fastest format and writes
"optimal.spmv" into its output, so:
```shell
./benchmark/spmv/spmv < cases.json > spmv_results.json
./benchmark/solver/solver < spmv_results.json > solver_results.json
./benchmark/preconditioner/preconditioner < solver_results.json > pre_results.json
```
Status messages go to stderr, results to stdout, so redirection works cleanly.
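The same chain can be wrapped in a reusable bash script, assuming the driver paths shown above relative to the build directory. The function name run_chain and its argument order are hypothetical. Each stage's stderr is captured in a per-stage log file. Because the script sets set -euo pipefail, it stops at the first failing stage and never feeds an empty or truncated file to the next one. For that reason, run it as a script rather than sourcing it into an interactive shell.

```shell
set -euo pipefail

# Hypothetical wrapper: run the SpMV -> solver -> preconditioner chain.
# Arguments: build directory, input case list, output directory.
run_chain() {
  local build_dir="$1" cases="$2" out_dir="$3"
  mkdir -p "$out_dir"
  # Each driver reads the previous stage's JSON on stdin; results go to
  # stdout (a .json file) and status messages to stderr (a .log file).
  "$build_dir/benchmark/spmv/spmv" \
    < "$cases" > "$out_dir/spmv_results.json" 2> "$out_dir/spmv.log"
  "$build_dir/benchmark/solver/solver" \
    < "$out_dir/spmv_results.json" > "$out_dir/solver_results.json" 2> "$out_dir/solver.log"
  "$build_dir/benchmark/preconditioner/preconditioner" \
    < "$out_dir/solver_results.json" > "$out_dir/pre_results.json" 2> "$out_dir/pre.log"
}
```

For example, run_chain ./build cases.json results/A100 writes spmv_results.json, solver_results.json and pre_results.json, plus one .log file per stage.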
The convenience script
benchmark/run_all_benchmarks.sh (also exposed as make benchmark
when you’re in the build directory) runs the SpMV → solver →
preconditioner pipeline on the SuiteSparse collection using
environment variables for configuration:
```shell
make benchmark \
  BENCHMARK=solver \
  EXECUTOR=cuda \
  SYSTEM_NAME=A100 \
  PRECONDS=jacobi,ilu \
  SOLVERS=cg,gmres
```
The shell script downloads matrices via ssget, then walks the
collection and produces JSON files under
<build>/benchmark/results/<SYSTEM_NAME>/.... The most useful
variables:
| Variable | Effect |
|---|---|
| BENCHMARK | Which pipeline to run. Default: … |
| EXECUTOR | Backend to benchmark on. Default: … |
| SYSTEM_NAME | Tag the results; used in the output directory layout. |
| | Run only the … |
| | Restrict to a hand-picked subset; lines are … |
| | Value type. Default: … |
| SOLVERS | Solvers to include in the solver benchmark. Default: … |
| PRECONDS | Preconditioners to use. Default: … |
| | Matrix formats to compare for SpMV. Default: … |
| | Target residual reduction. Default: … |
| | Iteration cap. Default: … |
| | Emit per-iteration residuals and per-operation timing. Default: … |
| | Use the device timer rather than the wall clock. Default: … |
Variables can be exported once and reused across runs, or set
inline:
```shell
VARIABLE=value make benchmark
```
The full option list is in BENCHMARKING.md in the source tree and
in each driver’s --help output.
Best practice for representative numbers
The BENCHMARKING.md guide spells these out — they are not
optional if you intend to publish the numbers:
- Compile in Release mode.
- Run on an idle machine. last, htop, nvidia-smi and rocm-smi show competing load.
- Each benchmark does one warm-up run and then averages 10 timed runs (fewer for the longer solver benchmarks). Override this with the driver’s --repetitions flag if you need different counts.
- For the adaptive block Jacobi preconditioner specifically, enable -DGINKGO_JACOBI_FULL_OPTIMIZATIONS=ON. The gain is large, but the build time also goes up materially (see Speed up rebuilds).
- The overhead LinOp in --preconditioner overhead, --spmv overhead and --solver overhead measures Ginkgo’s framework overhead without doing any real work. This is useful for characterising the library’s own cost relative to the kernels.
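Part of the idle-machine check can be automated before a run. The bash function below is a minimal, Linux-only sketch; machine_is_idle is a hypothetical name. It compares the 1-minute load average from /proc/loadavg against a threshold, and the 0.5 default threshold is an arbitrary assumption. It only sees CPU load, so GPU contention still needs nvidia-smi or rocm-smi.

```shell
# Hypothetical pre-flight check: succeed only if the 1-minute load average
# is below a threshold. Linux-specific; the 0.5 default is an arbitrary choice.
machine_is_idle() {
  local threshold="${1:-0.5}" loadavg_file="${2:-/proc/loadavg}"
  # The first field of /proc/loadavg is the 1-minute load average;
  # awk exits with status 0 (success) when it is below the threshold.
  awk -v t="$threshold" '{ exit !($1 + 0 < t + 0) }' "$loadavg_file"
}
```

For example, the following line warns about a busy machine before a run:

machine_is_idle || echo "warning: competing CPU load" >&2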
Adding a benchmark for a new operator
When you add a new solver, preconditioner, or matrix format, extend the
relevant driver so the operator participates in the comparison matrix.
The driver picks operators from a string list keyed by name in
benchmark/utils/:
| File | Holds |
|---|---|
| | The recognised matrix-format names (CSR, COO, ELL, …) |
| | Solver / preconditioner factory string maps |
| | The detailed-mode loggers (per-iteration residual, per-op timing) |
Adding a new entry usually means one line each in the relevant string map plus a small factory builder. Mirror what the existing entries do for an analogous operator. Then add a case to the suite’s CI matrix (see Submit a pull request) so the new operator is covered going forward.
See also
- BENCHMARKING.md in the Ginkgo source tree: the authoritative reference, including the full SuiteSparse setup loop.
- Speed up rebuilds: for getting the suite built quickly.
- Submit a pull request: CI runs a subset of the benchmark suite on every merge, so this is what changes are validated against.