Add a new matrix format#
A new sparse matrix format in Ginkgo is a LinOp subclass that owns its
storage arrays, plumbs apply into a backend-specific SpMV kernel, and
converts to/from Ginkgo’s host-side assembly type (matrix_data /
device_matrix_data). This page walks through the skeleton, modelled on
the existing matrix::Ell (include/ginkgo/core/matrix/ell.hpp,
core/matrix/ell.cpp).
What you commit to implement#
Every matrix format provides at least:
Trait |
Why |
|---|---|
|
Auto-generates the |
|
The |
|
|
|
At minimum a path to and from CSR; Dense is conventional. |
|
Cross-precision conversion to the neighbouring float type. |
One backend kernel set |
|
Optional but conventional: DiagonalExtractable<V>,
EnableAbsoluteComputation<...>, conversion to other formats (COO, ELL, …).
The class skeleton#
Public header (include/ginkgo/core/matrix/my_format.hpp):
template <typename ValueType = default_precision, typename IndexType = int32>
class MyFormat
: public EnableLinOp<MyFormat<ValueType, IndexType>>,
public ConvertibleTo<MyFormat<next_precision<ValueType>, IndexType>>,
public ConvertibleTo<Dense<ValueType>>,
public ConvertibleTo<Csr<ValueType, IndexType>>,
public ReadableFromMatrixData<ValueType, IndexType>,
public WritableToMatrixData<ValueType, IndexType>,
public DiagonalExtractable<ValueType>,
public EnableAbsoluteComputation<remove_complex<MyFormat<ValueType, IndexType>>> {
friend class EnablePolymorphicObject<MyFormat, LinOp>;
GKO_ASSERT_SUPPORTED_VALUE_AND_INDEX_TYPE;
public:
using value_type = ValueType;
using index_type = IndexType;
using mat_data = matrix_data<ValueType, IndexType>;
using device_mat_data = device_matrix_data<ValueType, IndexType>;
using absolute_type = remove_complex<MyFormat>;
// Conversions (one pair per ConvertibleTo above):
void convert_to(Dense<ValueType>* result) const override;
void move_to (Dense<ValueType>* result) override;
void convert_to(Csr<ValueType, IndexType>* result) const override;
void move_to (Csr<ValueType, IndexType>* result) override;
void convert_to(MyFormat<next_precision<ValueType>, IndexType>* result) const override;
void move_to (MyFormat<next_precision<ValueType>, IndexType>* result) override;
// Assembly ingress / egress:
void read(const mat_data& data) override;
void read(const device_mat_data& data) override;
void read(device_mat_data&& data) override;
void write(mat_data& data) const override;
// Auxiliary traits:
std::unique_ptr<Diagonal<ValueType>> extract_diagonal() const override;
std::unique_ptr<absolute_type> compute_absolute() const override;
void compute_absolute_inplace() override;
// Per-format storage accessors (data shape depends on your format):
value_type* get_values() noexcept { return values_.get_data(); }
const value_type* get_const_values() const noexcept { return values_.get_const_data(); }
// ... index arrays, dimensions, etc.
protected:
// Constructors live in the protected section; users go through ::create.
MyFormat(std::shared_ptr<const Executor> exec, const dim<2>& size = {});
void apply_impl(const LinOp* b, LinOp* x) const override;
void apply_impl(const LinOp* alpha, const LinOp* b,
const LinOp* beta, LinOp* x) const override;
private:
array<value_type> values_;
array<index_type> col_idxs_;
// ... whatever else your storage scheme needs
};
A few notes on the base list:
EnableLinOp<MyFormat>injects the publicapplyoverloads and thecreate(...)factory. You only writeapply_impl.The half/bfloat16 precision strands (
next_precision<V, 2>,next_precision<V, 3>) live under#if GINKGO_ENABLE_HALF || ...guards inell.hpp. Copy that pattern if you want low-precision variants.GKO_ASSERT_SUPPORTED_VALUE_AND_INDEX_TYPEis a static-assert that catches accidentally-instantiated combinations outside the supported set (e.g.(int8, int64)).The non-public constructor + base-class-generated
createis deliberate. Users always go throughMyFormat::create(exec, ...), which returns astd::unique_ptr<MyFormat>.
apply_impl#
The EnableLinOp base auto-generates the public apply overloads; you
implement the two apply_impl overrides. Pick the dispatch helper by
the SpMV kernel’s signature:
mixed_precision_dispatch_real_complex<ValueType>(fn, b, x)if the kernel is templated on three independent value types (InputValueType, MatrixValueType, OutputValueType) — CSR, ELL, SparsityCsr. Only the 3-arg form exists, so the advanced apply capturesalpha/betainto the lambda.precision_dispatch_real_complex<ValueType>(fn, ...)if the kernel is templated on a singleValueType— COO, SELL-P, FBCSR, Diagonal, Hybrid, Dense, ScaledPermutation. Has both 3-arg and 5-arg forms.
The example below shows the mixed-precision pattern; for a
single-precision kernel, use the non-mixed helper and pass alpha, b, beta, x directly into the 5-arg form.
template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::apply_impl(const LinOp* b, LinOp* x) const
{
mixed_precision_dispatch_real_complex<ValueType>(
[this](auto dense_b, auto dense_x) {
this->get_executor()->run(my_format::make_spmv(
this->get_const_device_view(),
dense_b->get_const_device_view(),
dense_x->get_device_view()));
},
b, x);
}
template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::apply_impl(const LinOp* alpha,
const LinOp* b,
const LinOp* beta,
LinOp* x) const
{
mixed_precision_dispatch_real_complex<ValueType>(
[this, alpha, beta](auto dense_b, auto dense_x) {
auto dense_alpha = make_temporary_conversion<ValueType>(alpha);
auto dense_beta = make_temporary_conversion<
typename std::decay_t<decltype(*dense_x)>::value_type>(beta);
this->get_executor()->run(my_format::make_advanced_spmv(
dense_alpha->get_const_device_view(),
this->get_const_device_view(),
dense_b->get_const_device_view(),
dense_beta->get_const_device_view(),
dense_x->get_device_view()));
},
b, x);
}
Notes on the snippet:
make_temporary_conversion<ValueType>(alpha)returns atemporary_conversion<Dense<ValueType>>: a unique-pointer-like wrapper that holds either the existingDense(no copy) or a fresh conversion toValueType. Access via->or.get(). Destroyed with the lambda, so the conversion is bounded by the apply.my_format::make_spmv/make_advanced_spmvare the operation builders produced byGKO_REGISTER_OPERATION(see Add a new kernel).All kernel arguments use views (
matrix::view::dense<...>,matrix::view::<your_format><...>) rather than rawDense*/Format*. Views make const-ness and value-type explicit in the signature and are the required convention for new formats.EllandCooare the precedents to copy;Csrand the older formats pass rawconst Format*for the matrix as legacy.
read / write — host-side assembly#
The mandatory read(matrix_data) ingests Ginkgo’s host-side
incremental-assembly type. The conventional shape: count nonzeros to size
your arrays, then hand the device-side fill-in to a kernel:
template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::read(const device_mat_data& data)
{
auto size = data.get_size();
// Compute layout (max nnz per row, padding, etc.) from `data` on its executor.
// Resize storage:
this->resize(size, /* format-specific shape */);
// Hand off to backend kernel:
this->get_executor()->run(my_format::make_fill_in_matrix_data(
data, this));
}
template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::read(const mat_data& data)
{
this->read(device_mat_data::create_from_host(this->get_executor(), data));
}
template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::read(device_mat_data&& data)
{
this->read(data); // or specialise: avoid the extra copy if possible
}
write is the reverse:
template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::write(mat_data& data) const
{
// Walk the local storage on the host (use a temporary clone if your
// matrix lives on the device) and append triplets to `data.nonzeros`.
}
Conversions#
ConvertibleTo<...> bases are not strictly required by anything in
LinOp, but the conventions across shipped formats are clear:
Csr<V, I>— provide it. CSR is the canonical sparse interop target; every sparse format in the tree converts to/from CSR. Many algorithms and the benchmark suite rely on this path. Typically implemented as a kernel (my_format::make_convert_to_csr) because it’s a parallel storage-layout transformation.Dense<V>— provide it unless your storage cannot be densified meaningfully (the only shipped exception isDiagonal, which omitsDense). The implementation can route through CSR or fill the dense buffer directly.MyFormat<next_precision<V>, I>— needed for cross-precision interop. The body is a member-wise assignment plusset_size. Add thenext_precision<V, 2>/next_precision<V, 3>strands under#if GINKGO_ENABLE_HALF || GINKGO_ENABLE_BFLOAT16guards if you want half/bfloat16 support.
Kernel files#
Each operation apply_impl and convert_to calls needs the four files
described in Add a new kernel. For a new format
my_format, the layout is:
File |
Purpose |
|---|---|
|
Kernel macro declarations |
|
Reference single-thread impl |
|
OMP impl (if hand-tuned) |
|
Portable CUDA/HIP/SYCL/OMP impl |
|
Hand-tuned CUDA |
|
Hand-tuned HIP |
|
Hand-tuned SYCL |
CMake wiring#
A new format does require CMake edits, unlike a new kernel on an existing format. Add the file to each backend’s source list:
core/CMakeLists.txt— addmatrix/my_format.cppnext tomatrix/ell.cpp.reference/CMakeLists.txt,omp/CMakeLists.txt,dpcpp/CMakeLists.txt— addmatrix/my_format_kernels.cpp(.dp.cppfor DPC++) next tomatrix/ell_kernels.cpp.common/unified/CMakeLists.txt— addmatrix/my_format_kernels.cppif you have a portable kernel.cuda/CMakeLists.txt,hip/CMakeLists.txt— add the backend-specific files only if you wrote hand-tuned variants.
The public header include/ginkgo/core/matrix/my_format.hpp becomes
discoverable through the standard install layout once the file exists in
the tree — no CMake list to update for the include.
You do not need to register matrix formats with the config layer#
The runtime JSON /
gko::configregistry is for factories (solvers, preconditioners, factorizations).Matrix formats are constructed directly:
MyFormat::create(exec, ...).If you want the format to participate in the benchmark suite’s
FORMATS=...selector, extendbenchmark/utils/types.hpp(lists the recognised format names).
Reference-parity test#
The full test plan is covered in Write tests. For a new format you write two files:
reference/test/matrix/my_format_kernels.cpp— small, hand-checkable inputs run only on the Reference executor. Covers the SpMV kernels, the conversions, and theread/writeround-trip. Register withginkgo_create_test(my_format_kernels)inreference/test/matrix/CMakeLists.txt.test/matrix/my_format_kernels.cpp— the cross-backend parity tests. Same source file compiled once per enabled backend (omp,cuda,hip,dpcpp) byginkgo_create_common_test, usingCommonTestFixture(fromtest/utils/common_fixture.hpp) which provides bothref(Reference) andexec(the target backend). Each test runs the same input on both executors and asserts agreement withGKO_ASSERT_MTX_NEAR(...). Register withginkgo_create_common_test(my_format_kernels)intest/matrix/CMakeLists.txt. Use inputs at least the warp size (typically 100×100 or larger) so warp-boundary corner cases are exercised.
The legacy per-backend trees (cuda/test/matrix/, hip/test/matrix/,
…) are largely empty now — only formats that still need backend-
specific test paths live there. Put new tests in test/matrix/.
See also
Add a new kernel — the kernel files that back
apply_impl.Write tests — the reference-parity pattern.
Run the benchmark suite — extend the SpMV benchmark to include your new format.