Add a new matrix format#

A new sparse matrix format in Ginkgo is a LinOp subclass that owns its storage arrays, plumbs apply into a backend-specific SpMV kernel, and converts to/from Ginkgo’s host-side assembly type (matrix_data / device_matrix_data). This page walks through the skeleton, modelled on the existing matrix::Ell (include/ginkgo/core/matrix/ell.hpp, core/matrix/ell.cpp).

What you commit to implement#

Every matrix format provides at least:

Trait

Why

EnableLinOp<Self> base

Auto-generates the apply dispatcher; you only write apply_impl.

ReadableFromMatrixData<V, I>

The read overloads that ingest matrix_data from host code.

WritableToMatrixData<V, I>

write to round-trip back to host-side matrix_data.

ConvertibleTo<Dense<V>>, ConvertibleTo<Csr<V, I>>

At minimum a path to and from CSR; Dense is conventional.

ConvertibleTo<Self<next_precision<V>, I>>

Cross-precision conversion to the neighbouring float type.

One backend kernel set

spmv plus advanced_spmv (alpha/beta variant); plus a read / convert_to_csr kernel per backend.

Optional but conventional: DiagonalExtractable<V>, EnableAbsoluteComputation<...>, conversion to other formats (COO, ELL, …).

The class skeleton#

Public header (include/ginkgo/core/matrix/my_format.hpp):

template <typename ValueType = default_precision, typename IndexType = int32>
class MyFormat
    : public EnableLinOp<MyFormat<ValueType, IndexType>>,
      public ConvertibleTo<MyFormat<next_precision<ValueType>, IndexType>>,
      public ConvertibleTo<Dense<ValueType>>,
      public ConvertibleTo<Csr<ValueType, IndexType>>,
      public ReadableFromMatrixData<ValueType, IndexType>,
      public WritableToMatrixData<ValueType, IndexType>,
      public DiagonalExtractable<ValueType>,
      public EnableAbsoluteComputation<remove_complex<MyFormat<ValueType, IndexType>>> {
    friend class EnablePolymorphicObject<MyFormat, LinOp>;
    GKO_ASSERT_SUPPORTED_VALUE_AND_INDEX_TYPE;

public:
    using value_type = ValueType;
    using index_type = IndexType;
    using mat_data        = matrix_data<ValueType, IndexType>;
    using device_mat_data = device_matrix_data<ValueType, IndexType>;
    using absolute_type   = remove_complex<MyFormat>;

    // Conversions (one pair per ConvertibleTo above):
    void convert_to(Dense<ValueType>* result) const override;
    void move_to   (Dense<ValueType>* result)       override;
    void convert_to(Csr<ValueType, IndexType>* result) const override;
    void move_to   (Csr<ValueType, IndexType>* result)       override;
    void convert_to(MyFormat<next_precision<ValueType>, IndexType>* result) const override;
    void move_to   (MyFormat<next_precision<ValueType>, IndexType>* result)       override;

    // Assembly ingress / egress:
    void read(const mat_data& data) override;
    void read(const device_mat_data& data) override;
    void read(device_mat_data&& data) override;
    void write(mat_data& data) const override;

    // Auxiliary traits:
    std::unique_ptr<Diagonal<ValueType>> extract_diagonal() const override;
    std::unique_ptr<absolute_type>       compute_absolute() const override;
    void                                 compute_absolute_inplace() override;

    // Per-format storage accessors (data shape depends on your format):
    value_type*       get_values() noexcept { return values_.get_data(); }
    const value_type* get_const_values() const noexcept { return values_.get_const_data(); }
    // ... index arrays, dimensions, etc.

protected:
    // Constructors live in the protected section; users go through ::create.
    MyFormat(std::shared_ptr<const Executor> exec, const dim<2>& size = {});

    void apply_impl(const LinOp* b, LinOp* x) const override;
    void apply_impl(const LinOp* alpha, const LinOp* b,
                    const LinOp* beta, LinOp* x) const override;

private:
    array<value_type> values_;
    array<index_type> col_idxs_;
    // ... whatever else your storage scheme needs
};

A few notes on the base list:

  • EnableLinOp<MyFormat> injects the public apply overloads and the create(...) factory. You only write apply_impl.

  • The half/bfloat16 precision strands (next_precision<V, 2>, next_precision<V, 3>) live under #if GINKGO_ENABLE_HALF || ... guards in ell.hpp. Copy that pattern if you want low-precision variants.

  • GKO_ASSERT_SUPPORTED_VALUE_AND_INDEX_TYPE is a static-assert that catches accidentally-instantiated combinations outside the supported set (e.g. (int8, int64)).

  • The non-public constructor + base-class-generated create is deliberate. Users always go through MyFormat::create(exec, ...), which returns a std::unique_ptr<MyFormat>.

apply_impl#

The EnableLinOp base auto-generates the public apply overloads; you implement the two apply_impl overrides. Pick the dispatch helper by the SpMV kernel’s signature:

  • mixed_precision_dispatch_real_complex<ValueType>(fn, b, x) if the kernel is templated on three independent value types (InputValueType, MatrixValueType, OutputValueType) — CSR, ELL, SparsityCsr. Only the 3-arg form exists, so the advanced apply captures alpha/beta into the lambda.

  • precision_dispatch_real_complex<ValueType>(fn, ...) if the kernel is templated on a single ValueType — COO, SELL-P, FBCSR, Diagonal, Hybrid, Dense, ScaledPermutation. Has both 3-arg and 5-arg forms.

The example below shows the mixed-precision pattern; for a single-precision kernel, use the non-mixed helper and pass alpha, b, beta, x directly into the 5-arg form.

template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::apply_impl(const LinOp* b, LinOp* x) const
{
    mixed_precision_dispatch_real_complex<ValueType>(
        [this](auto dense_b, auto dense_x) {
            this->get_executor()->run(my_format::make_spmv(
                this->get_const_device_view(),
                dense_b->get_const_device_view(),
                dense_x->get_device_view()));
        },
        b, x);
}

template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::apply_impl(const LinOp* alpha,
                                                const LinOp* b,
                                                const LinOp* beta,
                                                LinOp* x) const
{
    mixed_precision_dispatch_real_complex<ValueType>(
        [this, alpha, beta](auto dense_b, auto dense_x) {
            auto dense_alpha = make_temporary_conversion<ValueType>(alpha);
            auto dense_beta  = make_temporary_conversion<
                typename std::decay_t<decltype(*dense_x)>::value_type>(beta);
            this->get_executor()->run(my_format::make_advanced_spmv(
                dense_alpha->get_const_device_view(),
                this->get_const_device_view(),
                dense_b->get_const_device_view(),
                dense_beta->get_const_device_view(),
                dense_x->get_device_view()));
        },
        b, x);
}

Notes on the snippet:

  • make_temporary_conversion<ValueType>(alpha) returns a temporary_conversion<Dense<ValueType>>: a unique-pointer-like wrapper that holds either the existing Dense (no copy) or a fresh conversion to ValueType. Access via -> or .get(). Destroyed with the lambda, so the conversion is bounded by the apply.

  • my_format::make_spmv / make_advanced_spmv are the operation builders produced by GKO_REGISTER_OPERATION (see Add a new kernel).

  • All kernel arguments use views (matrix::view::dense<...>, matrix::view::<your_format><...>) rather than raw Dense* / Format*. Views make const-ness and value-type explicit in the signature and are the required convention for new formats. Ell and Coo are the precedents to copy; Csr and the older formats pass raw const Format* for the matrix as legacy.

read / write — host-side assembly#

The mandatory read(matrix_data) ingests Ginkgo’s host-side incremental-assembly type. The conventional shape: count nonzeros to size your arrays, then hand the device-side fill-in to a kernel:

template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::read(const device_mat_data& data)
{
    auto size = data.get_size();
    // Compute layout (max nnz per row, padding, etc.) from `data` on its executor.
    // Resize storage:
    this->resize(size, /* format-specific shape */);
    // Hand off to backend kernel:
    this->get_executor()->run(my_format::make_fill_in_matrix_data(
        data, this));
}

template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::read(const mat_data& data)
{
    this->read(device_mat_data::create_from_host(this->get_executor(), data));
}

template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::read(device_mat_data&& data)
{
    this->read(data);   // or specialise: avoid the extra copy if possible
}

write is the reverse:

template <typename ValueType, typename IndexType>
void MyFormat<ValueType, IndexType>::write(mat_data& data) const
{
    // Walk the local storage on the host (use a temporary clone if your
    // matrix lives on the device) and append triplets to `data.nonzeros`.
}

Conversions#

ConvertibleTo<...> bases are not strictly required by anything in LinOp, but the conventions across shipped formats are clear:

  • Csr<V, I> — provide it. CSR is the canonical sparse interop target; every sparse format in the tree converts to/from CSR. Many algorithms and the benchmark suite rely on this path. Typically implemented as a kernel (my_format::make_convert_to_csr) because it’s a parallel storage-layout transformation.

  • Dense<V> — provide it unless your storage cannot be densified meaningfully (the only shipped exception is Diagonal, which omits Dense). The implementation can route through CSR or fill the dense buffer directly.

  • MyFormat<next_precision<V>, I> — needed for cross-precision interop. The body is a member-wise assignment plus set_size. Add the next_precision<V, 2> / next_precision<V, 3> strands under #if GINKGO_ENABLE_HALF || GINKGO_ENABLE_BFLOAT16 guards if you want half/bfloat16 support.

Kernel files#

Each operation apply_impl and convert_to calls needs the four files described in Add a new kernel. For a new format my_format, the layout is:

File

Purpose

core/matrix/my_format_kernels.hpp

Kernel macro declarations

reference/matrix/my_format_kernels.cpp

Reference single-thread impl

omp/matrix/my_format_kernels.cpp

OMP impl (if hand-tuned)

common/unified/matrix/my_format_kernels.cpp

Portable CUDA/HIP/SYCL/OMP impl

cuda/matrix/my_format_kernels.cu (optional)

Hand-tuned CUDA

hip/matrix/my_format_kernels.hip.cpp (optional)

Hand-tuned HIP

dpcpp/matrix/my_format_kernels.dp.cpp (optional)

Hand-tuned SYCL

CMake wiring#

A new format does require CMake edits, unlike a new kernel on an existing format. Add the file to each backend’s source list:

  • core/CMakeLists.txt — add matrix/my_format.cpp next to matrix/ell.cpp.

  • reference/CMakeLists.txt, omp/CMakeLists.txt, dpcpp/CMakeLists.txt — add matrix/my_format_kernels.cpp (.dp.cpp for DPC++) next to matrix/ell_kernels.cpp.

  • common/unified/CMakeLists.txt — add matrix/my_format_kernels.cpp if you have a portable kernel.

  • cuda/CMakeLists.txt, hip/CMakeLists.txt — add the backend-specific files only if you wrote hand-tuned variants.

The public header include/ginkgo/core/matrix/my_format.hpp becomes discoverable through the standard install layout once the file exists in the tree — no CMake list to update for the include.

You do not need to register matrix formats with the config layer#

  • The runtime JSON / gko::config registry is for factories (solvers, preconditioners, factorizations).

  • Matrix formats are constructed directly: MyFormat::create(exec, ...).

  • If you want the format to participate in the benchmark suite’s FORMATS=... selector, extend benchmark/utils/types.hpp (lists the recognised format names).

Reference-parity test#

The full test plan is covered in Write tests. For a new format you write two files:

  • reference/test/matrix/my_format_kernels.cpp — small, hand-checkable inputs run only on the Reference executor. Covers the SpMV kernels, the conversions, and the read/write round-trip. Register with ginkgo_create_test(my_format_kernels) in reference/test/matrix/CMakeLists.txt.

  • test/matrix/my_format_kernels.cpp — the cross-backend parity tests. Same source file compiled once per enabled backend (omp, cuda, hip, dpcpp) by ginkgo_create_common_test, using CommonTestFixture (from test/utils/common_fixture.hpp) which provides both ref (Reference) and exec (the target backend). Each test runs the same input on both executors and asserts agreement with GKO_ASSERT_MTX_NEAR(...). Register with ginkgo_create_common_test(my_format_kernels) in test/matrix/CMakeLists.txt. Use inputs at least the warp size (typically 100×100 or larger) so warp-boundary corner cases are exercised.

The legacy per-backend trees (cuda/test/matrix/, hip/test/matrix/, …) are largely empty now — only formats that still need backend- specific test paths live there. Put new tests in test/matrix/.

See also