Hybrid (ELL + COO overflow)#

Hybrid splits each row’s non-zeros between a fixed-width ELL section and an overflow COO tail. The ELL part keeps the regular, coalesced-friendly layout for the common portion of every row, while the COO part absorbs any non-zeros that exceed the ELL width — typically a few long rows in an otherwise short-row matrix. SpMV runs as the sum of an ELL SpMV and a COO SpMV; both kernels still touch the same input and output vectors, but each is well-tuned for the layout it operates on.

Storage layout#

A Hybrid matrix owns two child matrices:

An internal gko::matrix::Ell of width \(k_\text{ell}\) holding the first \(k_\text{ell}\) non-zeros of every row (padded as usual when a row has fewer).
An internal gko::matrix::Coo holding the remaining non-zeros — only populated for rows whose original non-zero count exceeded \(k_\text{ell}\).

Matrix (4 × 4, one heavy row at the bottom):
[ 1  0  2  0 ]             ELL part (k_ell = 2, column-major):
[ 0  3  0  0 ]               values   = [1, 3, 4, 6,  2, 0, 5, 7]
[ 4  0  5  0 ]               col_idxs = [0, 1, 0, 0,  2, *, 2, 1]
[ 6  7  8  9 ]
                           COO part (overflow from row 3):
                             row_idxs = [3, 3]
                             col_idxs = [2, 3]
                             values   = [8, 9]

The ELL part dominates storage on a typical matrix: the COO overflow holds only the entries the ELL split could not accommodate.

Choosing the split: strategies#

The width \(k_\text{ell}\) is the central knob. Pick it too small and the COO tail grows; pick it too large and the ELL padding wastes memory. Hybrid ships five strategies that pick \(k_\text{ell}\) for you at construction time:

Strategy	What it picks
`automatic` (default)	Heuristic based on the matrix’s row-length distribution.
`column_limit`	A fixed user-supplied column count.
`imbalance_limit`	The smallest \(k\) such that no more than a given fraction of rows overflow.
`imbalance_bounded_limit`	Like `imbalance_limit` but with an absolute cap on \(k\).
`minimal_storage_limit`	The \(k\) that minimises total bytes (ELL + COO).

The chosen strategy is set on the Hybrid object before reading the data:

auto hybrid = gko::matrix::Hybrid<double, int>::create(
    exec,
    std::make_shared<
        gko::matrix::Hybrid<double, int>::imbalance_limit>(0.2));
hybrid->read(matrix_data);

When to use Hybrid#

Heavy-tailed row-length distributions. A handful of long rows would inflate ELL’s \(k_\text{max}\) by orders of magnitude; the COO overflow absorbs them.
GPU SpMV on irregular matrices where neither plain ELL nor Sellp recovers enough of ELL’s coalesced access.
Sparse matrices arising from agglomerated meshes with a few high-degree super-nodes alongside many low-degree regular nodes.

For matrices with uniform row lengths, plain ELL or CSR is simpler and faster — Hybrid’s two-kernel SpMV adds overhead that only pays off when the ELL alone would waste significant memory.

Construction#

// From matrix_data with the default automatic strategy
auto hybrid = gko::matrix::Hybrid<double, int>::create(exec);
hybrid->read(matrix_data);

// With an explicit strategy
auto hybrid = gko::matrix::Hybrid<double, int>::create(
    exec,
    std::make_shared<
        gko::matrix::Hybrid<double, int>::column_limit>(/*ell_width=*/ 8));
hybrid->read(matrix_data);

// Converted from CSR
auto hybrid = gko::matrix::Hybrid<double, int>::create(exec);
hybrid->copy_from(csr);

The chosen strategy is sticky on the Hybrid instance: any copy_from or read after construction reuses the same split policy.

Performance characteristics#

The total SpMV cost is the sum of:

An ELL SpMV over a fixed-width \(k_\text{ell}\) block — typically the bulk of the work, executing with full coalescing.
A COO SpMV over the overflow entries — atomic-add to the output vector, one thread per non-zero on the GPU.

For matrices well-suited to Hybrid the COO part is a small fraction of the non-zero count (often < 10 %), so its cost is dominated by the ELL kernel. For matrices where the COO overflow approaches the ELL workload, Hybrid loses its advantage; check the split with get_ell_num_stored_elements() and get_coo_num_stored_elements() after construction.

Conversion behaviour#

Hybrid converts to and from CSR, COO, and ELL:

// Hybrid → CSR
auto csr = gko::matrix::Csr<double, int>::create(exec);
csr->copy_from(hybrid);

// CSR → Hybrid
auto hybrid = gko::matrix::Hybrid<double, int>::create(exec);
hybrid->copy_from(csr);   // uses the default automatic strategy

Conversion preserves all numerical entries: the COO overflow re-merges with the ELL part to produce a flat CSR / COO view.