Matrix formats — an overview#
Ginkgo supports several sparse-matrix storage formats, each tuned for a different memory
access pattern and workload. This page introduces the format zoo, explains the trade-off
axes that separate them, describes the conversion graph, and offers selection guidance.
Detailed coverage of individual formats lives in the formats/ sub-pages.
The format zoo#
Format |
Class |
Storage idea |
Best for |
|---|---|---|---|
CSR |
|
Row-pointers + col-indices + values |
General-purpose default, fast SpMV |
COO |
|
Triples (row, col, value) |
Assembly, file IO |
ELL |
|
Padded fixed-width-per-row |
Matrices with near-uniform row length |
Sellp |
|
Slab-padded ELL (slice-ELL) |
GPU-friendly for moderately-irregular rows |
Hybrid |
|
ELL part + COO overflow |
Mixed-row-length matrices |
Fbcsr |
|
Block-CSR (small dense blocks) |
Blocked structures (e.g., FE matrices) |
Dense |
|
Row-major dense |
Multi-vectors, small dense matrices |
All seven implement the LinOp interface: every format supports apply(b, x) for SpMV,
and most support copy_from conversions to and from the other formats.
In addition to these seven, Ginkgo also provides gko::matrix::SparsityCsr — a
pattern-only representation that stores row pointers and column indices but no
per-entry values (a single shared scalar, default 1.0, is applied to every
non-zero). It is not a value-bearing format and so does not appear in the table
above; its role is to support sparsity-pattern analysis, symbolic factorisation
preparation, and graph-style algorithms where the values are either trivial or
irrelevant. SparsityCsr converts directly to and from CSR and Dense and reads from
matrix_data like the other formats. See the dedicated
SparsityCsr page for the full treatment.
Trade-off axes#
The table above reduces each format to one line. Five axes give a fuller picture:
Assembly cost. How cheaply can you build the matrix from random (row, col, value)
triples? COO wins — it stores triples directly. CSR requires a prefix-sum over row
counts; ELL and Sellp require knowing the longest row up front. If your application
assembles incrementally (adding contributions from element matrices), use COO or
matrix_data first, then convert.
SpMV throughput. SpMV is memory-bound for typical sparse matrices. CSR is the workhorse on CPU. ELL and Sellp can outperform CSR on GPU for matrices with near-uniform row lengths because they eliminate the indirect row-pointer read. Hybrid covers the case where some rows are long outliers.
Memory footprint. ELL pads every row to the same length as the longest row — catastrophic for highly skewed row-length distributions. Sellp mitigates by grouping rows into slabs and padding only within a slab. Hybrid offloads overflow rows to COO.
Conversion cost. Converting between formats is a device kernel — fast, but not free. CSR and Dense are the two “hubs” with direct converters to every other format; the remaining sparse formats (COO, ELL, Sellp, Hybrid) reach each other indirectly via CSR. If you convert repeatedly inside a tight loop, profile whether the conversion dominates.
Operation coverage. CSR has the broadest kernel coverage in Ginkgo: SpMV, transpose, scaling, factorisation. Other formats cover SpMV and a subset of the rest. Preconditioners and factorisation algorithms typically require CSR input and will convert silently if given another format.
The conversion graph#
Two formats act as conversion hubs: CSR is the sparse hub (direct converters to and
from every other sparse format) and Dense is the dense hub (direct converters to and
from every sparse format). Every sparse format also accepts a host-side matrix_data
through its read() method, so assembly never has to bounce through an intermediate
format.
flowchart LR
MD[matrix_data]
MD -.read.-> CSR
MD -.read.-> COO
MD -.read.-> ELL
MD -.read.-> Sellp
MD -.read.-> Hybrid
MD -.read.-> Fbcsr
MD -.read.-> Dense
MD -.read.-> SparsityCsr
CSR <--> COO
CSR <--> ELL
CSR <--> Sellp
CSR <--> Hybrid
CSR <--> Fbcsr
CSR <--> SparsityCsr
CSR <--> Dense
Dense <--> COO
Dense <--> ELL
Dense <--> Sellp
Dense <--> Hybrid
Dense <--> Fbcsr
Dense <--> SparsityCsr
Fbcsr <--> SparsityCsr
classDef hub fill:#3a6b4a,stroke:#3a6b4a,color:#fbfaf6
classDef leaf fill:#fbfaf6,stroke:#c9a14a,color:#1f2a24
classDef src fill:#fbfaf6,stroke:#3a6b4a,color:#1f2a24,stroke-dasharray: 4 2
class CSR,Dense hub
class COO,ELL,Sellp,Hybrid,Fbcsr,SparsityCsr leaf
class MD src
CSR ↔ COO is particularly cheap (both store one index per non-zero; the difference is compressed vs explicit row indices). Dense ↔ sparse is only sensible for matrices small enough to fit in dense memory.
Attention
Conversions that do not appear as a direct edge in the graph above are not routed
automatically. copy_from performs a dynamic_cast to the destination’s
ConvertibleTo interface and throws gko::NotSupported if the source does not
implement it. For example, COO → ELL is not a direct conversion: you must hop
explicitly through CSR (csr->copy_from(coo); ell->copy_from(csr);) or re-read
from matrix_data.
The conversion API is uniform across all formats:
auto coo = gko::matrix::Coo<double, int>::create(exec);
coo->read(data);
// COO → CSR
auto csr = gko::matrix::Csr<double, int>::create(exec);
csr->copy_from(coo);
// CSR → ELL
auto ell = gko::matrix::Ell<double, int>::create(exec);
ell->copy_from(csr);
// CSR → Sellp
auto sellp = gko::matrix::Sellp<double, int>::create(exec);
sellp->copy_from(csr);
copy_from runs on the matrix’s executor and returns the matrix in the new format.
The source matrix is unchanged.
Note
Some conversions allocate temporary intermediate storage. For very large matrices on a memory-constrained GPU, prefer a direct path (e.g., COO → CSR) over a multi-hop path.
Selection guidance#
When choosing a format, work through these questions in order:
Default to CSR. It performs well on both CPU and GPU, has the most kernel coverage in Ginkgo, and is what preconditioners and factorisation algorithms expect.
Use COO for assembly, then convert. Building a matrix incrementally from element contributions is easiest in COO (or
matrix_data). Construct COO, callcsr->copy_from(coo), and release the COO when you no longer need it.Consider ELL or Sellp for GPU with regular structure. If your matrix comes from a structured mesh or a banded problem where every row has roughly the same number of non-zeros, ELL or Sellp can outperform CSR in repeated SpMV on GPU. Sellp is more flexible (it handles moderate variation with less padding waste than ELL).
Use Hybrid for mixed-row-length matrices. Typical example: coarse-grid levels in algebraic multigrid, where most rows are short but a few are long. Hybrid stores the short rows in ELL and overflows long rows into COO.
Use Fbcsr for block-structured matrices. If your matrix has a natural block structure — for example,
kdegrees of freedom per finite-element node, producingk×kdense blocks — Fbcsr stores those blocks explicitly and exploits their dense structure in SpMV.Dense is not a sparse format. Use
gko::matrix::Densefor multi-vectors (the right-hand sideband solutionxin SpMV) and for small dense matrices. Do not use it to store a large sparse matrix.
Per-format pages#
Publications#
See also
Matrix data and assembly — getting data into a format.
API reference:
gko::matrix