cuDSS direct solver

gko::ext::cuda::solver::Cudss is a wrapper around NVIDIA’s cuDSS sparse direct solver. It is only available on the CudaExecutor and only when CMake found a cuDSS installation while configuring Ginkgo (see Build-time enablement below). It is the recommended path when you need a fast sparse direct solver on NVIDIA GPUs and Ginkgo’s own LU/Cholesky factorization does not perform well for your matrix.

When to use it

cuDSS is appropriate when:

  • You are running on NVIDIA hardware and want vendor-tuned sparse factorization / triangular solve performance.

  • Your matrix has a structure that benefits from cuDSS’s reordering and pivoting heuristics.

  • You need to refactorize repeatedly with the same sparsity pattern but different numerical values (e.g., a Newton iteration that updates Jacobian values without changing the sparsity structure). cuDSS reuses both the symbolic analysis and the factorization data structures across refactorize calls and updates only the numerical values in place. Ginkgo's own experimental::solver::Direct can skip the symbolic step (by passing a previously computed symbolic_factorization to the Lu / Cholesky factory it wraps), but it rebuilds the numeric factorization and the solver itself every time, so the savings are smaller than with cuDSS's in-place refactorization.

For non-CUDA backends, fall back to gko::experimental::solver::Direct.

Construction

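#include <ginkgo/ginkgo.hpp>
#include <ginkgo/extensions/cuda/solver/cudss.hpp>

// CUDA device 0, with a ReferenceExecutor as the host-side master executor.
auto exec = gko::CudaExecutor::create(0, gko::ReferenceExecutor::create());

// system_matrix: a std::shared_ptr<gko::matrix::Csr<double, gko::int32>> on exec.
// b, x: gko::matrix::Dense<double> right-hand side and solution on exec.
auto solver = gko::ext::cuda::solver::Cudss<double, gko::int32>::build()
                  .with_matrix_type(0)   // 0 = GENERAL (unsymmetric)
                  .with_matrix_view(0)   // 0 = FULL (entire matrix stored)
                  .on(exec)
                  ->generate(system_matrix);   // factorization happens here

// Solves system_matrix * x = b with the factorization from generate().
solver->apply(b, x);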

The factorization is computed during generate() and reused across apply calls.

Matrix type and view

cuDSS exposes two parameters that control how it interprets the supplied CSR matrix:

Parameter     Values                                              Meaning
matrix_type   0 GENERAL, 1 SYMMETRIC, 2 HERMITIAN, 3 SPD, 4 HPD   The mathematical structure of the matrix.
matrix_view   0 FULL, 1 LOWER, 2 UPPER                            What the CSR storage actually contains: the full
                                                                  matrix, only the lower triangle plus diagonal, or
                                                                  only the upper triangle plus diagonal.
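A symmetric, Hermitian, SPD, or HPD matrix_type is only appropriate when the matrix actually has that structure. The following is a minimal host-side sketch for real-valued matrices, assuming the CSR arrays are accessible on the host; is_symmetric_csr is a hypothetical helper, not part of Ginkgo or cuDSS. It checks structural symmetry (every stored (i, j) has a stored (j, i)) and numerical symmetry up to a tolerance. It does not test positive definiteness, which SPD additionally requires.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Ginkgo or cuDSS API).
// Returns true if the n-by-n real CSR matrix equals its transpose up to tol:
// every stored entry (i, j) must have a stored mirror (j, i) with a value
// differing by at most tol.
bool is_symmetric_csr(std::int32_t n,
                      const std::vector<std::int32_t>& row_ptrs,
                      const std::vector<std::int32_t>& col_idxs,
                      const std::vector<double>& values,
                      double tol = 0.0)
{
    for (std::int32_t i = 0; i < n; ++i) {
        for (std::int32_t k = row_ptrs[i]; k < row_ptrs[i + 1]; ++k) {
            const std::int32_t j = col_idxs[k];
            // Linear search for the mirrored entry (j, i) in row j.
            bool found = false;
            for (std::int32_t m = row_ptrs[j]; m < row_ptrs[j + 1]; ++m) {
                if (col_idxs[m] == i) {
                    found = std::abs(values[m] - values[k]) <= tol;
                    break;
                }
            }
            if (!found) {
                return false;  // missing mirror or mismatched value
            }
        }
    }
    return true;
}
```

A Hermitian check would compare against the complex conjugate instead. An explicitly stored zero without a stored mirror makes the function return false, which errs on the side of choosing GENERAL.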

Attention

Storage-format mismatch between Ginkgo and cuDSS:

  • gko::matrix::Csr stores the full matrix.

  • cuDSS’s symmetric / Hermitian / SPD / HPD modes assume only one triangle is stored.

  • Passing a fully-stored symmetric matrix with matrix_type set to a symmetric mode violates cuDSS’s input contract and can produce wrong results.

Two correct paths:

  • Fully-stored matrix: matrix_type = 0 (GENERAL), matrix_view = 0 (FULL).

  • Symmetric factorization: extract one triangle into a new CSR matrix first, then construct the solver with the symmetric matrix_type and the matching matrix_view (1 LOWER or 2 UPPER).
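The triangle-extraction step of the symmetric path can be sketched on plain host-side CSR arrays. This assumes the full matrix's arrays are available on the host (for example after copying the matrix to a ReferenceExecutor); CsrArrays and extract_lower_triangle are hypothetical names, not Ginkgo API. The resulting arrays would then be used to build a new gko::matrix::Csr, paired with matrix_view = 1 (LOWER).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain CSR storage on the host (not a Ginkgo type).
struct CsrArrays {
    std::int32_t n = 0;
    std::vector<std::int32_t> row_ptrs;
    std::vector<std::int32_t> col_idxs;
    std::vector<double> values;
};

// Keeps only entries with column <= row, i.e. the lower triangle plus the
// diagonal. Column order within each row is preserved, so sorted input
// stays sorted.
CsrArrays extract_lower_triangle(const CsrArrays& full)
{
    CsrArrays lower;
    lower.n = full.n;
    lower.row_ptrs.assign(static_cast<std::size_t>(full.n) + 1, 0);
    for (std::int32_t i = 0; i < full.n; ++i) {
        for (std::int32_t k = full.row_ptrs[i]; k < full.row_ptrs[i + 1]; ++k) {
            if (full.col_idxs[k] <= i) {
                lower.col_idxs.push_back(full.col_idxs[k]);
                lower.values.push_back(full.values[k]);
            }
        }
        // Row i + 1 starts where the entries kept so far end.
        lower.row_ptrs[i + 1] = static_cast<std::int32_t>(lower.col_idxs.size());
    }
    return lower;
}
```

Extracting the upper triangle instead only changes the comparison to column >= row, in which case matrix_view = 2 (UPPER) would apply.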

Refactorization with the same sparsity pattern

When the matrix's sparsity pattern is fixed but its numerical values change between solves, calling refactorize() with the updated matrix recomputes the numeric factorization without re-running the symbolic analysis:

solver->refactorize(updated_matrix);   // same sparsity, new values
solver->apply(b, x);

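The new matrix must have the same sparsity pattern as the matrix passed to generate(); in particular, its dimensions and number of non-zeros must match. Because the symbolic analysis (reordering and symbolic factorization) is skipped, this is much cheaper than rebuilding the solver from scratch.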
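Matching dimensions and non-zero counts alone do not guarantee an identical pattern, since entries can move within or between rows. A minimal sketch of a stricter host-side guard before calling refactorize(), assuming both matrices' CSR arrays are accessible on the host and that column indices within each row are stored in the same order (e.g. sorted); same_sparsity_pattern is a hypothetical helper, not Ginkgo API:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not Ginkgo API).
// Two CSR matrices share a sparsity pattern when their row pointers and
// column indices are identical; their values may differ freely.
// Equal row pointers already imply the same row count and the same number
// of non-zeros in every row.
bool same_sparsity_pattern(const std::vector<std::int32_t>& row_ptrs_a,
                           const std::vector<std::int32_t>& col_idxs_a,
                           const std::vector<std::int32_t>& row_ptrs_b,
                           const std::vector<std::int32_t>& col_idxs_b)
{
    return row_ptrs_a == row_ptrs_b && col_idxs_a == col_idxs_b;
}
```

The comparison costs O(rows + nnz), which is negligible next to a numeric factorization.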

Supported value and index types

Trait       Supported types
ValueType   float, double, std::complex<float>, std::complex<double>
IndexType   gko::int32 (default), gko::int64 (subject to cuDSS support)
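When wrapping the solver in generic code, the supported types listed in the table can be enforced at compile time. A minimal sketch, assuming gko::int32 and gko::int64 are aliases of std::int32_t and std::int64_t; the trait names are hypothetical and not part of Ginkgo:

```cpp
#include <complex>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true for the value types listed as supported.
template <typename T>
struct is_cudss_value_type
    : std::integral_constant<bool,
          std::is_same<T, float>::value || std::is_same<T, double>::value ||
          std::is_same<T, std::complex<float>>::value ||
          std::is_same<T, std::complex<double>>::value> {};

// Hypothetical trait: true for 32- and 64-bit indices. 64-bit support
// still depends on the installed cuDSS version, as the table notes.
template <typename T>
struct is_cudss_index_type
    : std::integral_constant<bool, std::is_same<T, std::int32_t>::value ||
                                       std::is_same<T, std::int64_t>::value> {};
```

A templated helper could then begin with static_assert(is_cudss_value_type<ValueType>::value, "unsupported value type") to reject, for example, half precision with a readable error instead of a failed instantiation deep inside the extension.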

Build-time enablement

The cuDSS extension does not have a dedicated CMake flag. It is built automatically when both of the following hold at configure time:

  1. Ginkgo is built with CUDA support (-DGINKGO_BUILD_CUDA=ON).

  2. CMake’s find_package(cudss 0.7.1 CONFIG) discovers a cuDSS installation.

If cuDSS is not in a default search location, point CMake at it explicitly:

cmake -DGINKGO_BUILD_CUDA=ON -Dcudss_DIR=/path/to/cudss/lib/cmake/cudss ...

If cuDSS is not found, CMake prints a STATUS message and skips the extension without failing: gko::ext::cuda::solver::Cudss will not be available, but the rest of Ginkgo builds normally.

Limitations

  • CUDA only — there is no AMD or Intel equivalent in the extensions tree.

  • Opaque factorization — the L and U factors are stored in cuDSS-native format and cannot be extracted as Ginkgo LinOps.

  • Build-time discovery — requires CUDA support plus a CMake-discoverable cuDSS installation; no GINKGO_BUILD_EXT_CUDSS flag exists.

See also