MC64 (matching and scaling)#

MC64 is a matching-and-scaling preprocessing step for sparse direct factorisation. Unlike AMD or nested dissection — which permute rows and columns to reduce fill-in but leave numerical values untouched — MC64 picks a row permutation that maximises a metric over the resulting diagonal absolute values and simultaneously emits a pair of diagonal scaling matrices \(D_r, D_c\) that bring every diagonal entry of the transformed matrix to unit magnitude.

Phrased as an optimisation, MC64 solves a bipartite-matching problem on the matrix graph. The two scoring strategies Ginkgo supports correspond to different per-edge weight functions \(w_{ij}\), and in both cases the matching minimises the reduced cost \(m_i - w_{i,\sigma(i)}\), where \(m_i = \max_j w_{ij}\) is the row-maximum:

Strategy

Per-edge weight \(w_{ij}\)

Matching objective

Equivalent diagonal goal

max_diagonal_product (default)

\(w_{ij} = \log_2 \lvert a_{ij} \rvert\)

\(\min_{\sigma} \sum_{i} \bigl( m_i - w_{i,\sigma(i)} \bigr)\)

maximise \(\prod_{i} \lvert a_{i,\sigma(i)} \rvert\)

max_diagonal_sum

\(w_{ij} = \lvert a_{ij} \rvert\)

\(\min_{\sigma} \sum_{i} \bigl( m_i - w_{i,\sigma(i)} \bigr)\)

maximise \(\sum_{i} \lvert a_{i,\sigma(i)} \rvert\)

Subtracting the row maximum before matching keeps every cost non-negative — required by the shortest-augmenting-path matching the implementation uses. The log-space transform on the product strategy also avoids the underflow that would otherwise plague a literal \(\prod \lvert a_{ij} \rvert\) over thousands of subunit values.

After matching, MC64 chooses \(D_r\) and \(D_c\) so the permuted-and-scaled matrix \(\hat A = D_r\, P A\, D_c\) has \(\lvert \hat a_{ii} \rvert = 1\) and \(\lvert \hat a_{ij} \rvert \le 1\).

When to use MC64#

  • Indefinite or strongly non-symmetric matrices headed for LU factorisation. Pushing large entries onto the diagonal reduces pivoting during the factorisation, which is the main source of parallelism loss in sparse LU.

  • As the match step of the canonical recipe match → reorder → factorise — see Reordering and permutations.

For SPD systems factorised by Cholesky, MC64 is unnecessary — Cholesky does not pivot. For Krylov iterative methods, MC64’s scaling is occasionally useful as a diagonal preconditioner, but the matching step adds value only when the matrix has zero or near-zero diagonal entries.

Using MC64#

Mc64 returns a gko::matrix::ScaledPermutation — a permutation composed with the row scaling, so a single apply permutes and scales in one pass.

auto mc64_factory =
    gko::experimental::reorder::Mc64<double, int>::build()
        .with_strategy(gko::experimental::reorder::mc64_strategy::max_diagonal_product)
        .on(exec);
auto scaled_perm = mc64_factory->generate(system_matrix);

The most common downstream pattern is to compose MC64 with a fill-reducing reordering and an inner solver via ScaledReordered, which hides the row/column transforms inside a single LinOp.

Implementation notes#

The algorithm runs on host. The matching is solved by an augmenting-path / Hungarian-style algorithm; expect runtime to be modest for matrices up to a few million nonzeros but to dominate the pre-factorisation budget on much larger problems.

See also

Reference: Duff, I. S., Koster, J. The Design and Use of Algorithms for Permuting Large Entries to the Diagonal of Sparse Matrices. SIAM Journal on Matrix Analysis and Applications, 20 (4), 889–901, 1999. https://doi.org/10.1137/S0895479897317661