MC64 (matching and scaling)#

MC64 is a matching-and-scaling preprocessing step for sparse direct factorisation. Unlike AMD or nested dissection — which permute rows and columns to reduce fill-in but leave numerical values untouched — MC64 picks a row permutation that maximises a metric over the resulting diagonal absolute values and simultaneously emits a pair of diagonal scaling matrices \(D_r, D_c\) that bring every diagonal entry of the transformed matrix to unit magnitude.

Phrased as an optimisation, MC64 solves a bipartite-matching problem on the matrix graph. The two scoring strategies Ginkgo supports correspond to different per-edge weight functions \(w_{ij}\), and in both cases the matching minimises the reduced cost \(m_i - w_{i,\sigma(i)}\), where \(m_i = \max_j w_{ij}\) is the row-maximum:

Strategy	Per-edge weight \(w_{ij}\)	Matching objective	Equivalent diagonal goal
`max_diagonal_product` (default)	\(w_{ij} = \log_2 \lvert a_{ij} \rvert\)	\(\min_{\sigma} \sum_{i} \bigl( m_i - w_{i,\sigma(i)} \bigr)\)	maximise \(\prod_{i} \lvert a_{i,\sigma(i)} \rvert\)
`max_diagonal_sum`	\(w_{ij} = \lvert a_{ij} \rvert\)	\(\min_{\sigma} \sum_{i} \bigl( m_i - w_{i,\sigma(i)} \bigr)\)	maximise \(\sum_{i} \lvert a_{i,\sigma(i)} \rvert\)

Subtracting the row maximum before matching keeps every cost non-negative — required by the shortest-augmenting-path matching the implementation uses. The log-space transform on the product strategy also avoids the underflow that would otherwise plague a literal \(\prod \lvert a_{ij} \rvert\) over thousands of subunit values.

After matching, MC64 chooses \(D_r\) and \(D_c\) so the permuted-and-scaled matrix \(\hat A = D_r\, P A\, D_c\) has \(\lvert \hat a_{ii} \rvert = 1\) and \(\lvert \hat a_{ij} \rvert \le 1\).

When to use MC64#

Indefinite or strongly non-symmetric matrices headed for LU factorisation. Pushing large entries onto the diagonal reduces pivoting during the factorisation, which is the main source of parallelism loss in sparse LU.
As the match step of the canonical recipe match → reorder → factorise — see Reordering and permutations.

For SPD systems factorised by Cholesky, MC64 is unnecessary — Cholesky does not pivot. For Krylov iterative methods, MC64’s scaling is occasionally useful as a diagonal preconditioner, but the matching step adds value only when the matrix has zero or near-zero diagonal entries.

Using MC64#

Mc64 returns a gko::matrix::ScaledPermutation — a permutation composed with the row scaling, so a single apply permutes and scales in one pass.

auto mc64_factory =
    gko::experimental::reorder::Mc64<double, int>::build()
        .with_strategy(gko::experimental::reorder::mc64_strategy::max_diagonal_product)
        .on(exec);
auto scaled_perm = mc64_factory->generate(system_matrix);

The most common downstream pattern is to compose MC64 with a fill-reducing reordering and an inner solver via ScaledReordered, which hides the row/column transforms inside a single LinOp.

Implementation notes#

The algorithm runs on host. The matching is solved by an augmenting-path / Hungarian-style algorithm; expect runtime to be modest for matrices up to a few million nonzeros but to dominate the pre-factorisation budget on much larger problems.