Use an existing MPI communicator
When your application has already created an MPI_Comm — typically a
sub-communicator from MPI_Comm_split, a duplicated comm from
MPI_Comm_dup, or one inherited from a framework like PETSc — wrap
that handle in gko::experimental::mpi::communicator and Ginkgo will
use it for every distributed object you build with it.
The recipe
#include <ginkgo/ginkgo.hpp>
#include <mpi.h>
// Your application's existing communicator.
MPI_Comm app_comm = /* whatever your application built */;
// Wrap it. The wrapper is non-owning — the caller still controls the lifetime
// of app_comm (i.e. when MPI_Comm_free is called).
auto gko_comm = gko::experimental::mpi::communicator{app_comm};
// Now every distributed type you build with gko_comm uses your handle.
// exec, global_n, and matrix_data are assumed to be set up elsewhere.
auto partition = gko::experimental::distributed::Partition<int, long>::
    build_from_global_size_uniform(exec, gko_comm.size(), global_n);
auto A = gko::experimental::distributed::Matrix<double, int, long>::create(
    exec, gko_comm);
A->read_distributed(matrix_data, partition.get());
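To make the uniform partition concrete, the following sketch computes the row ranges that a uniform split of global_n rows over the communicator's ranks could produce. The helper uniform_range_bounds is hypothetical and not part of Ginkgo, and the rule that leftover rows go to the lowest-numbered parts is an assumption; the sketch only illustrates the idea of one contiguous row range per rank.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Ginkgo. Returns num_parts + 1 boundaries;
// part p owns the global rows [bounds[p], bounds[p + 1]).
// Assumption: the first (global_n % num_parts) parts get one extra row.
std::vector<std::int64_t> uniform_range_bounds(int num_parts,
                                               std::int64_t global_n)
{
    assert(num_parts > 0 && global_n >= 0);
    const std::int64_t base = global_n / num_parts;
    const std::int64_t rest = global_n % num_parts;
    std::vector<std::int64_t> bounds(num_parts + 1, 0);
    for (int p = 0; p < num_parts; ++p) {
        // Each boundary is the previous one plus this part's row count.
        bounds[p + 1] = bounds[p] + base + (p < rest ? 1 : 0);
    }
    return bounds;
}
```

Under this assumption, 10 rows over 4 ranks give the boundaries 0, 3, 6, 8, 10, so ranks 0 and 1 own three rows each and ranks 2 and 3 own two.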
When to take ownership
The constructor that takes an MPI_Comm is non-owning: Ginkgo will not
call MPI_Comm_free on the handle. If you want Ginkgo to manage the
lifetime (rare; applications typically track their own communicators),
use the explicit factory:
auto owning = gko::experimental::mpi::communicator::create_owning(app_comm);
// MPI_Comm_free called when the last copy goes out of scope.
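The difference between the two modes can be modelled without MPI. The sketch below uses a stand-in handle type (fake_comm) and a hypothetical comm_wrapper class built on std::shared_ptr; neither is Ginkgo code, and Ginkgo's actual implementation may differ. It only shows the semantics described here: a non-owning wrapper never frees the handle, while an owning one frees it when the last copy is destroyed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for an MPI handle so the sketch runs without MPI.
struct fake_comm {
    bool freed = false;
};

// Hypothetical wrapper modelling the two ownership modes; not Ginkgo code.
class comm_wrapper {
public:
    // Non-owning: the no-op deleter leaves the handle alone.
    static comm_wrapper non_owning(fake_comm* c)
    {
        assert(c != nullptr);
        return comm_wrapper{std::shared_ptr<fake_comm>(c, [](fake_comm*) {})};
    }

    // Owning: the deleter marks the handle as freed once the last copy dies.
    static comm_wrapper owning(fake_comm* c)
    {
        assert(c != nullptr);
        return comm_wrapper{std::shared_ptr<fake_comm>(
            c, [](fake_comm* p) { p->freed = true; })};
    }

    // Number of wrapper copies currently sharing the handle.
    long copies() const { return handle_.use_count(); }

private:
    explicit comm_wrapper(std::shared_ptr<fake_comm> h)
        : handle_{std::move(h)}
    {}

    std::shared_ptr<fake_comm> handle_;
};
```

In this model, copying an owning wrapper only raises the share count; the handle is marked freed when the final copy leaves scope, which mirrors why the caller must not free the handle separately in the owning case.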
Splitting communicators
The wrapper can also split a communicator for you: the constructor
that takes a parent communicator, a color, and a key calls
MPI_Comm_split. This is useful when you want to subdivide ranks for a
nested solver without dropping into raw MPI:
auto world = gko::experimental::mpi::communicator{MPI_COMM_WORLD};
// Rank of this process within the parent communicator.
auto rank = world.rank();
auto sub = gko::experimental::mpi::communicator{world,
                                                /*color=*/rank / 4,
                                                /*key=*/rank};
This produces sub-communicators of 4 consecutive ranks each; the last
one is smaller when the world size is not a multiple of 4. Use them
like any other gko::experimental::mpi::communicator.
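The grouping produced by the color expression can be checked with plain arithmetic. The sketch below is a minimal model that needs neither MPI nor Ginkgo; split_color and group_sizes are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <vector>

// Color used in the split: ranks with the same color share a group.
int split_color(int rank, int group_size)
{
    assert(rank >= 0 && group_size > 0);
    return rank / group_size;
}

// Number of ranks that end up in each color for a given world size.
std::vector<int> group_sizes(int world_size, int group_size)
{
    assert(world_size >= 0 && group_size > 0);
    const int num_groups = (world_size + group_size - 1) / group_size;
    std::vector<int> sizes(num_groups, 0);
    for (int rank = 0; rank < world_size; ++rank) {
        ++sizes[split_color(rank, group_size)];
    }
    return sizes;
}
```

With 8 ranks and a group size of 4, both groups hold 4 ranks; with 10 ranks the groups hold 4, 4, and 2 ranks, so code that assumes a fixed sub-communicator size should query the size at run time.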
Force host-buffer staging
When your MPI build is not GPU-aware but Ginkgo still runs on a GPU executor, set the second constructor argument:
auto gko_comm = gko::experimental::mpi::communicator{
    app_comm, /*force_host_buffer=*/true};
Ginkgo will route every distributed exchange through a host staging buffer, sidestepping the assumption that MPI accepts device pointers.
Pair with the right device id
On multi-GPU nodes, choose the device based on the rank within the communicator:
auto n_dev = gko::CudaExecutor::get_num_devices();
auto exec = gko::CudaExecutor::create(
    gko::experimental::mpi::map_rank_to_device_id(app_comm, n_dev),
    gko::OmpExecutor::create());
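map_rank_to_device_id assigns device ids round-robin by node-local
rank, so ranks sharing a node get distinct devices as long as the node
has at least as many devices as ranks.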
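A simplified model of this mapping, assuming a plain round-robin over the node-local rank, looks as follows. round_robin_device_id is a hypothetical stand-in; the real function determines the node-local rank from the communicator itself and may handle additional cases.

```cpp
#include <cassert>

// Hypothetical stand-in, not the Ginkgo function: maps a node-local rank
// to a device id by wrapping around the number of devices.
int round_robin_device_id(int node_local_rank, int num_devices)
{
    assert(node_local_rank >= 0 && num_devices > 0);
    return node_local_rank % num_devices;
}
```

With 4 devices, node-local ranks 0 to 3 get devices 0 to 3, and rank 4 wraps around to device 0, so oversubscribed nodes end up sharing devices.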
Common pitfalls
MPI must be initialised first. Either call MPI_Init yourself before constructing any communicator, or use gko::experimental::mpi::environment (Ginkgo’s RAII guard around MPI_Init/MPI_Finalize).
One MPI_Comm per Ginkgo communicator wrapper. Copies of the wrapper share the same MPI_Comm; freeing the handle while any copy survives is undefined behaviour. Stick with non-owning semantics unless you have a clear reason otherwise.
Distributed Ginkgo requires -DGINKGO_BUILD_MPI=ON. Without it the distributed:: namespace is not compiled. See Build options.
See also
MPI layer — the conceptual reference for the wrapper, request, and collective layer.
Assemble a distributed matrix — using the communicator to build a
distributed::Matrix.