gko::experimental::distributed::RowGatherer

Communication primitive that fetches remote rows of a distributed vector. It is the workhorse behind the off-diagonal SpMV in distributed::Matrix. It caches the send and receive index lists, so repeated halo exchanges with the same partition reuse the same schedule and only the payload moves.
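As a conceptual illustration of the cached schedule described above, the following self-contained sketch models only the packing step on a single process: the rows listed in a stored send index list are copied into a contiguous buffer that could then be handed to a communication call. The names GatherSchedule and pack_send_buffer are hypothetical and are not part of Ginkgo. The sketch assumes a row-major local block and does not claim to reproduce how the library packs or exchanges data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the cached schedule of one process.
// It is built once and reused for every exchange with the same partition.
struct GatherSchedule {
    std::vector<std::int32_t> send_idxs;  // local rows that other processes need
};

// Copies the scheduled rows of a row-major local block into a contiguous
// send buffer. Only this payload changes between repeated exchanges.
std::vector<double> pack_send_buffer(const GatherSchedule& schedule,
                                     const std::vector<double>& local_rows,
                                     std::size_t num_cols)
{
    std::vector<double> buffer;
    buffer.reserve(schedule.send_idxs.size() * num_cols);
    for (auto row : schedule.send_idxs) {
        assert(row >= 0);
        const auto begin = static_cast<std::size_t>(row) * num_cols;
        assert(begin + num_cols <= local_rows.size());
        for (std::size_t col = 0; col < num_cols; ++col) {
            buffer.push_back(local_rows[begin + col]);
        }
    }
    return buffer;
}
```

Calling pack_send_buffer again with new values in local_rows reuses the same send_idxs, which mirrors the idea that the schedule is fixed while the data moves.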

template<typename LocalIndexType = int32>
class RowGatherer

The distributed::RowGatherer gathers the rows of distributed::Vector that are located on other processes.

Example usage:

auto coll_comm = std::make_shared<mpi::NeighborhoodCommunicator>(comm,
                                                                 imap);
auto rg = distributed::RowGatherer<int32>::create(exec, coll_comm, imap);

auto b = distributed::Vector<double>::create(...);
auto x = matrix::Dense<double>::create(...);

auto req = rg->apply_async(b, x);
// users can do some computation that neither modifies b nor accesses x
req.wait();
// x now contains the gathered rows of b

Note

The output vector for the apply_async functions must use an executor that is compatible with the MPI implementation. In particular, if the MPI implementation is not GPU aware, then the output vector must use a CPU executor. Otherwise, an exception will be thrown.

Template Parameters:

LocalIndexType – the index type for the stored indices

Public Functions

mpi::request apply_async(
ptr_param<const LinOp> b,
ptr_param<LinOp> x
) const

Asynchronous version of LinOp::apply.

Warning

Only one mpi::request can be active at any given time. Calling this function again without waiting on the previous mpi::request will lead to undefined behavior.

Parameters:
  • b – the input distributed::Vector.

  • x – the output matrix::Dense with the rows gathered from b. Its executor has to be compatible with the MPI implementation, see the class documentation.

Returns:

an mpi::request for this task. The task is guaranteed to be complete only after .wait() has been called on it.
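The single-active-request rule from the warning above can be enforced on the caller side. The following self-contained sketch shows one way to do that with a small guard that refuses to start a second operation before the first has been waited on. MockRequest and GuardedGatherer are hypothetical mocks written for this illustration. They are not Ginkgo types, and the library itself does not promise an exception in this situation (the documented outcome is undefined behavior).

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical mock of a request handle; NOT gko::experimental::mpi::request.
class MockRequest {
public:
    explicit MockRequest(bool* pending) : pending_(pending) {}

    // Marks the associated operation as finished.
    void wait() { *pending_ = false; }

private:
    bool* pending_;
};

// Hypothetical wrapper that turns "second call before wait()" into an error
// instead of silently overlapping two operations.
class GuardedGatherer {
public:
    MockRequest apply_async()
    {
        if (pending_) {
            throw std::logic_error("previous request has not been waited on");
        }
        pending_ = true;
        return MockRequest{&pending_};
    }

    bool has_pending_request() const { return pending_; }

private:
    bool pending_ = false;
};
```

In real code the same flag could wrap the call to apply_async and be cleared right after the returned request's wait(). The guard must outlive every request it hands out.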

mpi::request apply_async(
ptr_param<const LinOp> b,
ptr_param<LinOp> x,
gko::detail::GenericDenseCache &workspace
) const

Asynchronous version of LinOp::apply.

Warning

Calling this multiple times with the same workspace and without waiting on each previous request will lead to incorrect data transfers.

Parameters:
  • b – the input distributed::Vector.

  • x – the output matrix::Dense with the rows gathered from b. Its executor has to be compatible with the MPI implementation, see the class documentation.

  • workspace – a workspace to store temporary data for the operation. It must not be modified before the request has been waited on.

Returns:

an mpi::request for this task. The task is guaranteed to be complete only after .wait() has been called on it.

dim<2> get_size() const

Returns the size of the row gatherer.

std::shared_ptr<const mpi::CollectiveCommunicator> get_collective_communicator(
) const

Get the used collective communicator.

const LocalIndexType *get_const_send_idxs() const

Read access to the (local) row indices.

size_type get_num_send_idxs() const

Returns the number of (local) row indices.
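The two accessors above return a raw pointer and a count, which together describe one contiguous index array. The following self-contained sketch shows how such a pair can be copied into an owning container for inspection. The helper copy_send_idxs is hypothetical and not part of Ginkgo. It assumes LocalIndexType is a 32-bit integer and that the pointer refers to host-accessible memory, which may not hold if the gatherer was created on a device executor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies a (pointer, count) pair of indices into an owning vector.
// Assumption: idxs points to host-accessible memory.
std::vector<std::int32_t> copy_send_idxs(const std::int32_t* idxs,
                                         std::size_t num_idxs)
{
    assert(idxs != nullptr || num_idxs == 0);
    return std::vector<std::int32_t>(idxs, idxs + num_idxs);
}
```

With a gatherer rg on a host executor, the call would look like copy_send_idxs(rg->get_const_send_idxs(), rg->get_num_send_idxs()).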

Public Static Functions

template<typename GlobalIndexType = int64>
static inline std::unique_ptr<RowGatherer> create(
std::shared_ptr<const Executor> exec,
std::shared_ptr<const mpi::CollectiveCommunicator> coll_comm,
const index_map<LocalIndexType, GlobalIndexType> &imap
)

Creates a distributed::RowGatherer from a given collective communicator and index map.

Todo: using a segmented array instead of the imap would probably be more general.

Note

The coll_comm and imap have to be compatible. The coll_comm must send and receive exactly as many rows as the imap defines.

Note

This is a collective operation: all participating processes have to execute it.

Template Parameters:

GlobalIndexType – the global index type of the index map

Parameters:
  • exec – the executor

  • coll_comm – the collective communicator

  • imap – the index map defining which rows to gather

Returns:

a unique_ptr to the created distributed::RowGatherer