gko::OmpExecutor
Multi-threaded host executor backed by OpenMP. Kernels fan out across
the OpenMP thread pool; thread count, scheduling, and affinity are
controlled by the usual OpenMP environment variables (OMP_NUM_THREADS,
OMP_PROC_BIND, OMP_PLACES). Memory comes from the system allocator. On
NUMA hosts, first-touch placement determines locality, so initialize
data on the threads that will later work on it.
This is the natural choice for production runs on CPUs. Pair it with locality-aware matrix orderings (e.g. RCM) and a block-Jacobi or IC/ILU preconditioner for common iterative-solver workloads.
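The first-touch advice above can be sketched in plain C++ with OpenMP pragmas. This is a minimal illustration, not Ginkgo code: `make_first_touch_array` and `sum_array` are hypothetical helpers, and it assumes the operating system uses a first-touch page placement policy and that both loops run with the same thread count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: allocate n doubles and write them for the first time
// inside a statically scheduled parallel loop. Under a first-touch policy,
// each page is then placed near the thread that wrote it.
std::unique_ptr<double[]> make_first_touch_array(std::size_t n, double value)
{
    // Default-initialization: no element is written by the allocation itself.
    std::unique_ptr<double[]> data(new double[n]);
    const auto count = static_cast<std::ptrdiff_t>(n);
#pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < count; ++i) {
        data[i] = value;
    }
    return data;
}

// Hypothetical consumer: the same static schedule (and the same thread
// count) means each thread reads the index range it initialized.
double sum_array(const double* data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    double total = 0.0;
    const auto count = static_cast<std::ptrdiff_t>(n);
#pragma omp parallel for schedule(static) reduction(+ : total)
    for (std::ptrdiff_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}
```

Compiled without OpenMP support, the pragmas are ignored and both loops run serially, so the result is unchanged and only the placement benefit is lost.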
-
class OmpExecutor
Inherits from
public gko::detail::ExecutorBase<OmpExecutor>
public std::enable_shared_from_this<OmpExecutor>
This is the Executor subclass which represents the OpenMP device (typically CPU).
Subclassed by
Public Functions
-
virtual std::shared_ptr<Executor> get_master() noexcept override
Returns the master OmpExecutor of this Executor.
- Returns:
the master OmpExecutor of this Executor.
-
virtual std::shared_ptr<const Executor> get_master() const noexcept override
Returns the master OmpExecutor of this Executor.
- Returns:
the master OmpExecutor of this Executor.
-
virtual void synchronize() const override
Synchronize the operations launched on the executor with its master.
-
virtual std::string get_description() const override
- Returns:
a textual representation of the executor and its device.
-
virtual void run(const Operation &op) const = 0
Runs the specified Operation using this Executor.
- Parameters:
op – the operation to run
-
template<typename ClosureOmp, typename ClosureCuda, typename ClosureHip, typename ClosureDpcpp>
inline void run(const ClosureOmp &op_omp, const ClosureCuda &op_cuda, const ClosureHip &op_hip, const ClosureDpcpp &op_dpcpp) const
Runs one of the passed in functors, depending on the Executor type.
- Template Parameters:
ClosureOmp – type of op_omp
ClosureCuda – type of op_cuda
ClosureHip – type of op_hip
ClosureDpcpp – type of op_dpcpp
- Parameters:
op_omp – functor to run in case of an OmpExecutor or ReferenceExecutor
op_cuda – functor to run in case of a CudaExecutor
op_hip – functor to run in case of a HipExecutor
op_dpcpp – functor to run in case of a DpcppExecutor
-
template<typename ClosureReference, typename ClosureOmp, typename ClosureCuda, typename ClosureHip, typename ClosureDpcpp>
inline void run(std::string name, const ClosureReference &op_ref, const ClosureOmp &op_omp, const ClosureCuda &op_cuda, const ClosureHip &op_hip, const ClosureDpcpp &op_dpcpp) const
Runs one of the passed in functors, depending on the Executor type.
- Template Parameters:
ClosureReference – type of op_ref
ClosureOmp – type of op_omp
ClosureCuda – type of op_cuda
ClosureHip – type of op_hip
ClosureDpcpp – type of op_dpcpp
- Parameters:
name – the name of the operation
op_ref – functor to run in case of a ReferenceExecutor
op_omp – functor to run in case of an OmpExecutor
op_cuda – functor to run in case of a CudaExecutor
op_hip – functor to run in case of a HipExecutor
op_dpcpp – functor to run in case of a DpcppExecutor
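The dispatch behaviour of the two closure-based run overloads can be illustrated with a small stand-alone mock. This is only a sketch of the documented semantics, not Ginkgo's implementation: `Backend`, `MockExecutor`, `pick_named`, and `pick_unnamed` are hypothetical names, and the enum-based switch is a simplification chosen for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the executor type; not a Ginkgo class.
enum class Backend { reference, omp, cuda, hip, dpcpp };

struct MockExecutor {
    Backend backend;

    // Named five-closure form: every backend has its own functor.
    template <typename ClosureReference, typename ClosureOmp,
              typename ClosureCuda, typename ClosureHip, typename ClosureDpcpp>
    void run(std::string name, const ClosureReference& op_ref,
             const ClosureOmp& op_omp, const ClosureCuda& op_cuda,
             const ClosureHip& op_hip, const ClosureDpcpp& op_dpcpp) const
    {
        assert(!name.empty());  // in this mock the name only labels the call
        switch (backend) {
        case Backend::reference: op_ref(); break;
        case Backend::omp: op_omp(); break;
        case Backend::cuda: op_cuda(); break;
        case Backend::hip: op_hip(); break;
        case Backend::dpcpp: op_dpcpp(); break;
        }
    }

    // Four-closure form: op_omp also serves the reference backend.
    template <typename ClosureOmp, typename ClosureCuda, typename ClosureHip,
              typename ClosureDpcpp>
    void run(const ClosureOmp& op_omp, const ClosureCuda& op_cuda,
             const ClosureHip& op_hip, const ClosureDpcpp& op_dpcpp) const
    {
        run("unnamed", op_omp, op_omp, op_cuda, op_hip, op_dpcpp);
    }
};

// Returns the label of the functor chosen by the named overload.
std::string pick_named(Backend backend)
{
    std::string chosen;
    MockExecutor{backend}.run(
        "demo_op", [&] { chosen = "reference"; }, [&] { chosen = "omp"; },
        [&] { chosen = "cuda"; }, [&] { chosen = "hip"; },
        [&] { chosen = "dpcpp"; });
    return chosen;
}

// Returns the label of the functor chosen by the four-closure overload.
std::string pick_unnamed(Backend backend)
{
    std::string chosen;
    MockExecutor{backend}.run(
        [&] { chosen = "omp"; }, [&] { chosen = "cuda"; },
        [&] { chosen = "hip"; }, [&] { chosen = "dpcpp"; });
    return chosen;
}
```

With this mock, `pick_unnamed(Backend::reference)` selects the OpenMP functor, mirroring the parameter description of the four-closure overload, while `pick_named(Backend::reference)` selects the dedicated reference functor.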
Public Static Functions
-
static inline std::shared_ptr<OmpExecutor> create(std::shared_ptr<CpuAllocatorBase> alloc = std::make_shared<CpuAllocator>())
Creates a new OmpExecutor.