`gko::CudaExecutor`

NVIDIA GPU executor. Each instance binds to a specific device id, a host master executor used for staged transfers, and a CUDA allocator. The default allocator uses synchronous cudaMalloc; async, unified, and pinned-host variants are available (see gko::CudaAllocator and the related allocator classes). An optional cudaStream_t can be passed so that multiple CudaExecutors can share a device while running on independent streams.

class CudaExecutor

Inherits from

public gko::detail::ExecutorBase<CudaExecutor>
public std::enable_shared_from_this<CudaExecutor>
public gko::detail::EnableDeviceReset

This is the Executor subclass which represents the CUDA device.

Public Functions

virtual std::shared_ptr<Executor> get_master() noexcept override

Returns the master OmpExecutor of this Executor.

Returns: the master OmpExecutor of this Executor.

virtual std::shared_ptr<const Executor> get_master() const noexcept override

Returns the master OmpExecutor of this Executor.

Returns: the master OmpExecutor of this Executor.

virtual void synchronize() const override: Synchronize the operations launched on the executor with its master.

virtual std::string get_description() const override

Returns: a textual representation of the executor and its device.

inline int get_device_id() const noexcept: Get the CUDA device id of the device associated with this executor.

inline int get_num_warps_per_sm() const noexcept: Get the number of warps per streaming multiprocessor (SM) of this executor.

inline int get_num_multiprocessor() const noexcept: Get the number of multiprocessors of this executor.

inline int get_num_warps() const noexcept: Get the number of warps of this executor.

inline int get_warp_size() const noexcept: Get the warp size of this executor.
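
Queries such as get_num_warps() are typically used to size kernel launches. The following is a minimal sketch of that idea, not Ginkgo code: `capped_num_blocks` and its `oversubscription` parameter are hypothetical names, and `num_warps` is assumed to be the value returned by get_num_warps().

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of Ginkgo): number of thread blocks to launch
// for `work_items` elements when each block has `block_size` threads.
// `num_warps` is assumed to come from CudaExecutor::get_num_warps().
inline std::int64_t capped_num_blocks(std::int64_t work_items, int block_size,
                                      int num_warps, int oversubscription)
{
    // Blocks needed to give every element its own thread (ceiling division).
    const std::int64_t needed = (work_items + block_size - 1) / block_size;
    // Upper bound: `oversubscription` blocks per warp reported by the device.
    const std::int64_t cap =
        static_cast<std::int64_t>(num_warps) * oversubscription;
    return std::min(needed, cap);
}
```

A kernel launched with a capped grid has to iterate over its input (for example with a grid-stride loop), since there may be fewer threads than elements.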

inline int get_major_version() const noexcept: Get the major version of compute capability.

inline int get_minor_version() const noexcept: Get the minor version of compute capability.

inline int get_compute_capability() const noexcept: Get the compute capability.
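
Code that relies on a hardware feature usually compares the major and minor version against a required minimum. A minimal sketch, assuming the two values were obtained from get_major_version() and get_minor_version(); `meets_compute_capability` is a hypothetical helper, not part of Ginkgo:

```cpp
// Hypothetical helper (not part of Ginkgo): true if a device with compute
// capability major.minor meets the required minimum req_major.req_minor.
// The first two arguments are assumed to come from get_major_version() and
// get_minor_version().
constexpr bool meets_compute_capability(int major, int minor, int req_major,
                                        int req_minor)
{
    // Compare lexicographically: the major version decides first, the minor
    // version only matters when the major versions are equal.
    return major > req_major || (major == req_major && minor >= req_minor);
}
```

Comparing the pair lexicographically avoids depending on how a combined capability value is encoded.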

inline cublasContext *get_cublas_handle() const

Get the cuBLAS handle for this executor.

Returns: the cuBLAS handle (cublasContext*) for this executor

inline cublasContext *get_blas_handle() const

Get the cuBLAS handle for this executor.

Returns: the cuBLAS handle (cublasContext*) for this executor

inline cusparseContext *get_cusparse_handle() const

Get the cuSPARSE handle for this executor.

Returns: the cuSPARSE handle (cusparseContext*) for this executor

inline cusparseContext *get_sparselib_handle() const

Get the cuSPARSE handle for this executor.

Returns: the cuSPARSE handle (cusparseContext*) for this executor

inline std::vector<int> get_closest_pus() const

Get the processing units (PUs) closest to this device.

Returns: the array of PUs closest to this device

inline int get_closest_numa() const

Get the closest NUMA node.

Returns: the NUMA node closest to this device

inline CUstream_st *get_stream() const

Returns the CUDA stream used by this executor. Can be nullptr for the default stream.

Returns: the stream used to execute kernels and memory operations.

virtual void run(const Operation &op) const = 0

Runs the specified Operation using this Executor.

Parameters: op – the operation to run

template<typename ClosureOmp, typename ClosureCuda, typename ClosureHip, typename ClosureDpcpp> inline void run(const ClosureOmp &op_omp, const ClosureCuda &op_cuda, const ClosureHip &op_hip, const ClosureDpcpp &op_dpcpp) const

Runs one of the passed-in functors, depending on the Executor type.

Template Parameters:

ClosureOmp – type of op_omp
ClosureCuda – type of op_cuda
ClosureHip – type of op_hip
ClosureDpcpp – type of op_dpcpp

Parameters:

op_omp – functor to run in case of an OmpExecutor or ReferenceExecutor
op_cuda – functor to run in case of a CudaExecutor
op_hip – functor to run in case of a HipExecutor
op_dpcpp – functor to run in case of a DpcppExecutor

template<typename ClosureReference, typename ClosureOmp, typename ClosureCuda, typename ClosureHip, typename ClosureDpcpp> inline void run(std::string name, const ClosureReference &op_ref, const ClosureOmp &op_omp, const ClosureCuda &op_cuda, const ClosureHip &op_hip, const ClosureDpcpp &op_dpcpp) const

Runs one of the passed-in functors, depending on the Executor type.

Template Parameters:

ClosureReference – type of op_ref
ClosureOmp – type of op_omp
ClosureCuda – type of op_cuda
ClosureHip – type of op_hip
ClosureDpcpp – type of op_dpcpp

Parameters:

name – the name of the operation
op_ref – functor to run in case of a ReferenceExecutor
op_omp – functor to run in case of an OmpExecutor
op_cuda – functor to run in case of a CudaExecutor
op_hip – functor to run in case of a HipExecutor
op_dpcpp – functor to run in case of a DpcppExecutor
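
Selecting a functor by executor type can be modelled as double dispatch: the closures are wrapped in an operation object, and each executor calls the overload that matches its own type. The sketch below is a toy model reduced to two backends. It is not Ginkgo's implementation, and every `Toy*` name as well as `which_backend` is hypothetical.

```cpp
#include <string>

// Toy stand-ins for illustration only; these are NOT Ginkgo's classes.
struct ToyOmp;
struct ToyCuda;

// Counterpart of an Operation: one run() overload per executor type.
struct ToyOperation {
    virtual ~ToyOperation() = default;
    virtual void run(const ToyOmp&) const = 0;
    virtual void run(const ToyCuda&) const = 0;
};

// Adapter that stores one closure per executor type.
template <typename ClosureOmp, typename ClosureCuda>
struct ToyLambdaOperation : ToyOperation {
    ToyLambdaOperation(const ClosureOmp& omp, const ClosureCuda& cuda)
        : omp_(omp), cuda_(cuda)
    {}
    void run(const ToyOmp&) const override { omp_(); }
    void run(const ToyCuda&) const override { cuda_(); }
    ClosureOmp omp_;
    ClosureCuda cuda_;
};

struct ToyExecutor {
    virtual ~ToyExecutor() = default;
    // First dispatch: virtual call on the executor.
    virtual void run(const ToyOperation& op) const = 0;
    // Closure overload: wrap the functors, then reuse the virtual run().
    template <typename ClosureOmp, typename ClosureCuda>
    void run(const ClosureOmp& op_omp, const ClosureCuda& op_cuda) const
    {
        this->run(ToyLambdaOperation<ClosureOmp, ClosureCuda>(op_omp, op_cuda));
    }
};

struct ToyOmp : ToyExecutor {
    using ToyExecutor::run;  // keep the closure overload visible
    void run(const ToyOperation& op) const override;
};

struct ToyCuda : ToyExecutor {
    using ToyExecutor::run;  // keep the closure overload visible
    void run(const ToyOperation& op) const override;
};

// Second dispatch: overload resolution on the concrete executor type.
inline void ToyOmp::run(const ToyOperation& op) const { op.run(*this); }
inline void ToyCuda::run(const ToyOperation& op) const { op.run(*this); }

// Returns the name of the closure that `exec` selected.
inline std::string which_backend(const ToyExecutor& exec)
{
    std::string chosen;
    exec.run([&] { chosen = "omp"; }, [&] { chosen = "cuda"; });
    return chosen;
}
```

In this model only the closure matching the executor is invoked; the others are stored but never called.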

Public Static Functions

static std::shared_ptr<CudaExecutor> create(int device_id, std::shared_ptr<Executor> master, bool device_reset, allocation_mode alloc_mode = default_cuda_alloc_mode, CUstream_st *stream = nullptr)

Creates a new CudaExecutor.

Parameters:

device_id – the CUDA device id of this device
master – an executor on the host that is used to invoke the device kernels
device_reset – this option no longer has any effect.
alloc_mode – the allocation mode that the executor should operate in. See allocation_mode for more details.
stream – the stream to execute operations on.

static std::shared_ptr<CudaExecutor> create(int device_id, std::shared_ptr<Executor> master, std::shared_ptr<CudaAllocatorBase> alloc = std::make_shared<CudaAllocator>(), CUstream_st *stream = nullptr)

Creates a new CudaExecutor with a custom allocator and device stream.

Parameters:

device_id – the CUDA device id of this device
master – an executor on the host that is used to invoke the device kernels.
alloc – the allocator to use for device memory allocations.
stream – the stream to execute operations on.

static int get_num_devices(): Get the number of devices present on the system.
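
When several processes share a node, a common way to choose the device_id passed to create() is to distribute the processes over the available devices in round-robin order. A minimal sketch, assuming `num_devices` was obtained from get_num_devices(); `pick_device_id` and `local_rank` are hypothetical names, not part of Ginkgo:

```cpp
#include <stdexcept>

// Hypothetical helper (not part of Ginkgo): maps a node-local process rank to
// a device id in round-robin order. `num_devices` is assumed to come from
// CudaExecutor::get_num_devices().
inline int pick_device_id(int local_rank, int num_devices)
{
    if (num_devices <= 0) {
        throw std::runtime_error("no CUDA device available");
    }
    if (local_rank < 0) {
        throw std::invalid_argument("local_rank must be non-negative");
    }
    // Ranks beyond the device count wrap around and share a device.
    return local_rank % num_devices;
}
```

The returned value would then be used as the device_id argument of create().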
