HIP executor

gko::HipExecutor runs Ginkgo kernels on a single AMD GPU using ROCm/HIP.

Construction

auto host = gko::OmpExecutor::create();
auto gpu  = gko::HipExecutor::create(0, host);   // device id 0

To supply a custom allocator and stream:

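// Create a non-default stream for the executor to run on.
hipStream_t stream;
hipStreamCreate(&stream);
// Stream-ordered allocator bound to the same stream the executor uses.
auto alloc = std::make_shared<gko::HipAsyncAllocator>(stream);
// host is the OmpExecutor created in the previous snippet.
auto gpu   = gko::HipExecutor::create(0, host, alloc, stream);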

Allocators

The HIP allocator family mirrors the CUDA family:

HipAllocator — hipMalloc (default).
HipAsyncAllocator — hipMallocAsync (stream-ordered).
HipUnifiedAllocator — hipMallocManaged (unified shared memory).
HipHostAllocator — hipHostAlloc (pinned host memory).

See Allocators for guidance on picking one. If you don’t pass an allocator argument, HipExecutor::create(id, host) constructs a HipAllocator (synchronous hipMalloc).

Streams

By default, HipExecutor uses the default HIP stream. The stream parameter has type GKO_HIP_STREAM_STRUCT*, which is the same type HIP defines as hipStream_t. Ginkgo does not wrap streams in its own type, so pass the handle you obtained from hipStreamCreate() directly. Use an explicit stream when you need to overlap work or build multi-stream pipelines.

Memory model

By default, HipExecutor allocates device memory, which is not host-accessible. Memory becomes host-accessible only when a unified or pinned-host allocator is used. In that case hip_exec->memory_accessible(host) returns true and cross-executor copies are elided.
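
The relationship between allocator choice and host accessibility can be sketched as follows. This is an illustration, not Ginkgo API: HipMemoryKind, host_accessible and make_hip_allocator are hypothetical names, and the constructor arguments (a stream for HipAsyncAllocator, a device id for HipUnifiedAllocator and HipHostAllocator) are assumptions that should be checked against the memory header of your Ginkgo version. The Ginkgo-dependent part is compiled only when the header is found, which relies on C++17's __has_include.

```cpp
// Hypothetical enum naming the four allocator flavours listed under Allocators.
enum class HipMemoryKind { device, stream_ordered, unified, pinned_host };

// Per the memory-model notes: only unified and pinned-host allocations
// are host-accessible; plain and stream-ordered device memory are not.
constexpr bool host_accessible(HipMemoryKind kind)
{
    return kind == HipMemoryKind::unified ||
           kind == HipMemoryKind::pinned_host;
}

#if __has_include(<ginkgo/ginkgo.hpp>)
#include <ginkgo/ginkgo.hpp>
#include <memory>

// Hypothetical factory mapping a memory kind onto a Ginkgo HIP allocator.
// Assumption: the constructor signatures below match your Ginkgo release.
inline std::shared_ptr<gko::HipAllocatorBase> make_hip_allocator(
    HipMemoryKind kind, int device_id,
    GKO_HIP_STREAM_STRUCT* stream = nullptr)
{
    switch (kind) {
    case HipMemoryKind::stream_ordered:
        return std::make_shared<gko::HipAsyncAllocator>(stream);
    case HipMemoryKind::unified:
        return std::make_shared<gko::HipUnifiedAllocator>(device_id);
    case HipMemoryKind::pinned_host:
        return std::make_shared<gko::HipHostAllocator>(device_id);
    case HipMemoryKind::device:
    default:
        return std::make_shared<gko::HipAllocator>();
    }
}
#endif
```

The returned allocator would be passed as the third argument of gko::HipExecutor::create, as in the Construction section. Keeping the accessibility rule in a separate constexpr function lets calling code decide up front whether a host-side copy is needed.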

Discovering devices

auto count = gko::HipExecutor::get_num_devices();
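
The device count is typically used to choose a device id before creating the executor. A minimal sketch, assuming one worker (for example a node-local MPI rank) per process and a round-robin mapping onto the available GPUs. pick_device_id and make_hip_executor are hypothetical helpers, not part of Ginkgo. The Ginkgo-dependent part is compiled only when the header is found, which relies on C++17's __has_include.

```cpp
#include <stdexcept>

// Hypothetical helper (not Ginkgo API): map a worker index onto one of
// `num_devices` device ids in round-robin fashion.
inline int pick_device_id(int worker_index, int num_devices)
{
    if (num_devices <= 0) {
        throw std::runtime_error("no HIP device available");
    }
    if (worker_index < 0) {
        throw std::invalid_argument("worker_index must be non-negative");
    }
    return worker_index % num_devices;
}

#if __has_include(<ginkgo/ginkgo.hpp>)
#include <ginkgo/ginkgo.hpp>
#include <memory>

// Create a HipExecutor for the given worker, using only the calls shown
// in this section and under Construction.
inline std::shared_ptr<gko::HipExecutor> make_hip_executor(int worker_index)
{
    const int count = gko::HipExecutor::get_num_devices();
    auto host = gko::OmpExecutor::create();
    return gko::HipExecutor::create(pick_device_id(worker_index, count),
                                    host);
}
#endif
```

Checking the count before calling create turns a missing GPU into an explicit error at the call site instead of a failure inside executor construction.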