gko::log::ProfilerHook#

Emits scoped profiler ranges (NVTX on CUDA, ROCTX on HIP, VTune ITT, TAU, or a user-supplied callback) so external profilers can attribute GPU and CPU time to named Ginkgo operations rather than anonymous kernel calls. Also includes a built-in summary writer that prints a flat or nested timing report.

class ProfilerHook #

Inherits from gko::log::Logger

This Logger can be used to annotate the execution of Ginkgo functionality with profiler-specific ranges. It currently supports TAU, VTune, Nsight Systems (NVTX), rocprof (ROCTX), and custom profiler hooks.

For a full, program-wide annotation, attach the Logger to the Executor that runs the application. To highlight only the events caused directly by individual objects, attach it to those objects instead; in that case, operations and memory allocations are not annotated.

Public Functions

void set_object_name(
ptr_param<const PolymorphicObject> obj,
std::string name,
)#

Sets the name for an object to be profiled. Every occurrence of that object in the profile will be labeled with this name instead of its runtime type.

Parameters:
  • obj – the object

  • name – its name

void set_synchronization(bool synchronize)#

Controls whether the events call executor->synchronize on operations and copy/allocation. Synchronizing adds some overhead, but makes the execution timeline of kernels synchronous.

profiling_scope_guard user_range(const char *name) const#

Creates a scope guard for a user-defined range to be included in the profile.

Parameters:

name – the name of the range

Returns:

the scope guard. It will begin a range immediately and end it at the end of its scope.
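The behavior described for the returned scope guard is the usual RAII pattern: the range begins in the constructor and ends in the destructor. The following is a minimal stdlib-only sketch of that pattern, not Ginkgo's profiling_scope_guard; `event_log`, `range_guard` and `run_nested_ranges` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a profiler backend: records begin/end events.
using event_log = std::vector<std::string>;

// Hypothetical RAII guard: begins a range on construction and ends it on
// destruction, mirroring the behavior described for user_range().
class range_guard {
public:
    range_guard(event_log& log, std::string name)
        : log_{&log}, name_{std::move(name)}
    {
        assert(!name_.empty());  // this sketch requires a non-empty name
        log_->push_back("begin " + name_);
    }

    ~range_guard()
    {
        // A moved-from guard no longer owns the range and must not end it.
        if (log_ != nullptr) {
            log_->push_back("end " + name_);
        }
    }

    // Moving transfers the responsibility for ending the range.
    range_guard(range_guard&& other) noexcept
        : log_{std::exchange(other.log_, nullptr)},
          name_{std::move(other.name_)}
    {}

    range_guard(const range_guard&) = delete;
    range_guard& operator=(const range_guard&) = delete;
    range_guard& operator=(range_guard&&) = delete;

private:
    event_log* log_;
    std::string name_;
};

// Opens two nested ranges and returns the recorded event order.
inline event_log run_nested_ranges()
{
    event_log log;
    {
        range_guard outer{log, "setup"};
        {
            range_guard inner{log, "assemble"};
        }  // "assemble" ends here
    }      // "setup" ends here
    return log;
}
```

Because the guard ends its range when it goes out of scope, the returned guard has to be bound to a named variable; a discarded temporary would begin and end the range immediately.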

Public Static Functions

static std::shared_ptr<ProfilerHook> create_tau(
bool initialize = true,
)#

Creates a logger annotating Ginkgo events with TAU ranges via PerfStubs.

Parameters:

initialize – Whether this logger should call TAU’s initialization and finalization functions. Pass false if the application takes care of them itself. The initialization happens immediately, the finalization at program exit.

static std::shared_ptr<ProfilerHook> create_vtune()#

Creates a logger annotating Ginkgo events with VTune ITT ranges.

static std::shared_ptr<ProfilerHook> create_nvtx(
uint32 color_argb = color_yellow_argb,
)#

Creates a logger annotating Ginkgo events with NVTX ranges for CUDA.

Parameters:

color_argb – The color of the NVTX ranges in the Nsight Systems output. It has to be a 32-bit packed ARGB value.
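A packed ARGB value stores the alpha channel in the most significant byte, followed by red, green and blue. As a small sketch, the helper below builds such a value from its four channels; `pack_argb` is a hypothetical function and not part of Ginkgo.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Ginkgo): packs four 8-bit channels into
// a 32-bit ARGB value with alpha in the most significant byte.
constexpr std::uint32_t pack_argb(std::uint8_t alpha, std::uint8_t red,
                                  std::uint8_t green, std::uint8_t blue)
{
    return (static_cast<std::uint32_t>(alpha) << 24) |
           (static_cast<std::uint32_t>(red) << 16) |
           (static_cast<std::uint32_t>(green) << 8) |
           static_cast<std::uint32_t>(blue);
}

// The value listed for color_yellow_argb, 0xFFFFCB05, decomposes into
// alpha = 0xFF, red = 0xFF, green = 0xCB, blue = 0x05.
static_assert(pack_argb(0xFF, 0xFF, 0xCB, 0x05) == 0xFFFFCB05U,
              "matches the documented color_yellow_argb value");
```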

static std::shared_ptr<ProfilerHook> create_roctx()#

Creates a logger annotating Ginkgo events with ROCTX ranges for HIP.

static std::shared_ptr<ProfilerHook> create_for_executor(
std::shared_ptr<const Executor> exec,
)#

Creates a logger annotating Ginkgo events with the most suitable backend for the given executor: NVTX for Nsight Systems on CUDA, ROCTX for rocprof on HIP, and TAU for everything else.

static std::shared_ptr<ProfilerHook> create_summary(
std::shared_ptr<Timer> timer = std::make_shared<CpuTimer>(),
std::unique_ptr<SummaryWriter> writer = std::make_unique<TableSummaryWriter>(),
bool debug_check_nesting = false,
)#

Creates a logger measuring the runtime of Ginkgo events and printing a summary when it is destroyed.

Note

For this logger to provide reliable GPU timings, either use Timer::create_for_executor or enable synchronization via set_synchronization(true).

Parameters:
  • timer – The timer used to record time points.

  • writer – The SummaryWriter to receive the performance results.

  • debug_check_nesting – Enable this flag if the output looks like it might contain incorrect nesting. This increases the overhead slightly, but detects mismatching push/pop pairs on the range stack.
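To illustrate what a mismatching push/pop pair on a range stack means, here is a stdlib-only model. It is only a sketch of the idea; how Ginkgo implements the check internally is not shown in this section, and `range_stack` with its members is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a range stack with an optional nesting check.
class range_stack {
public:
    explicit range_stack(bool check_nesting) : check_nesting_{check_nesting} {}

    // Opens a new innermost range.
    void push(const std::string& name) { stack_.push_back(name); }

    // Closes the innermost range. Returns false and leaves the stack
    // untouched if there is no open range, or if checking is enabled and
    // the name does not match the innermost open range.
    bool pop(const std::string& name)
    {
        if (stack_.empty()) {
            return false;
        }
        if (check_nesting_ && stack_.back() != name) {
            return false;
        }
        stack_.pop_back();
        return true;
    }

    // Number of currently open ranges.
    std::size_t depth() const { return stack_.size(); }

private:
    bool check_nesting_;
    std::vector<std::string> stack_;
};
```

Without the check, closing "a" while "b" is still the innermost range silently closes "b" instead, which is the kind of mistake that produces an incorrectly nested report.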

static std::shared_ptr<ProfilerHook> create_nested_summary(
std::shared_ptr<Timer> timer = std::make_shared<CpuTimer>(),
std::unique_ptr<NestedSummaryWriter> writer = std::make_unique<TableSummaryWriter>(),
bool debug_check_nesting = false,
)#

Creates a logger measuring the runtime of Ginkgo events in a nested fashion and printing a summary when it is destroyed.

Note

For this logger to provide reliable GPU timings, either use Timer::create_for_executor or enable synchronization via set_synchronization(true).

Parameters:
  • timer – The timer used to record time points.

  • writer – The NestedSummaryWriter to receive the performance results.

  • debug_check_nesting – Enable this flag if the output looks like it might contain incorrect nesting. This increases the overhead slightly, but detects mismatching push/pop pairs on the range stack.

static std::shared_ptr<ProfilerHook> create_custom(
hook_function begin,
hook_function end,
)#

Creates a logger annotating Ginkgo events with a custom set of functions for range begin and end.
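A pair of begin and end callbacks is enough to build a simple tracer. The sketch below is stdlib-only and uses a simplified callback type, `simple_hook`, which is an assumption made for illustration: the exact signature of hook_function is not shown in this section and may take additional arguments. `indent_tracer` is likewise a hypothetical name.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Simplified, hypothetical callback type for this sketch only. It is not
// the hook_function type expected by create_custom().
using simple_hook = std::function<void(const char*)>;

// Hypothetical tracer that turns begin/end calls into an indented trace.
// The returned hooks capture `this`, so the tracer must outlive them and
// must not be copied or moved while they are in use.
struct indent_tracer {
    std::vector<std::string> lines;
    std::size_t depth = 0;

    // Records the range name, indented by the current nesting depth.
    simple_hook begin_hook()
    {
        return [this](const char* name) {
            lines.push_back(std::string(2 * depth, ' ') + name);
            ++depth;
        };
    }

    // Leaves the innermost range; unmatched end calls are ignored.
    simple_hook end_hook()
    {
        return [this](const char*) {
            if (depth > 0) {
                --depth;
            }
        };
    }
};
```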

Public Static Attributes

static uint32 color_yellow_argb = 0xFFFFCB05U#

The Ginkgo yellow background color as a packed 32-bit ARGB value.

struct summary_entry#

Public Members

std::string name#

The name of the range.

std::chrono::nanoseconds inclusive = {0}#

The total runtime of all invocations of the range in nanoseconds, including the runtime of all nested ranges.

std::chrono::nanoseconds exclusive = {0}#

The total runtime of all invocations of the range in nanoseconds, excluding the runtime of all nested ranges.

int64 count = {}#

The total number of invocations of the range.

struct nested_summary_entry#

Public Members

std::string name#

The name of the range.

std::chrono::nanoseconds elapsed = {0}#

The total runtime of all invocations of the range in nanoseconds.

int64 count = {}#

The total number of invocations of the range.

std::vector<nested_summary_entry> children = {}#

The nested ranges inside this range.
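The relation between the nested and the flat representation can be sketched as follows: the inclusive time of a range is its elapsed time, and its exclusive time is the elapsed time minus the elapsed time of its direct children. The structs below are local mirrors of the documented members so that the example compiles without Ginkgo, and `flatten` is a hypothetical helper. Unlike a real flat summary, this sketch does not merge ranges that share a name but appear under different parents.

```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <vector>

// Local mirrors of the documented structs, for illustration only.
struct summary_entry {
    std::string name;
    std::chrono::nanoseconds inclusive{0};
    std::chrono::nanoseconds exclusive{0};
    std::int64_t count{};
};

struct nested_summary_entry {
    std::string name;
    std::chrono::nanoseconds elapsed{0};
    std::int64_t count{};
    std::vector<nested_summary_entry> children{};
};

// Hypothetical helper: appends one flat entry per node in depth-first
// order. A child's elapsed time already contains its own children, so
// subtracting only the direct children yields the exclusive time.
inline void flatten(const nested_summary_entry& node,
                    std::vector<summary_entry>& out)
{
    auto exclusive = node.elapsed;
    for (const auto& child : node.children) {
        exclusive -= child.elapsed;
    }
    out.push_back(
        summary_entry{node.name, node.elapsed, exclusive, node.count});
    for (const auto& child : node.children) {
        flatten(child, out);
    }
}
```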

class SummaryWriter#

Receives the results from ProfilerHook::create_summary().

Subclassed by TableSummaryWriter

Public Functions

virtual void write(
const std::vector<summary_entry> &entries,
std::chrono::nanoseconds overhead,
) = 0#

Callback to write out the summary results.

Parameters:
  • entries – the vector of ranges with runtime and count.

  • overhead – an estimate of the profiler overhead

class NestedSummaryWriter#

Receives the results from ProfilerHook::create_nested_summary().

Subclassed by TableSummaryWriter

Public Functions

virtual void write_nested(
const nested_summary_entry &root,
std::chrono::nanoseconds overhead,
) = 0#

Callback to write out the summary results.

Parameters:
  • root – the root range with runtime and count.

  • overhead – an estimate of the profiler overhead

class TableSummaryWriter #

Inherits from SummaryWriter, NestedSummaryWriter

Writes the results from ProfilerHook::create_summary() and ProfilerHook::create_nested_summary() to an ASCII table in Markdown format.

Public Functions

TableSummaryWriter(
std::ostream &output = std::cerr,
std::string header = "Runtime summary",
)#

Constructs a writer on an output stream.

Parameters:
  • output – the output stream to write the table to.

  • header – the header to write above the table.

virtual void write(
const std::vector<summary_entry> &entries,
std::chrono::nanoseconds overhead,
) override#

Callback to write out the summary results.

Parameters:
  • entries – the vector of ranges with runtime and count.

  • overhead – an estimate of the profiler overhead

virtual void write_nested(
const nested_summary_entry &root,
std::chrono::nanoseconds overhead,
) override#

Callback to write out the summary results.

Parameters:
  • root – the root range with runtime and count.

  • overhead – an estimate of the profiler overhead