tf::cudaRoundRobinCapturing class

class to capture the described graph into a native cudaGraph using a greedy round-robin algorithm on a fixed number of streams

A round-robin capturing algorithm levelizes the user-described graph and assigns streams to nodes in round-robin order, level by level.
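
The levelize-then-assign idea described above can be illustrated with a minimal, CUDA-free sketch. It is not Taskflow's implementation: the helper name `round_robin_assign`, the adjacency-list input, and the choice to restart the stream counter at every level are all assumptions made for illustration. A node's level is taken here to be the length of the longest path from any source node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper (not part of Taskflow): given a DAG where adj[u] lists
// the successors of node u, return the stream index assigned to each node.
std::vector<std::size_t> round_robin_assign(
    const std::vector<std::vector<std::size_t>>& adj,
    std::size_t num_streams) {
  assert(num_streams > 0);
  const std::size_t n = adj.size();

  // Count incoming edges so that source nodes can be found.
  std::vector<std::size_t> indegree(n, 0);
  for (const auto& succs : adj) {
    for (std::size_t v : succs) ++indegree[v];
  }

  // Levelize with a Kahn-style traversal: a node's level is one more than
  // the deepest of its predecessors (sources are at level 0).
  std::vector<std::size_t> level(n, 0);
  std::queue<std::size_t> ready;
  for (std::size_t u = 0; u < n; ++u) {
    if (indegree[u] == 0) ready.push(u);
  }
  while (!ready.empty()) {
    const std::size_t u = ready.front();
    ready.pop();
    for (std::size_t v : adj[u]) {
      level[v] = std::max(level[v], level[u] + 1);
      if (--indegree[v] == 0) ready.push(v);
    }
  }

  // Group nodes by level, in increasing node-index order within a level.
  std::vector<std::vector<std::size_t>> levels;
  for (std::size_t u = 0; u < n; ++u) {
    if (level[u] >= levels.size()) levels.resize(level[u] + 1);
    levels[level[u]].push_back(u);
  }

  // Assign streams in round-robin order, level by level.
  // Assumption: the counter restarts at 0 for each level.
  std::vector<std::size_t> stream(n, 0);
  for (const auto& nodes : levels) {
    for (std::size_t i = 0; i < nodes.size(); ++i) {
      stream[nodes[i]] = i % num_streams;
    }
  }
  return stream;
}
```

For a graph with one source fanning out to three nodes that join into one sink, and two streams, the three middle nodes form one level and receive streams 0, 1, 0 in this sketch, while the source and sink each land on stream 0.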

Constructors, destructors, conversion operators

cudaRoundRobinCapturing()
constructs a round-robin optimizer with 4 streams by default
cudaRoundRobinCapturing(size_t num_streams)
constructs a round-robin optimizer with the given number of streams

Public functions

auto num_streams() const -> size_t
queries the number of streams used by the optimizer
void num_streams(size_t n)
sets the number of streams used by the optimizer