template<typename H, typename C, typename D>
tf::cudaPerThreadDeviceObjectPool class

per-thread object pool to manage CUDA device objects

Template parameters
H object type
C function object to create a library object
D function object to delete a library object
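
As a rough illustration of the roles of H, C, and D, the sketch below uses a mock integer handle in place of a real CUDA type. All names (MockHandle, MockCreator, MockDeleter, ScopedObject, live_handles) are hypothetical. The sketch also assumes that C is callable with no arguments and that D is callable with the handle, which this page does not spell out.

```cpp
#include <cassert>

// Hypothetical stand-in for a device handle such as cudaStream_t.
using MockHandle = int;

// Counts handles that are currently alive so the effect is observable.
inline int& live_handles() {
  static int n = 0;
  return n;
}

// Plays the role of C: a default-constructible callable returning a new handle.
// Assumption: the creator takes no arguments.
struct MockCreator {
  MockHandle operator()() const {
    ++live_handles();
    return 42;
  }
};

// Plays the role of D: a default-constructible callable destroying a handle.
struct MockDeleter {
  void operator()(MockHandle) const {
    --live_handles();
  }
};

// RAII wrapper: creates the handle with C on construction, destroys it with D.
template <typename H, typename C, typename D>
struct ScopedObject {
  H value;
  ScopedObject() : value(C{}()) {}
  ~ScopedObject() { D{}(value); }
  ScopedObject(const ScopedObject&) = delete;
  ScopedObject& operator=(const ScopedObject&) = delete;
};
```

For a real stream, C would presumably wrap cudaStreamCreate and D would wrap cudaStreamDestroy. The mock keeps the example independent of the CUDA toolkit.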

A CUDA device object, such as cudaStream_t or cublasHandle_t, has a lifetime tied to a device. Creating a device object is typically expensive (e.g., 10-200 ms), and destroying it may trigger implicit device synchronization. For applications that make intensive use of device objects, it is desirable to reuse them as much as possible.

Within a process, there is a one-to-one relationship between CUDA devices in the CUDA Runtime API and CUcontexts in the CUDA Driver API. The specific context that the CUDA Runtime API uses for a device is called the device's primary context. From the perspective of the CUDA Runtime API, a device and its primary context are synonymous.

We design the device object pool in a decentralized fashion with two parts: (1) a global pool that keeps track of potentially reusable objects and (2) a per-thread pool that records (footprints) objects with shared ownership. The global pool does not own the objects and therefore never destroys any of them. The per-thread pool holds shared ownership of the objects in its footprint and destroys an object once the thread has finished (joined), provided that thread holds the last reference to it. The motivation for this decentralized control is to prevent device objects from being destroyed after their context has already been destroyed by driver shutdown.
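
The ownership split described above can be sketched without CUDA. This is a minimal illustration under stated assumptions, not the library's implementation. The names GlobalPool and PerThreadPool, the mock Object with its alive() counter, and the use of std::weak_ptr in the global pool are all hypothetical choices made here to model "does not own" versus "shared ownership".

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device object; a real one would wrap a handle
// such as cudaStream_t and destroy it in the destructor.
struct Object {
  int device;
  explicit Object(int d) : device(d) { ++alive(); }
  ~Object() { --alive(); }
  static std::atomic<int>& alive() {
    static std::atomic<int> n{0};
    return n;
  }
};

// Global pool: remembers released objects through weak_ptr, so it never owns
// them and never runs their destructors.
class GlobalPool {
 public:
  // Returns a still-alive released object for `device`, or nullptr.
  std::shared_ptr<Object> acquire(int device) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto& candidates = pool_[device];
    while (!candidates.empty()) {
      auto ptr = candidates.back().lock();
      candidates.pop_back();
      if (ptr) return ptr;  // expired entries are skipped
    }
    return nullptr;
  }

  void release(int device, std::weak_ptr<Object> ptr) {
    std::lock_guard<std::mutex> lock(mutex_);
    pool_[device].push_back(std::move(ptr));
  }

 private:
  std::mutex mutex_;
  std::unordered_map<int, std::vector<std::weak_ptr<Object>>> pool_;
};

// Per-thread pool: its footprint holds shared_ptr copies, so an object lives
// until the last footprint (or user) referring to it goes away.
class PerThreadPool {
 public:
  explicit PerThreadPool(GlobalPool& global) : global_(global) {}

  std::shared_ptr<Object> acquire(int device) {
    auto ptr = global_.acquire(device);
    if (!ptr) ptr = std::make_shared<Object>(device);
    footprint_.insert(ptr);
    return ptr;
  }

  void release(std::shared_ptr<Object>&& ptr) {
    global_.release(ptr->device, ptr);  // global pool gets a weak reference only
    ptr.reset();
  }

  std::size_t footprint_size() const { return footprint_.size(); }

 private:
  GlobalPool& global_;
  std::unordered_set<std::shared_ptr<Object>> footprint_;
};
```

In this sketch, two PerThreadPool instances sharing one GlobalPool stand in for two threads. An object released by one can be acquired by the other, and it is destroyed only when the last footprint holding it is destroyed, never by the global pool.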

Public types

struct Object
structure to store a context object

Constructors, destructors, conversion operators

cudaPerThreadDeviceObjectPool() defaulted
default constructor

Public functions

auto acquire(int) -> std::shared_ptr<Object>
acquires a device object with shared ownership
void release(std::shared_ptr<Object>&&)
releases a device object with moved ownership
auto footprint_size() const -> size_t
queries the number of device objects with shared ownership