Is there a better/cleaner/more elegant way to malloc and free in cuda?

Submitted by 大兔子大兔子 on 2020-01-02 09:59:59

Question


I am trying to cudaMalloc a bunch of device pointers, and gracefully exit if any of the mallocs fails. I have functioning code, but it is bloated because I have to cudaFree everything I had previously malloc'd whenever a later malloc fails. So now I am wondering whether there is a more succinct way to accomplish this. Obviously I can't free something that hasn't been malloc'd - that would definitely cause problems.

Below is the snippet of code I am trying to make more elegant.

    //define device pointers
    float *d_norm, *d_dut, *d_stdt, *d_gamma, *d_zeta;

    //allocate space on the device for the vectors and answer
    if (cudaMalloc(&d_norm, sizeof(float)*vSize) != cudaSuccess) {
            std::cout << "failed malloc";
            return;
    }

    if (cudaMalloc(&d_dut, sizeof(float)*vSize) != cudaSuccess) {
            std::cout << "failed malloc";
            cudaFree(d_norm);
            return;
    }

    if (cudaMalloc(&d_stdt, sizeof(float)*wSize) != cudaSuccess) {
            std::cout << "failed malloc";
            cudaFree(d_norm);
            cudaFree(d_dut);
            return;
    }

    if (cudaMalloc(&d_gamma, sizeof(float)*vSize) != cudaSuccess) {
            std::cout << "failed malloc";
            cudaFree(d_norm);
            cudaFree(d_dut);
            cudaFree(d_stdt);
            return;
    }

    if (cudaMalloc(&d_zeta, sizeof(float)*wSize) != cudaSuccess) {
            std::cout << "failed malloc";
            cudaFree(d_norm);
            cudaFree(d_dut);
            cudaFree(d_stdt);
            cudaFree(d_gamma);
            return;
    }

This is a shortened version, but you can see how it just keeps building. In reality I am trying to malloc about 15 arrays. It starts getting ugly - but it works correctly.

Thoughts?


Answer 1:


Some possibilities:

  1. cudaDeviceReset() will free all device allocations, without you having to run through a list of pointers (see the sketch after this list).

  2. If you intend to exit the application, all device allocations are freed automatically upon application termination anyway. The CUDA runtime detects the termination of the process that owns the application's device context and wipes that context at that point. So if you're just going to exit, it is safe to skip the cudaFree() calls entirely.
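
A minimal sketch of option 1, reusing the question's vSize/wSize and a few of its pointers inside a hypothetical compute() function. Note that cudaDeviceReset() tears down the device context, so it is best suited to the "bail out and exit" case:

#include <cuda_runtime.h>
#include <iostream>

void compute(size_t vSize, size_t wSize) {
    float *d_norm = nullptr, *d_dut = nullptr, *d_stdt = nullptr;

    if (cudaMalloc(&d_norm, sizeof(float) * vSize) != cudaSuccess ||
        cudaMalloc(&d_dut,  sizeof(float) * vSize) != cudaSuccess ||
        cudaMalloc(&d_stdt, sizeof(float) * wSize) != cudaSuccess) {
        std::cout << "failed malloc";
        cudaDeviceReset();   // wipes the context, releasing every allocation made so far
        return;
    }

    // ... launch kernels ...

    cudaFree(d_norm);
    cudaFree(d_dut);
    cudaFree(d_stdt);
}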




Answer 2:


  • You can wrap them in a unique_ptr with a custom deleter (C++11).

  • Or push each pointer into a vector as soon as it is successfully allocated, and free every pointer in that vector on the way out.

Example using unique_ptr:

#include <iostream>
#include <memory>
using namespace std;

void nativeFree(float* p);
float* nativeAlloc(float value);

class NativePointerDeleter{
public:
   void operator()(float* p)const{nativeFree(p);}
};


int main(){
   using pointer_type = unique_ptr<float,decltype(&nativeFree)>;
   using pointer_type_2 = unique_ptr<float,NativePointerDeleter>;

   pointer_type ptr(nativeAlloc(1),nativeFree);
   if(!ptr)return 0;

   pointer_type_2 ptr2(nativeAlloc(2));//no need to provide deleter
   if(!ptr2)return 0;

   pointer_type ptr3(nullptr,nativeFree);//simulate a fail alloc
   if(!ptr3)return 0;

   /*Do Some Work*/

   //now one can return without care about all the pointers
   return 0;
}

void nativeFree(float* p){
   cout << "release " << *p << '\n';
   delete p;
}
float* nativeAlloc(float value){
   return new float(value);
}
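
To adapt this idea to the question, the deleter simply calls cudaFree. Below is a sketch along those lines; the device_ptr alias and the device_alloc() helper are made-up names for illustration, not part of any library:

#include <cuda_runtime.h>
#include <memory>

// Deleter that hands device memory back to the CUDA runtime.
struct CudaFreeDeleter {
    void operator()(float* p) const { cudaFree(p); }
};

using device_ptr = std::unique_ptr<float, CudaFreeDeleter>;

// Hypothetical helper: returns an owning pointer, or an empty one on failure.
device_ptr device_alloc(size_t n) {
    float* raw = nullptr;
    if (cudaMalloc(&raw, n * sizeof(float)) != cudaSuccess)
        return device_ptr{};
    return device_ptr{raw};
}

// Usage: each successfully allocated array frees itself on any early return.
//     auto d_norm = device_alloc(vSize);
//     if (!d_norm) { std::cout << "failed malloc"; return; }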



Answer 3:


Initialize all the pointers to nullptr to begin with; calling free on a null pointer has no effect.

int* p1 = nullptr;
int* p2 = nullptr;
int* p3 = nullptr;

if (!(p1 = allocate()))
  goto EXIT_BLOCK;
if (!(p2 = allocate()))
  goto EXIT_BLOCK;
if (!(p3 = allocate()))
  goto EXIT_BLOCK;

EXIT_BLOCK:
free(p3); free(p2); free(p1);
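
Applied to the question's allocations, the same pattern looks like the sketch below (assuming the vSize/wSize variables from the question). cudaFree() is documented to be a no-op when passed a null pointer, which is what makes the single exit block safe:

#include <cuda_runtime.h>

void compute(size_t vSize, size_t wSize) {
    float *d_norm = nullptr, *d_dut = nullptr, *d_stdt = nullptr;

    if (cudaMalloc(&d_norm, sizeof(float) * vSize) != cudaSuccess) goto cleanup;
    if (cudaMalloc(&d_dut,  sizeof(float) * vSize) != cudaSuccess) goto cleanup;
    if (cudaMalloc(&d_stdt, sizeof(float) * wSize) != cudaSuccess) goto cleanup;

    /* ... do some work ... */

cleanup:
    // Frees whatever was allocated; null pointers are ignored.
    cudaFree(d_stdt);
    cudaFree(d_dut);
    cudaFree(d_norm);
}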



Answer 4:


The question is tagged C++, so here is a C++ solution.

The general practice is to acquire a resource in the constructor and release it in the destructor. The idea is that, under any circumstances, the resource is guaranteed to be released by a call to the destructor. A neat side effect is that the destructor is called automatically at the end of the scope, so you don't need to do anything at all for the resource to be released once it is no longer used. See RAII.

In the role of the resource you might have various kinds of memory, file handles, sockets, etc. CUDA device memory is no exception to this general rule.

I would also discourage you from writing your own resource-owning classes and would advise using a library instead. thrust::device_vector is probably the most widely used device memory container, and the Thrust library ships as part of the CUDA toolkit.
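
A minimal sketch of that approach, assuming the vSize/wSize variables from the question (allocation failure surfaces as an exception rather than an error code):

#include <thrust/device_vector.h>
#include <iostream>

void compute(size_t vSize, size_t wSize) {
    try {
        // Each vector allocates device memory in its constructor and frees it
        // in its destructor, even if a later allocation throws.
        thrust::device_vector<float> d_norm(vSize);
        thrust::device_vector<float> d_dut(vSize);
        thrust::device_vector<float> d_stdt(wSize);

        // Raw pointer for a kernel launch, if you need one:
        float* d_norm_raw = thrust::raw_pointer_cast(d_norm.data());
        (void)d_norm_raw;

        /* ... launch kernels ... */
    } catch (const std::exception& e) {
        std::cout << "failed malloc: " << e.what() << '\n';
    }
}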




Answer 5:


Yes. If you use (my) CUDA Modern-C++ API wrapper library, you can just use unique pointers, which release their memory when their lifetime ends. Your code becomes merely the following:

auto current_device = cuda::device::current::get();
auto d_dut   = cuda::memory::device::make_unique<float[]>(current_device, vSize);
auto d_stdt  = cuda::memory::device::make_unique<float[]>(current_device, vSize);
auto d_gamma = cuda::memory::device::make_unique<float[]>(current_device, vSize);
auto d_zeta  = cuda::memory::device::make_unique<float[]>(current_device, vSize);

Note, though, that you could also allocate just once and place the other pointers at appropriate offsets into the single block.
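
A minimal sketch of that single-allocation idea, assuming all five arrays hold float and reusing the question's vSize/wSize:

#include <cuda_runtime.h>
#include <iostream>

void compute(size_t vSize, size_t wSize) {
    // One cudaMalloc, then carve the block up; one cudaFree releases everything.
    // For mixed element types you would also need to respect alignment.
    float* d_block = nullptr;
    size_t total = 3 * vSize + 2 * wSize;   // d_norm, d_dut, d_gamma + d_stdt, d_zeta
    if (cudaMalloc(&d_block, total * sizeof(float)) != cudaSuccess) {
        std::cout << "failed malloc";
        return;
    }

    float* d_norm  = d_block;
    float* d_dut   = d_norm  + vSize;
    float* d_gamma = d_dut   + vSize;
    float* d_stdt  = d_gamma + vSize;
    float* d_zeta  = d_stdt  + wSize;

    /* ... use the pointers ... */

    cudaFree(d_block);   // frees all five arrays at once
}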



Source: https://stackoverflow.com/questions/39395508/is-there-a-better-cleaner-more-elegant-way-to-malloc-and-free-in-cuda
