Is There Any Way To Copy vtable From Host To Device (CUDA & C++)

Submitted by 雨燕双飞 on 2021-02-05 08:41:53

Question


It seems that CUDA does not allow me to "pass an object of a class derived from virtual base classes to a __global__ function", for some reason related to the "virtual table" or "virtual pointer".

I wonder whether there is some way for me to set up the "virtual pointer" manually, so that I can use the polymorphism?


Answer 1:


Is There Any Way To Copy vtable From Host To Device

You wouldn't want to copy the vtable from host to device. An object created on the host carries a virtual pointer to a vtable that holds a set of host function pointers. When you copy such an object to the device, that pointer doesn't get changed or "fixed up", and so you end up with an object on the device whose vtable is full of host pointers.

If you then try to call one of those virtual functions (using the object on the device, from device code), bad things happen: the function entry points listed in the vtable are host addresses that don't make any sense in device code.

so that I can use the polymorphism

My recommendation for using polymorphism in device code is to create the object on the device. This sets up the vtable with a set of device function pointers rather than host function pointers, and questions such as this demonstrate that it works. To a first-order approximation, if you have a way to create a set of polymorphic objects in host code, I don't know of any reason why you shouldn't be able to use a similar method in device code. The real issue is interoperability, that is, moving such objects between host and device, which is what the stated limitations in the programming guide refer to.
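As a rough illustration of creating the object on the device, the sketch below constructs a polymorphic object inside a kernel, calls a virtual function through a base-class pointer in a second kernel, and deletes the object in a third. The Shape and Square types, the kernel names, and the deviceSquareArea helper are hypothetical names invented for this example. Error handling is minimal, and a failed device-side new (which yields a null pointer) is not checked.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical class hierarchy, for illustration only.
struct Shape {
    __device__ virtual float area() const = 0;
    __device__ virtual ~Shape() {}
};

struct Square : Shape {
    float side;
    __device__ explicit Square(float s) : side(s) {}
    __device__ float area() const override { return side * side; }
};

// Construct the object in device code, so its vtable pointer refers to device functions.
__global__ void createKernel(Shape** slot, float side) {
    *slot = new Square(side);   // device-side heap allocation
}

// Use the object polymorphically through a base-class pointer.
__global__ void areaKernel(Shape** slot, float* out) {
    *out = (*slot)->area();
}

// Destroy the object on the device, where it was created.
__global__ void destroyKernel(Shape** slot) {
    delete *slot;
}

// Host helper: returns the area computed on the device, or -1.0f on a CUDA error.
float deviceSquareArea(float side) {
    Shape** d_slot = nullptr;
    float* d_out = nullptr;
    float h_out = -1.0f;
    if (cudaMalloc(&d_slot, sizeof(Shape*)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) {
        cudaFree(d_slot);
        return -1.0f;
    }

    // Kernels launched on the same (default) stream run in order.
    createKernel<<<1, 1>>>(d_slot, side);
    areaKernel<<<1, 1>>>(d_slot, d_out);
    destroyKernel<<<1, 1>>>(d_slot);

    // The blocking copy waits for the kernels above before reading the result.
    if (cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess) {
        h_out = -1.0f;
    }
    cudaFree(d_out);
    cudaFree(d_slot);
    return h_out;
}
```

Calling deviceSquareArea(3.0f) from host code should hand back the area computed by Square::area on the device. Only plain data (a pointer slot and a float result) crosses the host/device boundary; the polymorphic object itself never does.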

I wonder whether there is some way for me to set up the "virtual pointer" manually

There might be. In the interest of sharing knowledge, I will outline a method. However, I don't know C++ well enough to say whether this is acceptable/legal. The only thing I can say is in my very limited testing, it appears to work. But I would assume it is not legal and so I do not recommend you use this method for anything other than experimentation. Even if we don't resolve whether or not it is legal, there is already a stated CUDA limitation (as indicated above) that you should not attempt to pass objects with virtual functions between host and device. So I offer it merely as an observation, which may be interesting for experimentation or research. I don't suggest it for production code.

The basic idea is outlined in this thread. It is predicated on the idea that an ordinary object-copy (copy construction) does not seem to copy the virtual function table pointer, which makes sense to me, but that the raw bytes of the object as a whole do contain that pointer. Therefore if we use a method like this:

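#include <cstring>  // memcpy

template<typename T>
__device__ void fixVirtualPointers(T *other) {
    // Object-copy made in device code: copies the "guts" (data members) of *other,
    // while temp is expected to get a device-side vtable pointer.
    T temp = T(*other);
    // Byte-wise copy of the whole object: carries temp's vtable pointer into *other.
    memcpy(other, &temp, sizeof(T));
}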

it seems to be possible to take a given object, create a new "dummy" copy of it in device code, and then "fix up" the vtable pointer by doing a byte-wise copy of the object (considering the entire object size, sizeof(T)) rather than a "typical" object-copy. Use this at your own risk. This blog may also be interesting reading, although I can't vouch for the correctness of any statements there.
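The premise that an object-copy does not carry the source object's vtable pointer along can be seen in ordinary host-only C++ (which nvcc also accepts). The sketch below is only an illustration of that premise, not of the device-side fix itself; Base, Derived, and both helper functions are hypothetical names made up for this example.

```cpp
// Hypothetical types, for illustration only.
struct Base {
    virtual int id() const { return 1; }
    virtual ~Base() = default;
    int payload = 0;
};

struct Derived : Base {
    int id() const override { return 2; }
};

// Copy-constructs a Base from a Derived and reports which override the copy dispatches to.
int idAfterCopyConstruct() {
    Derived d;
    Base copy(d);             // the vtable pointer is set for Base, not taken from d
    const Base& ref = copy;   // call through a reference to use dynamic dispatch
    return ref.id();
}

// Shows that the data members, unlike the vtable pointer, do come from the source object.
int payloadAfterCopyConstruct() {
    Derived d;
    d.payload = 42;
    Base copy(d);
    return copy.payload;
}
```

The copy dispatches to Base::id even though it was built from a Derived, while its payload matches the source. The constructor that runs decides the vtable pointer; the source object's bytes do not. A byte-wise memcpy, by contrast, would copy that pointer along with everything else.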

Beyond this, there are a variety of other suggestions here on the cuda tag that you may wish to review.




Answer 2:


I would like to provide a different way to fix the vtable which does not rely on copying the vtable between objects. The idea is to use placement new on the device to let the compiler generate the appropriate vtable. However, this approach also violates the restrictions stated in the programming guide.

#include <cstdio>
#include <new>      // placement new

struct A{
    __host__ __device__
    virtual void foo(){
        printf("A\n");
    }
};

struct B : public A{

    B(int i = 13) : data(i){}

    __host__ __device__
    virtual void foo() override{
        printf("B %d\n", data);
    }

    int data;
};

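template<class T>
__global__
void fixKernel(T* ptr){
    // Copy the object in device code; the data members come from *ptr.
    T tmp(*ptr);

    // Placement new re-constructs the object in place on the device,
    // letting the compiler install the device-side vtable pointer.
    new (ptr) T(tmp);
}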

__global__
void useKernel(A* ptr){
    ptr->foo();
}


int main(){

    A a;
    a.foo();

    B b(7); 
    b.foo();

    A* ab = new B();

    ab->foo();

    A* d_a;
    cudaMalloc(&d_a, sizeof(A));
    cudaMemcpy(d_a, &a, sizeof(A), cudaMemcpyHostToDevice);

    B* d_b;
    cudaMalloc(&d_b, sizeof(B));
    cudaMemcpy(d_b, &b, sizeof(B), cudaMemcpyHostToDevice);

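    // Re-construct each object on the device so its vtable pointer is valid there.
    fixKernel<<<1,1>>>(d_a);

    useKernel<<<1,1>>>(d_a);   // virtual call on the fixed-up A object

    fixKernel<<<1,1>>>(d_b);

    useKernel<<<1,1>>>(d_b);   // virtual call on the fixed-up B object through an A*

    // Wait for the kernels to finish; this also flushes device-side printf output.
    cudaDeviceSynchronize();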

    cudaFree(d_b);
    cudaFree(d_a);
    delete static_cast<B*>(ab);  // A has no virtual destructor, so delete through the dynamic type
}


Source: https://stackoverflow.com/questions/63378891/is-there-any-way-to-copy-vtable-from-host-to-device-cuda-c
