Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I am using CUDA 5.0 and a Compute Capability 2.1 card.

The question is quite straightforward: Can a kernel be part of a class? For example:

class Foo
{
private:
 //...
public:
 __global__ void kernel();
};

__global__ void Foo::kernel()
{
 //implementation here
}

If not, is the solution to write a wrapper function that is a member of the class and calls the kernel internally?

And if yes, will it have access to the private attributes, as a normal private member function would?

(I'm not just trying it to see what happens because my project has several other errors right now, and I also think it's a good reference question. It was difficult for me to find references on using CUDA with C++. Basic functionality examples can be found, but not strategies for structured code.)



1 Answer

Let me leave CUDA dynamic parallelism out of the discussion for the moment (i.e. assume compute capability 3.0 or prior).

Remember that __global__ is used for CUDA functions that will (only) be called from the host (but execute on the device). If you instantiate this object on the device, it won't work. Furthermore, for device-accessible private data to be available to the member function, the object would have to be instantiated on the device.

So you could have a kernel invocation (i.e. mykernel<<<blocks,threads>>>(...);) embedded in a host object member function, but the kernel definition (i.e. the function definition with the __global__ decorator) would normally precede the object definition in your source code. As stated already, such a methodology could not be used for an object instantiated on the device. The kernel would also not have access to ordinary private data defined elsewhere in the object. (It may be possible to come up with a scheme for a host-only object that creates device data, using pointers in global memory, that would then be accessible on the device, but such a scheme seems quite convoluted to me at first glance.)
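To make the wrapper idea concrete, here is a minimal sketch of a host-only class whose member function launches a kernel defined as a free function ahead of the class. The names scaleKernel, Foo::scale and Foo::copyToHost are hypothetical, the block size of 128 is an arbitrary choice, and error checking on the CUDA runtime calls is left out for brevity. The kernel only sees what is passed to it as arguments (here the device pointer held by the object), not the object's private members themselves.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Free-function kernel, defined before the class that launches it.
// It works only on its arguments; it cannot see Foo's private members.
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Host-only object (hypothetical example) that owns a device buffer.
class Foo
{
private:
    float* d_data_;  // pointer to device global memory
    int    n_;       // number of elements

    // Copying is disabled so the device buffer is freed exactly once.
    Foo(const Foo&);
    Foo& operator=(const Foo&);

public:
    // Allocate device memory and copy n floats from the host.
    Foo(const float* host, int n) : d_data_(NULL), n_(n)
    {
        cudaMalloc((void**)&d_data_, n_ * sizeof(float));
        cudaMemcpy(d_data_, host, n_ * sizeof(float), cudaMemcpyHostToDevice);
    }

    ~Foo() { cudaFree(d_data_); }

    // Host member function wrapping the kernel launch.
    void scale(float factor)
    {
        if (n_ <= 0) return;  // a launch with zero blocks is invalid
        int threads = 128;    // assumed block size
        int blocks  = (n_ + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(d_data_, n_, factor);
        cudaDeviceSynchronize();
    }

    // Copy the device buffer back into a host array of at least n floats.
    void copyToHost(float* host) const
    {
        cudaMemcpy(host, d_data_, n_ * sizeof(float), cudaMemcpyDeviceToHost);
    }
};
```

From host code this would be used as Foo foo(h, 4); foo.scale(2.0f); foo.copyToHost(out); so callers never touch the <<<...>>> syntax directly.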

Normally, device-usable member functions would be preceded by the __device__ decorator. In this case, all the code in the device member function executes from within the thread that called it.
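As a rough illustration of a device-usable member function, the sketch below marks the members __host__ __device__ so they can be called from either side, and passes the object by value to a kernel that is still a free function. The class Point, the kernel norm2Kernel and the helper norm2OnDevice are hypothetical names, and error checking is omitted. Inside the kernel, the member function reads the private data of the copy that arrived as a kernel argument.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Small class whose members are usable on both host and device.
class Point
{
private:
    float x_, y_;

public:
    __host__ __device__ Point(float x, float y) : x_(x), y_(y) {}

    // Runs entirely within the thread that calls it.
    __host__ __device__ float norm2() const { return x_ * x_ + y_ * y_; }
};

// The kernel remains a free function. The object is copied to the
// device as a kernel argument, so its private data is available there.
__global__ void norm2Kernel(Point p, float* out)
{
    *out = p.norm2();
}

// Hypothetical host helper: evaluates p.norm2() on the device.
float norm2OnDevice(const Point& p)
{
    float* d_out = NULL;
    float  h_out = 0.0f;
    cudaMalloc((void**)&d_out, sizeof(float));
    norm2Kernel<<<1, 1>>>(p, d_out);
    // cudaMemcpy waits for the preceding kernel on the default stream.
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Passing by value suits small objects without pointer members; an object that holds pointers would need those pointers to refer to device memory before the kernel could follow them.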

This question gives an example (in my edited answer) of a C++ object with a member function callable from both the host and the device, with appropriate data copying between host and device objects.

