I'm starting to use CUDA at the moment and have to admit that I'm a bit disappointed with the C API. I understand the reasons for choosing C, but had the language been based on C++ instead, several aspects, such as memory allocation, would have been a lot simpler.
There are several projects that attempt something similar, for example CUDPP.
In the meantime, however, I've implemented my own allocator; it works well and was straightforward to write (> 95% boilerplate code).
In the meantime there have been some further developments (not so much in terms of the CUDA API itself, but at least in terms of projects attempting an STL-like approach to CUDA data management).
Most notably, there is a project from NVIDIA Research: Thrust.
I would go with the placement new approach. Then I would define a class that conforms to the std::allocator<> interface. In theory, you could pass this class as a template parameter to std::vector<>, std::map<>, and so forth.
Beware: I have heard that doing such things is fraught with difficulty, but at least you will learn a lot more about the STL this way, and you do not need to re-invent your containers and algorithms. A rough sketch of such an allocator follows.
Does anybody have information about future CUDA developments that go in this general direction (let's face it: C interfaces in C++ s*ck)?
Yes, I've done something like that:
https://github.com/eyalroz/cuda-api-wrappers/
NVIDIA's Runtime API for CUDA is intended for use in both C and C++ code. As such, it uses a C-style API, the lowest common denominator (with a few notable exceptions of templated function overloads).
This library of wrappers around the Runtime API is intended to let us embrace many of the features of C++ (including some C++11) when using the Runtime API, but without reducing expressivity or raising the level of abstraction (as, e.g., the Thrust library does). Using cuda-api-wrappers, you still have your devices, streams, events and so on; they are just more convenient to work with, in more C++-idiomatic ways.