gpu

How to directly access a GPU?

三世轮回 submitted on 2019-12-12 12:02:07
Question: As most of you know, CPUs are not as well suited to floating-point computation as GPUs are. I am wondering how to use a GPU's power without any abstraction layer or driver. Can I program a GPU using assembly, C, or C++ (and if so, how)? While assembly seems like it would let me access the GPU directly, C/C++ would likely need an intermediate library (e.g. OpenCL) to access the GPU. Let me ask another question: how much of a modern GPU's capability will be exposed to a programmer without

GPU Render onto sphere

巧了我就是萌 submitted on 2019-12-12 09:49:55
Question: I am trying to write an optimized program that renders a 3D scene with OpenGL onto a sphere and then displays the unwrapped sphere on the screen, i.e. produces a planar map of a purely reflective sphere. In math terms, I would like to produce a projection map where the x axis is the polar angle and the y axis is the azimuth. I am trying to do this by placing the camera at the center of the sphere probe and taking planar shots around it, so as to approximate spherical quads with planar tiles of the
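
For context, the planar map described here is essentially an equirectangular (longitude/latitude) projection. Below is a minimal sketch of the direction-to-map-coordinate step, assuming a unit direction vector and a z-up convention; the names Vec3 and directionToMapUV are made up for illustration, and this is only the mapping, not the asker's renderer:

    #include <cmath>

    struct Vec3 { float x, y, z; };

    // Map a unit direction (as seen from the sphere's center) to normalized
    // map coordinates: u follows the polar angle, v follows the azimuth,
    // matching the axis layout described in the question.
    void directionToMapUV(const Vec3& d, float& u, float& v)
    {
        const float kPi = 3.14159265358979f;
        float theta = std::acos(d.z);        // polar angle in [0, pi]
        float phi   = std::atan2(d.y, d.x);  // azimuth in (-pi, pi]
        if (phi < 0.0f) phi += 2.0f * kPi;   // wrap azimuth to [0, 2*pi)
        u = theta / kPi;
        v = phi / (2.0f * kPi);
    }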

Multiple GPUs with Cuda Thrust?

依然范特西╮ submitted on 2019-12-12 09:47:21
Question: How do I use Thrust with multiple GPUs? Is it simply a matter of calling cudaSetDevice(deviceId) and then running the relevant Thrust code? Answer 1: With CUDA 4.0 or later, cudaSetDevice(deviceId) followed by your Thrust code should work. Just keep in mind that you will need to create and operate on separate vectors on each device (unless you have devices that support peer-to-peer memory access and the PCI Express bandwidth is sufficient for your task). Source: https://stackoverflow.com/questions/8289860
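
A minimal sketch of that pattern, assuming CUDA 4.0 or later and one independent vector per device (error checking and any multi-threaded overlap of the per-device work are omitted):

    #include <cuda_runtime.h>
    #include <thrust/device_vector.h>
    #include <thrust/functional.h>
    #include <thrust/sequence.h>
    #include <thrust/sort.h>

    int main()
    {
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);

        // Select each device before allocating or launching work on it;
        // the device_vector below lives on whichever device is current.
        for (int dev = 0; dev < deviceCount; ++dev)
        {
            cudaSetDevice(dev);
            thrust::device_vector<int> d(1 << 20);
            thrust::sequence(d.begin(), d.end());
            thrust::sort(d.begin(), d.end(), thrust::greater<int>());
        }
        return 0;
    }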

Vector, proxy class and dot operator in C++

久未见 submitted on 2019-12-12 09:23:08
Question: A question related to a custom Vector class in C++.

    template <typename T>
    class Vector {
        ...
    private:
        T* mData;
        int mSize;
    public:
        proxy_element operator[](const size_type index) { return proxy_element(*this, index); }
        const T& operator[](const size_type index) const { return mData[index]; }
    };

    template <typename T>
    class proxy_element {
        ...
        proxy_element(Vector<T>& m_parent, const size_type index);
        proxy_element& operator=(const T& rhs); // modifies data, so invalidate it in other memories
        bool
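
A complete minimal version of this proxy idiom, with illustrative names, looks roughly like the sketch below: operator[] returns a proxy that supports assignment for writes and converts back to T for reads. The std::vector backing storage is an assumption made only to keep the sketch self-contained.

    #include <cstddef>
    #include <vector>

    template <typename T>
    class Vector
    {
    public:
        explicit Vector(std::size_t n) : mData(n) {}

        class proxy_element
        {
        public:
            proxy_element(Vector& parent, std::size_t index)
                : mParent(parent), mIndex(index) {}

            // Write path: assigning through the proxy updates the element;
            // a real class could invalidate copies in other memories here.
            proxy_element& operator=(const T& rhs)
            {
                mParent.mData[mIndex] = rhs;
                return *this;
            }

            // Read path: implicit conversion lets the proxy stand in for T.
            operator const T&() const { return mParent.mData[mIndex]; }

        private:
            Vector&     mParent;
            std::size_t mIndex;
        };

        proxy_element operator[](std::size_t index) { return proxy_element(*this, index); }
        const T&      operator[](std::size_t index) const { return mData[index]; }

    private:
        std::vector<T> mData;
    };

Note that operator. cannot be overloaded in C++, so v[0].member will not forward to T through a proxy; reads that need member access typically go through an explicit conversion, a get()-style accessor, or operator-> on the proxy.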

NV_path_rendering alternative [closed]

一笑奈何 submitted on 2019-12-12 08:34:18
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 5 years ago. I just watched a very impressive presentation from SIGGRAPH 2012: http://nvidia.fullviewmedia.com/siggraph2012/ondemand/SS106.html My question is: this being a proprietary Nvidia extension, what are the other possibilities for quickly rendering Bezier paths on the GPU? Alternatively, is there any hope this will end up

Programmatically fetch GPU utilization

两盒软妹~` submitted on 2019-12-12 08:26:40
Question: Is there a standard way of getting the current load on the GPU? I'm looking for something similar to the Task Manager showing CPU %. Utilities such as GPU-Z show this value, but I'm not sure how they get it. I'm specifically interested in AMD graphics cards at the moment; any pointers would be helpful. If there's no clean API way of doing it, are there any programs whose output I can capture to get this info? Answer 1: For AMD/ATI cards, check out GPU PerfStudio. http://developer.amd.com/gpu/Pages

Is it possible to achieve Huffman decoding in GPU?

こ雲淡風輕ζ submitted on 2019-12-12 08:22:58
Question: We have a database encoded with Huffman coding. The aim is to copy it to the GPU along with its associated decoder, then decode the database on the GPU and do work on the decoded data without copying it back to the CPU. I am far from being a Huffman specialist, but the little I know suggests it is an algorithm essentially based on control structures. With the basic algorithm, I am afraid there will be a lot of serialized operations. My two questions are: do you know if there exists
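
To make the control-flow concern concrete, this is roughly what the basic serial decoder looks like (a sketch with a hypothetical node layout, not GPU code): the start of each codeword is only known once the previous symbol has been decoded, which is what serializes the naive approach.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical node layout: internal nodes store child indices, leaves
    // store a symbol; tree[0] is the root. Assumes a well-formed bitstream.
    struct Node
    {
        int left;    // index of left child, or -1 for a leaf
        int right;   // index of right child, or -1 for a leaf
        int symbol;  // valid only for leaves
    };

    std::vector<int> decode(const std::vector<Node>& tree,
                            const std::vector<std::uint8_t>& bits,  // one bit per element
                            std::size_t symbolCount)
    {
        std::vector<int> out;
        out.reserve(symbolCount);

        std::size_t pos = 0;
        while (out.size() < symbolCount)
        {
            int node = 0;                       // restart at the root for each symbol
            while (tree[node].left != -1)       // data-dependent branch on every bit
                node = bits[pos++] ? tree[node].right : tree[node].left;
            out.push_back(tree[node].symbol);   // codeword boundary known only here
        }
        return out;
    }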

Structuring a Keras project to achieve reproducible results in GPU

谁都会走 submitted on 2019-12-12 07:07:24
Question: I am writing a tensorflow.Keras wrapper to perform ML experiments. I need my framework to be able to run an experiment as specified in a configuration YAML file, in parallel on a GPU. I then need a guarantee that if I ran the experiment again I would get, if not exactly the same results, something reasonably close. To try to ensure this, my training script contains these lines at the beginning, following the guidelines in the official documentation: # Set up random seeds random.seed

Creating a copy of the buffer pointed by host ptr on the GPU from GPU kernel in OpenCL

时光怂恿深爱的人放手 submitted on 2019-12-12 07:03:18
Question: I was trying to understand how exactly CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR work. Basically, when using CL_MEM_USE_HOST_PTR, say to create a 2D image, nothing is copied to the device; instead the GPU refers to the mapped memory on the host (clEnqueueMapBuffer maps it), does the processing, and we can write the results to some other location. On the other hand, if I use CL_MEM_COPY_HOST_PTR, it will create a copy of the data pointed to by host_ptr on the device (I guess it will
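
For reference, the host-side difference between the two flags boils down to the clCreateBuffer call; a sketch assuming a valid cl_context named context, with error handling omitted:

    #include <CL/cl.h>
    #include <vector>

    void createBuffers(cl_context context, std::vector<float>& hostData)
    {
        cl_int err = CL_SUCCESS;
        const size_t bytes = hostData.size() * sizeof(float);

        // CL_MEM_USE_HOST_PTR: the buffer is backed by hostData's own memory;
        // the implementation may cache it in device memory, and the host
        // allocation must stay alive for the lifetime of the buffer.
        cl_mem useBuf = clCreateBuffer(context,
                                       CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                       bytes, hostData.data(), &err);

        // CL_MEM_COPY_HOST_PTR: the implementation allocates its own storage
        // (normally on the device) and copies hostData into it at creation
        // time; later changes to hostData are not reflected in the buffer.
        cl_mem copyBuf = clCreateBuffer(context,
                                        CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                        bytes, hostData.data(), &err);

        clReleaseMemObject(useBuf);
        clReleaseMemObject(copyBuf);
    }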

How to get compute unit ID at runtime in OpenCL?

心已入冬 submitted on 2019-12-12 06:37:24
Question: Is there a way to get the compute unit ID that a work group is running on at runtime? I know that CUDA has some assembly code to do this. Answer 1: No, there isn't a way to get the compute unit's ID. Your code should use the work group ID instead. What are you trying to achieve? I am a little surprised that CUDA supports this; please tell me which assembly instruction does this. Source: https://stackoverflow.com/questions/19547197/how-to-get-compute-unit-id-at-runtime-in-opencl
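
For what it's worth, the CUDA mechanism being referred to is most likely the %smid special register, which can be read from device code with inline PTX; a minimal sketch (requires device-side printf, i.e. compute capability 2.0 or newer):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void reportSmid()
    {
        unsigned int smid;
        // %smid holds the ID of the streaming multiprocessor executing this thread.
        asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
        if (threadIdx.x == 0)
            printf("block %d runs on SM %u\n", blockIdx.x, smid);
    }

    int main()
    {
        reportSmid<<<4, 32>>>();
        cudaDeviceSynchronize();
        return 0;
    }

OpenCL itself exposes no portable equivalent, which matches the answer above.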