gpu

Keep alpha-transparency of a video through HDMI

Submitted by 我的梦境 on 2019-12-10 11:27:19
Question: The scenario I'm dealing with is as follows: I need to take the screen generated by OpenGL and send it through HDMI to an FPGA component while keeping the alpha channel. Right now the data being sent through HDMI is RGB only (24-bit, no alpha channel), so I need a way to force the alpha bits through this port somehow. See image: http://i.imgur.com/hhlcbb9.jpg One solution I could think of is to convert the screen buffer from RGBA mode to RGB while mixing the Alpha
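A minimal C++ sketch of one possible starting point (not from the question; the function and buffer names are hypothetical): read the RGBA framebuffer back with glReadPixels, then split it into a packed RGB buffer plus a separate alpha plane. How that alpha plane would then be carried alongside the 24-bit HDMI stream to the FPGA is left open.

    #include <GL/gl.h>
    #include <vector>

    // Assumes a current GL context whose framebuffer was created with alpha bits.
    void readBackRgbAndAlpha(int w, int h,
                             std::vector<unsigned char>& rgb,
                             std::vector<unsigned char>& alpha) {
        std::vector<unsigned char> rgba(4u * w * h);
        glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, rgba.data());
        rgb.resize(3u * w * h);
        alpha.resize(1u * w * h);
        for (int i = 0; i < w * h; ++i) {
            rgb[3 * i + 0] = rgba[4 * i + 0]; // R
            rgb[3 * i + 1] = rgba[4 * i + 1]; // G
            rgb[3 * i + 2] = rgba[4 * i + 2]; // B
            alpha[i]       = rgba[4 * i + 3]; // A, kept as a separate plane
        }
    }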

Parallel Cosine similarity of two large files with each other

Submitted by 混江龙づ霸主 on 2019-12-10 11:12:51
Question: I have two files, A and B. A has 400,000 lines, each with 50 float values; B has 40,000 lines, each with 50 float values. For every line in B, I need to find the corresponding lines in A that have >90% (cosine) similarity. With a linear search and computation, the code takes an enormous amount of time (40-50 hours). Reaching out to the community for suggestions on how to speed up the process (links to blogs/resources such as AWS/cloud offerings that could be used to achieve it). Have been stuck with this for quite a while!
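Since the page is tagged gpu, here is a minimal CUDA sketch of the brute-force pairwise comparison (assumptions: row-major float arrays, 50 values per line, and B processed in chunks so the similarity buffer fits in device memory; all names are hypothetical):

    // One thread scores one (A row, B row) pair.
    __global__ void cosinePairs(const float* A, const float* B, float* sim,
                                int nA, int nB, int dim) {
        int a = blockIdx.x * blockDim.x + threadIdx.x; // row index into A
        int b = blockIdx.y;                            // row index into the current B chunk
        if (a >= nA || b >= nB) return;
        float dot = 0.f, na = 0.f, nb = 0.f;
        for (int k = 0; k < dim; ++k) {
            float x = A[(size_t)a * dim + k];
            float y = B[(size_t)b * dim + k];
            dot += x * y; na += x * x; nb += y * y;
        }
        // cosine similarity; the small epsilon guards against all-zero rows
        sim[(size_t)b * nA + a] = dot * rsqrtf(na * nb + 1e-12f);
    }

    // launch once per B chunk, e.g.:
    // dim3 block(256), grid((nA + 255) / 256, nBChunk);
    // cosinePairs<<<grid, block>>>(dA, dBChunk, dSim, nA, nBChunk, 50);

The host then scans sim for values above 0.9. Normalizing every row once up front turns the score into a plain dot product, at which point a cuBLAS GEMM over the same chunks would likely be faster; the explicit kernel above just keeps the idea visible.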

CUDA Add Rows of a Matrix

Submitted by 主宰稳场 on 2019-12-10 11:09:48
Question: I'm trying to add the rows of a 4800x9600 matrix together, resulting in a 1x9600 matrix. What I've done is split the 4800x9600 matrix into 9,600 vectors of length 4800 each, and then perform a reduction on the 4800 elements. The trouble is, this is really slow... Anyone got any suggestions? Basically, I'm trying to implement MATLAB's sum(...) function. Here is the code, which I've verified works fine; it's just really slow: void reduceRows(Matrix Dresult,Matrix DA) { //split DA into chunks Matrix
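A different minimal sketch (not the asker's reduceRows, and it does not assume the Matrix type): because the result has one value per column, a simple kernel can give each thread one column and let it loop down the 4,800 rows; adjacent threads then read adjacent addresses, so the loads coalesce and no per-column reduction tree is needed.

    __global__ void sumRows(const float* A, float* out, int rows, int cols) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (col >= cols) return;
        float s = 0.0f;
        for (int r = 0; r < rows; ++r)
            s += A[(size_t)r * cols + col]; // walk down one column; neighbouring threads stay coalesced
        out[col] = s;
    }

    // launch, e.g.: sumRows<<<(9600 + 255) / 256, 256>>>(dA, dOut, 4800, 9600);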

How to free a device_vector<int>

Submitted by 孤者浪人 on 2019-12-10 10:38:27
Question: I allocated some space using a Thrust device vector as follows: thrust::device_vector<int> s(10000000000); How do I free this space explicitly and correctly? Answer 1: device_vector deallocates the storage associated with it when it goes out of scope, just like any standard C++ container. If you'd like to deallocate any Thrust vector's storage manually during its lifetime, you can do so using the following recipe: // empty the vector vec.clear(); // deallocate any capacity which may currently be
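The recipe is cut off above; for reference, the pattern Thrust documents is clear() followed by shrink_to_fit(), or a swap with an empty temporary. A minimal sketch, using the question's vector s and assuming <thrust/device_vector.h> is already included:

    // empty the vector, then ask it to release its capacity
    s.clear();
    s.shrink_to_fit();

    // alternatively, swap with an empty temporary;
    // the temporary's destructor frees the old storage immediately
    thrust::device_vector<int>().swap(s);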

DirectX world/view matrix multiplication - is the GPU or the CPU the right place?

Submitted by 帅比萌擦擦* on 2019-12-10 10:27:34
Question: I am new to DirectX, but I have been surprised that in most examples I have seen, the world matrix and view matrix are multiplied as part of the vertex shader, rather than being multiplied on the CPU with the result passed to the shader. For rigid objects this means you multiply the same two matrices once for every single vertex of the object. I know that the GPU can do this in parallel over a number of vertices (I don't really have an idea how many), but isn't this really inefficient, or am I
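A minimal DirectXMath sketch of the CPU-side alternative being described (illustrative only, not from any example the asker cites): combine world and view once per object per frame on the CPU, then upload the combined matrix, so the vertex shader performs a single transform.

    #include <DirectXMath.h>
    using namespace DirectX;

    // Hypothetical helper: one matrix multiply per object instead of one per vertex.
    XMFLOAT4X4 BuildWorldView(FXMMATRIX world, CXMMATRIX view) {
        XMMATRIX worldView = XMMatrixMultiply(world, view);
        XMFLOAT4X4 cbValue;
        // transpose for HLSL's default column-major constant-buffer packing
        XMStoreFloat4x4(&cbValue, XMMatrixTranspose(worldView));
        return cbValue; // copied into the constant buffer via Map or UpdateSubresource
    }

The usual trade-off: letting the shader do the multiply keeps constant-buffer updates simpler when only the view changes, while pre-multiplying on the CPU removes the redundant per-vertex work for rigid objects with many vertices.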

cuPrintf problem

Submitted by 强颜欢笑 on 2019-12-10 07:35:33
Question: I am trying to copy a struct array to the device. I am working with one GPU at the moment, and I have a problem with the cuPrintf function, which I use to debug my code. My struct definition is as follows: struct Node { char Key[25]; char ConsAlterKey[25]; char MasterKey[3]; int VowelDeletion; char Data[6]; char MasterData[6]; int Children[35]; int ChildCount; }; and for test purposes I fill the struct array like this: void FillArray(Node *NodeArray) { for(int i=0;i<TotalNodeCount;i++) { strcpy(NodeArray[i].Key,
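The snippet is cut off above; as context for the copy itself, a minimal sketch of moving such an array to the device (assuming TotalNodeCount elements, as in FillArray; because Node holds only plain char arrays and ints, a flat byte copy is valid):

    // allocate device storage and copy the filled host array across
    Node* dNodeArray = 0;
    cudaMalloc((void**)&dNodeArray, TotalNodeCount * sizeof(Node));
    cudaMemcpy(dNodeArray, NodeArray, TotalNodeCount * sizeof(Node), cudaMemcpyHostToDevice);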

Install OpenCL (AMD SDK) on Linux without root privileges

Submitted by 流过昼夜 on 2019-12-10 06:12:29
Question: I am trying to install OpenCL (AMD) on Linux, but I am stuck on the last step (installing the ICD). It seems like the ICD HAS to be installed at /etc/OpenCL/vendor, but I don't have root access to the computer. Is there any way to make OpenCL work without installing the ICD there? (Or maybe an environment variable to add a search path for ICD files?) It just seems really inconvenient for people like us when the ICD file path is hardcoded. Answer 1: Put the ICD files in /some/path/icd and then export the path like so:
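The answer is cut off above. The variable it most likely refers to depends on which ICD loader is installed (this is an inference, not part of the original answer): AMD's loader is commonly pointed at an alternative directory with OPENCL_VENDOR_PATH, e.g. export OPENCL_VENDOR_PATH=/some/path/icd, while the open-source ocl-icd loader reads OCL_ICD_VENDORS instead.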

Converting a Theano model built on GPU to CPU?

Submitted by 限于喜欢 on 2019-12-10 05:45:06
Question: I have some pickle files of deep learning models that were built on a GPU. I'm trying to use them in production, but when I try to unpickle them on the server, I get the following error: Traceback (most recent call last): File "score.py", line 30, in model = (cPickle.load(file)) File "/usr/local/python2.7/lib/python2.7/site-packages/Theano-0.6.0-py2.7.egg/theano/sandbox/cuda/type.py", line 485, in CudaNdarray_unpickler return cuda.CudaNdarray(npa) AttributeError: ("'NoneType' object has no

Configuring Theano so that it doesn't directly crash when a GPU memory allocation fails

Submitted by 时光怂恿深爱的人放手 on 2019-12-09 23:41:54
Question: When a Theano script tries to obtain more GPU memory than is currently available, it immediately crashes: Error allocating 26,214,400 bytes of device memory (out of memory). Driver report 19,365,888 bytes free and 1,073,414,144 bytes total Is there any way to configure Theano so that it doesn't directly crash when a GPU memory allocation fails? E.g., it could instead retry every X seconds, and give up after Y tries. (One use case: I have several Theano scripts running on the same GPU and using

How to import a trained SVM detector in OpenCV 2.4.13

Submitted by ≡放荡痞女 on 2019-12-09 23:36:18
Question: So I have followed this guide to train my own pedestrian HOG detector: https://github.com/DaHoC/trainHOG/wiki/trainHOG-Tutorial It was successful, and four files were generated: cvHOGClassifier.yaml, descriptorvector.dat, features.dat, svmlightmodel.dat. Does anyone know how to load the descriptorvector.dat file as a vector? I've tried this but it failed: vector<float> detector; std::ifstream file; file.open("descriptorvector.dat"); file >> detector; file.close(); This is something I would like to use
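The direct stream fails because there is no operator>> that fills a whole std::vector<float> from an ifstream. A minimal C++ sketch that reads every float in the file (assuming descriptorvector.dat is plain whitespace-separated floats, which is how the trainHOG output appears to be laid out; names are illustrative):

    #include <fstream>
    #include <iterator>
    #include <string>
    #include <vector>
    #include <opencv2/objdetect/objdetect.hpp>

    std::vector<float> loadDescriptorVector(const std::string& path) {
        std::ifstream file(path.c_str());
        // consume whitespace-separated floats until the first non-float token or EOF
        return std::vector<float>(std::istream_iterator<float>(file),
                                  std::istream_iterator<float>());
    }

    // usage with OpenCV 2.4:
    // cv::HOGDescriptor hog;
    // hog.setSVMDetector(loadDescriptorVector("descriptorvector.dat"));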