gpu

OpenCL struct values correct on CPU but not on GPU

白昼怎懂夜的黑 submitted on 2019-12-11 02:47:30
Question: I have a struct in a file which is included by both the host code and the kernel:

typedef struct {
    float x, y, z, dir_x, dir_y, dir_z;
    int radius;
} WorklistStruct;

I'm building this struct in my C++ host code and passing it via a buffer to the OpenCL kernel. If I choose a CPU device for the computation, I get the following result:

printf("item:[%f,%f,%f][%f,%f,%f]%d,%d\n", item.x, item.y, item.z, item.dir_x, item.dir_y, item.dir_z, item.radius, sizeof(float));

Host: item:[20.169043,7
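The excerpt is cut off, but a common cause of "correct on CPU, wrong on GPU" with struct buffers is a layout mismatch: the host compiler and the OpenCL device compiler may pad or align the struct differently, so the GPU reads shifted fields. One defensive pattern (a sketch of an assumption, not necessarily this question's actual fix) is to declare the host side with the cl_* types and explicit padding, and to assert the size so a mismatch is caught at compile time:

```cpp
// Hedged sketch: make the host struct layout explicit so it can match the
// layout the OpenCL C compiler produces for the same field list.
#include <CL/cl.h>

typedef struct {
    cl_float x, y, z;
    cl_float dir_x, dir_y, dir_z;
    cl_int   radius;
    cl_int   _pad;   // explicit padding instead of compiler-inserted padding
} WorklistStruct;    // keep an equivalent float/int (+ padding) definition in the .cl file

static_assert(sizeof(WorklistStruct) == 32,
              "host struct layout must match the kernel's struct layout");
```

A quick way to verify the device side is to have a small kernel write sizeof(WorklistStruct) back to the host and compare the two values.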

OpenGL: render time limit on linux

橙三吉。 submitted on 2019-12-11 02:19:28
Question: I'm implementing a computation algorithm with OpenGL and Qt. All computations are executed in a fragment shader. Sometimes, when I try to execute heavy computations (ones that take more than 5 seconds on the GPU), OpenGL aborts the computation before it ends. I suppose this is a mechanism like TDR on Windows. I think I should split the input data into several parts, but I need to know how long a computation is allowed to run. How can I obtain the render time limit on Linux (it would be great if there were a crossplatform
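The question is truncated, but since the goal is to stay under a driver watchdog, one practical approach is to measure how long each chunk of work actually takes with an OpenGL timer query and size the chunks conservatively, rather than trying to query the (driver-specific) limit itself. A hedged C++ sketch, assuming a GL 3.3+ context and an extension loader such as GLEW; drawChunk is a placeholder for one slice of the computation:

```cpp
#include <GL/glew.h>   // assumption: GLEW (or another loader) and a current GL >= 3.3 context

// Sketch: time one slice of the fragment-shader work on the GPU so the input
// can be split into chunks that stay well below the watchdog threshold.
double timeChunkMs(void (*drawChunk)())
{
    GLuint query = 0;
    glGenQueries(1, &query);

    glBeginQuery(GL_TIME_ELAPSED, query);
    drawChunk();                                      // placeholder draw call(s)
    glEndQuery(GL_TIME_ELAPSED);

    GLuint64 ns = 0;
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);  // waits for the GPU to finish
    glDeleteQueries(1, &query);

    return ns / 1.0e6;   // milliseconds; keep chunks far below the abort threshold
}
```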

Build Successful but not running on simulator

那年仲夏 submitted on 2019-12-11 02:05:09
Question: I downloaded the code of Brad Larson from here. When I run it, the build succeeds but it does not run in the simulator. Please point me in the right direction. I checked that the method in the app delegate file, - (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions { ....... }, is not called. Thanks. Answer 1: The scheme you're trying to run is probably not set correctly (it wasn't for me when I downloaded the code either). Just change the scheme to the one you

Race Condition in CUDA programs

戏子无情 submitted on 2019-12-11 01:36:51
Question: I have two pieces of code: one written in C and the corresponding operation written in CUDA. Please help me understand how __syncthreads() works in the context of the following programs. As I understand it, __syncthreads() only synchronizes the threads within a single block.

The C program:

{ for(i=1;i<10000;i++) { t=a[i]+b[i]; a[i-1]=t; } }

The equivalent CUDA program:

__global__ void kernel0(int *b, int *a, int *t, int N) { int b0=blockIdx.x; int t0=threadIdx.x; int tid=b0*blockDim.x
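Reading the C loop carefully: iteration i reads a[i], which no earlier iteration has written (it is iteration i+1 that overwrites it), so in the sequential code every output a[i-1] is computed from the original a and b. The hazard when parallelizing in place is therefore write-after-read between neighbouring elements, which __syncthreads() can only guard within one block. Writing results into a separate buffer (note the kernel already receives a t pointer) removes the hazard entirely. A small plain-C++ illustration of that reasoning; the function name and use of std::vector are mine, not from the question:

```cpp
#include <vector>

// Each output depends only on the ORIGINAL inputs, so writing into a separate
// buffer t makes every iteration independent - safe to parallelize, whether
// across CPU threads or CUDA blocks, with no synchronization needed.
void shift_add(const std::vector<int>& a, const std::vector<int>& b,
               std::vector<int>& t)        // t[i-1] plays the role of a[i-1]
{
    for (std::size_t i = 1; i < a.size(); ++i)
        t[i - 1] = a[i] + b[i];            // never reads anything this loop writes
}
```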

Hardware Accelerated Image Scaling in windows using C++

坚强是说给别人听的谎言 submitted on 2019-12-11 00:36:20
Question: I have to scale a bitmap image (e.g. 1280 x 720 to 1920 x 180 and vice versa). I use this scaling while capturing video from the screen. Software-based scaling consumes a lot of CPU and is slow as well. Is there any hardware-accelerated API or library to perform the scaling? Some methods are discussed in the thread "How to use hardware video scalers?", but with no final conclusion. Support needed: Windows 7 onwards. Answer 1: If you have an IDirect3DTexture9 of the image to be scaled, you can use
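The answer is cut off right after "you can use". A natural continuation (my assumption, not the quoted author's words) is IDirect3DDevice9::StretchRect, which performs filtered surface-to-surface scaling on the GPU. A hedged C++ sketch; StretchRect has surface-type restrictions (e.g. render targets), so treat this as an outline rather than a drop-in implementation:

```cpp
#include <d3d9.h>

// Sketch: scale the contents of srcTex into dstTex on the GPU via StretchRect.
HRESULT ScaleOnGpu(IDirect3DDevice9* device,
                   IDirect3DTexture9* srcTex, IDirect3DTexture9* dstTex)
{
    IDirect3DSurface9* srcSurf = nullptr;
    IDirect3DSurface9* dstSurf = nullptr;
    srcTex->GetSurfaceLevel(0, &srcSurf);
    dstTex->GetSurfaceLevel(0, &dstSurf);

    // NULL rects mean "whole surface"; linear filtering performs the resize.
    HRESULT hr = device->StretchRect(srcSurf, nullptr, dstSurf, nullptr,
                                     D3DTEXF_LINEAR);
    srcSurf->Release();
    dstSurf->Release();
    return hr;
}
```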

how to find out the RAM and GPU information of my visitors?

為{幸葍}努か submitted on 2019-12-11 00:28:09
Question: I want to know how much RAM my visitors have and all the information available about their GPU. Is there any way to achieve this via JavaScript, or maybe ActionScript (Flash)? Answer 1: JavaScript, browser extensions and plugins are so heavily sandboxed that they have limited to no access to the system, for security purposes. Only limited hardware can be accessed directly (with the user's consent), such as the camera and microphone through JavaScript's getUserMedia or Flash. The nearest you can get is to have

Using gpu::GpuMat in OpenCV C++

好久不见. submitted on 2019-12-10 23:46:58
Question: I would like to know how I can modify a gpu::GpuMat. In fact, I would like to know whether it is possible to use a gpu::GpuMat like a cv::Mat. I would like to do something like this:

cv::namedWindow("Result");
cv::Mat src_host = cv::imread("lena.jpg", CV_LOAD_IMAGE_GRAYSCALE);
cv::gpu::GpuMat dst, src;
src.upload(src_host);
for (unsigned int y = 0; y < src.rows; y++){
    for (unsigned int x = 0; x < src.cols; x++){
        src.at<uchar>(y,x) = 0;
    }
}
cv::Mat result_host;
dst.download(result_host);
cv:
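GpuMat data lives in device memory, so per-element access with at<>() as in the loop above is not available on the host; the data has to be modified either with the cv::gpu functions or after downloading it into a cv::Mat. A brief sketch using the same OpenCV 2.x cv::gpu API as the question (in OpenCV 3+ the module became cv::cuda):

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>   // OpenCV 2.x GPU module, as in the question

int main()
{
    cv::Mat src_host = cv::imread("lena.jpg", CV_LOAD_IMAGE_GRAYSCALE);

    cv::gpu::GpuMat src;
    src.upload(src_host);

    // Option 1: operate on the device directly - no per-pixel host loop.
    src.setTo(cv::Scalar(0));            // zero every pixel on the GPU

    // Option 2: download, edit with cv::Mat::at<>(), then upload again.
    cv::Mat host;
    src.download(host);
    host.at<uchar>(0, 0) = 255;          // ordinary host-side element access
    src.upload(host);

    return 0;
}
```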

Using multiple_gpu_model on keras - causing resource exhaustion

邮差的信 submitted on 2019-12-10 23:36:34
Question: I built my network the following way:

# Build U-Net model
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
s = Lambda(lambda x: x / 255) (inputs)
width = 64
c1 = Conv2D(width, (3, 3), activation='relu', padding='same') (s)
c1 = Conv2D(width, (3, 3), activation='relu', padding='same') (c1)
p1 = MaxPooling2D((2, 2)) (c1)
c2 = Conv2D(width*2, (3, 3), activation='relu', padding='same') (p1)
c2 = Conv2D(width*2, (3, 3), activation='relu', padding='same') (c2)
p2 = MaxPooling2D((2, 2)) (c2)
c3

How to give an option to select graphics adapter in a DirectX 11 application?

我怕爱的太早我们不能终老 submitted on 2019-12-10 22:57:37
Question: I think I know how it should work, only it does not. I have a Lenovo laptop with a GeForce 860M and an integrated Intel card. I can launch my application from outside with either GPU, and everything works fine: the selected GPU becomes the adapter with index 0, it has the laptop screen as its output, and so on. However, if I try to use the adapter with index 1 (if I run the app normally, that is the NVIDIA; if I run it with the NVIDIA GPU, that is the Intel), IDXGIAdapter::EnumOutputs does not find anything, so I
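The excerpt stops mid-sentence, but the behaviour it describes is typical of Optimus laptops: the display outputs are attached to the integrated adapter, so EnumOutputs on the other adapter may legitimately return nothing, and the application can still render on the chosen adapter and present through the swap chain. Selecting the adapter itself is straightforward: enumerate with IDXGIFactory1 and pass the chosen adapter to D3D11CreateDevice with D3D_DRIVER_TYPE_UNKNOWN. A sketch; the helper name and the use of ComPtr are my choices, not from the question:

```cpp
#include <dxgi.h>
#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch: create the D3D11 device on the adapter the user selected by index.
// With an explicit adapter, the driver type must be D3D_DRIVER_TYPE_UNKNOWN.
HRESULT CreateDeviceOnAdapter(UINT adapterIndex,
                              ComPtr<ID3D11Device>& device,
                              ComPtr<ID3D11DeviceContext>& context)
{
    ComPtr<IDXGIFactory1> factory;
    HRESULT hr = CreateDXGIFactory1(IID_PPV_ARGS(&factory));
    if (FAILED(hr)) return hr;

    ComPtr<IDXGIAdapter1> adapter;
    hr = factory->EnumAdapters1(adapterIndex, &adapter);
    if (FAILED(hr)) return hr;                 // e.g. index out of range

    D3D_FEATURE_LEVEL obtained;
    return D3D11CreateDevice(adapter.Get(), D3D_DRIVER_TYPE_UNKNOWN, nullptr, 0,
                             nullptr, 0,       // default feature levels
                             D3D11_SDK_VERSION, &device, &obtained, &context);
}
```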

Profiling GPU usage in C#

…衆ロ難τιáo~ submitted on 2019-12-10 21:14:54
Question: I am writing a C# application that is GPU-accelerated using EMGU's GpuInvoke method. I would like to profile my code and look at the load on the GPU and the amount of GPU memory I'm using, but I'm having trouble finding a good way to do that. It seems like it should be simple, but I can't figure out what I'm missing. Thank you. Answer 1: Some options:
- using Performance Monitor (perfmon.exe), which is the easiest tool to use
- using tools like GPU-Z
- using a performance kit from the GPU hardware vendor