gpu

Android: reading fb0 always gives me a black screen

Posted by 廉价感情 on 2019-12-17 18:45:34
Question: My device is a Nexus 4 running Jelly Bean 4.2. I'm trying to record the screen and send it out. Most code samples on the internet capture the screen by reading /dev/graphics/fb0. That works fine on some devices and on older systems, but when I try it on my device it fails: it only gives me a black screen, and the raw data is all zeros. I have run "adb root" to get root permission, and tried "chmod 777 fb0" and "cat fb0 > /sdcard/fb0". I have also tried code using mmap and memcpy to get the data, but all of it fails. I have
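For reference, a minimal sketch of the mmap/memcpy approach the asker describes, using the standard Linux framebuffer API (error handling trimmed; on stock Linux the node is /dev/fb0 rather than /dev/graphics/fb0):

#include <fcntl.h>
#include <linux/fb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/graphics/fb0", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Query the display geometry; a fuller version would also read
       fb_fix_screeninfo.line_length to handle padded rows. */
    struct fb_var_screeninfo vinfo;
    if (ioctl(fd, FBIOGET_VSCREENINFO, &vinfo) < 0) { perror("ioctl"); return 1; }

    size_t size = (size_t)vinfo.xres * vinfo.yres * (vinfo.bits_per_pixel / 8);
    unsigned char *fb = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED) { perror("mmap"); return 1; }

    /* Copy one frame out of the mapping; all-zero bytes here
       reproduce the black-screen symptom from the question. */
    unsigned char *frame = malloc(size);
    memcpy(frame, fb, size);
    printf("%ux%u @ %u bpp, first byte: %u\n",
           vinfo.xres, vinfo.yres, vinfo.bits_per_pixel, frame[0]);

    munmap(fb, size);
    free(frame);
    close(fd);
    return 0;
}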

Branch predication on GPU

Posted by 早过忘川 on 2019-12-17 16:08:03
Question: I have a question about branch predication in GPUs. As far as I know, GPUs handle branches with predication. For example, I have code like this: if (C) A else B. If A takes 40 cycles and B takes 50 cycles to finish execution, and assuming both A and B are executed for one warp, does it take 90 cycles in total to finish this branch? Or do they overlap A and B, i.e. some instructions of A are executed, then there is a wait on a memory request, then some instructions of B are executed, then
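A minimal sketch of the kind of divergent branch the question is asking about (names are illustrative): when lanes of one warp disagree on the condition, the hardware serializes the two paths, masking off the inactive lanes on each one.

__global__ void divergent(const int *c, float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (c[i]) {
        out[i] = out[i] * 2.0f + 1.0f;   /* path A, taken by lanes with c[i] != 0 */
    } else {
        out[i] = out[i] * 0.5f - 1.0f;   /* path B, taken by the remaining lanes */
    }
    /* If both paths occur within one warp, path A runs with B's lanes
       disabled and then path B runs with A's lanes disabled; in the
       question's numbers that is roughly 40 + 50 cycles for this warp,
       although the scheduler can still switch to other warps while
       this one stalls on memory. */
}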

How does CUDA assign device IDs to GPUs?

Posted by 余生颓废 on 2019-12-17 07:30:53
Question: When a computer has multiple CUDA-capable GPUs, each GPU is assigned a device ID. By default, CUDA kernels execute on device ID 0. You can use cudaSetDevice(int device) to select a different device. Let's say I have two GPUs in my machine: a GTX 480 and a GTX 670. How does CUDA decide which GPU is device ID 0 and which is device ID 1? Ideas for how CUDA might assign device IDs (just brainstorming): descending order of compute capability, PCI slot number, date/time when the device was
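One way to inspect the assignment on a particular machine is to enumerate the devices with the runtime API (a small sketch):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int id = 0; id < count; ++id) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, id);
        // Print the name, compute capability, and PCI location per ID.
        printf("device %d: %s (compute %d.%d, PCI bus %d, device %d)\n",
               id, prop.name, prop.major, prop.minor,
               prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}

In later CUDA versions the CUDA_DEVICE_ORDER environment variable chooses between the default FASTEST_FIRST ordering and PCI_BUS_ID ordering, so the output above can differ between the two settings.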

GPU Emulator for CUDA programming without the hardware [closed]

Posted by 三世轮回 on 2019-12-17 03:22:44
Question: Is there an emulator for a GeForce card that would allow me to program and test CUDA without having the actual hardware? Info: I'm looking to speed up a few of my simulations in CUDA, but my problem is that I'm not always around my desktop to do this development. I would like to do some work on

(CUDA C) Why is it not printing out the value copied from device memory?

Posted by ☆樱花仙子☆ on 2019-12-14 04:26:57
Question: I'm learning CUDA right now through the training slides provided by NVIDIA. They have a sample program that shows how you can add two integers. The code is below:

#include <stdio.h>

__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

int main(void) {
    int a, b, c;            // Host copies of a, b, c
    int *d_a, *d_b, *d_c;   // Device copies of a, b, c
    size_t size = sizeof(int);

    // Allocate space for device copies of a, b, c
    cudaMalloc((void**)&d_a, size);
    cudaMalloc((void**)&d_b, size);
    cudaMalloc
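The excerpt is cut off mid-allocation. For context, a sketch of how the slide example usually continues (a reconstruction of the well-known NVIDIA sample, not necessarily the asker's exact code):

    cudaMalloc((void**)&d_c, size);

    a = 2; b = 7;

    // Copy the inputs to the device
    cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

    // Launch add() on the device with one block of one thread
    add<<<1, 1>>>(d_a, d_b, d_c);

    // Copy the result back; this cudaMemcpy also waits for the kernel
    cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);
    printf("c = %d\n", c);   // Expected output: c = 9

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}

A common reason the printed value comes out wrong in this exercise is a missing or misordered cudaMemcpy, so checking the return codes of the CUDA calls is a good first step.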

Using low GPU priority for background rendering

Posted by 我与影子孤独终老i on 2019-12-14 03:44:51
Question: How do I debug the "Using low GPU priority for background rendering." message I see on the console of an app using AVFoundation on iOS 8 beta 4? I suppose I'm doing some unneeded work that I could skip, saving battery and eliminating the message I've tripped. Answer 1: According to Apple's documentation, iOS does not give GPU access to a background application, for the obvious reason that it is not in the foreground. Source: https://stackoverflow.com/questions/25039363/using-low-gpu-priority-for-background-rendering

Porting a program to CUDA - kernel inside another kernel? [closed]

Posted by 佐手、 on 2019-12-14 03:35:25
Question: I am trying to parallelize a function that contains several procedures. The function goes:

void _myfunction(M1, M2) {
    for (a = 0; a < A; a++) {
        Amatrix = procedure1(M1);  /* contains for loops */
        Bmatrix = procedure2(M1);  /* contains for loops */
        ...
        for (z = 1; z < Z; z++) {
            calculations with Amatrix
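The title asks about a kernel inside another kernel. On devices of compute capability 3.5 and later, CUDA's dynamic parallelism does allow a kernel to launch child kernels; a minimal sketch under that assumption (names are hypothetical, and the code must be compiled with nvcc -arch=sm_35 -rdc=true):

__global__ void procedure1_kernel(const float *M1, float *Amatrix, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        Amatrix[i] = M1[i] * 2.0f;   /* placeholder for procedure1's loop body */
}

__global__ void myfunction_kernel(const float *M1, float *Amatrix, int n) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        /* Parent kernel launches a child grid (dynamic parallelism). */
        procedure1_kernel<<<(n + 255) / 256, 256>>>(M1, Amatrix, n);
        cudaDeviceSynchronize();  /* device-side sync with the child grid
                                     (deprecated in newer CUDA releases) */
    }
}

The more common alternative, and usually the simpler one, is to flatten the structure: launch procedure1 and procedure2 as ordinary kernels from the host inside the outer loop.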

TensorFlow can find the right cuDNN in one Python file but fails in another

Posted by 时光毁灭记忆、已成空白 on 2019-12-14 02:35:04
Question: I am trying to use the TensorFlow GPU version to train and test my deep learning model. When I train my model in one Python file, things go well and tensorflow-gpu is used properly. Then I save my model as a pretrained graph in .pb format and try to reuse it in another Python file, and I get the following error message: E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN library: 7.1.4 but source was compiled with: 7.2.1. CuDNN library major

CUDA device function pointers in structure without static pointers or symbol copies

Posted by 被刻印的时光 ゝ on 2019-12-14 02:33:53
Question: My intended program flow would look like the following if it were possible:

typedef struct structure_t {
    [...]
    /* device function pointer. */
    __device__ float (*function_pointer)(float, float, float[]);
    [...]
} structure;

[...]

/* function to be assigned. */
__device__ float my_function(float a, float b, float c[]) {
    /* do some stuff on the device. */
    [...]
}

void some_structure_initialization_function(structure *st) {
    /* assign. */
    st->function_pointer = my_function;
    [...]
}

This is not
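For comparison, a sketch of one workaround (not necessarily the answer this question received): a __device__ function's address is only meaningful in device code, so the struct member can be filled in by a small init kernel instead of by host code or symbol copies.

struct structure {
    float (*function_pointer)(float, float, float *);
};

__device__ float my_function(float a, float b, float *c) {
    return a + b + c[0];
}

__global__ void init_structure(structure *st) {
    /* Taking the address of a __device__ function is valid here. */
    st->function_pointer = my_function;
}

__global__ void use_structure(const structure *st, float *out) {
    float c[1] = { 1.0f };
    out[0] = st->function_pointer(2.0f, 3.0f, c);
}

Usage would be: cudaMalloc the structure, launch init_structure<<<1,1>>>, then launch kernels that call through the pointer. Indirect calls like this require compute capability 2.0 or later.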

Questions about resident warps in CUDA

Posted by  ̄綄美尐妖づ on 2019-12-14 02:27:46
Question: I have been using CUDA for a month, and now I'm trying to make clear how many warps/blocks are needed to hide the latency of memory accesses. I think it is related to the maximum number of resident warps on a multiprocessor. According to Table 13 in the CUDA C Programming Guide (v7.5), the maximum number of resident warps per multiprocessor is 64. So my question is: what is a resident warp? Does it refer to those warps whose data has been read from GPU memory and that are ready to be processed by the SPs? Or
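The runtime's occupancy API reports how many blocks of a given kernel can be resident on one multiprocessor at once, from which the resident-warp count follows; a small sketch (the kernel is a stand-in):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void my_kernel(float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    x[i] += 1.0f;
}

int main() {
    const int blockSize = 256;
    int blocksPerSM = 0;
    // How many blocks of my_kernel fit on one SM at this block size,
    // given its register and shared-memory usage.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, my_kernel,
                                                  blockSize, 0);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int residentWarps = blocksPerSM * blockSize / prop.warpSize;
    printf("resident blocks/SM: %d, resident warps/SM: %d (hardware cap: %d threads/SM)\n",
           blocksPerSM, residentWarps, prop.maxThreadsPerMultiProcessor);
    return 0;
}

The 64-warp figure in the question corresponds to a 2048-thread-per-SM limit (2048 / 32 = 64).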