gpu

Android: reading fb0 always gives me a black screen

Posted by 廉价感情 on 2019-12-17 18:45:34
Question: My device is a Nexus 4 running Jelly Bean 4.2. I'm trying to record the screen and send it out. Most code samples on the internet capture the screen by reading /dev/graphics/fb0. That works fine on some devices and on older systems, but when I try it on my device it fails: it only gives me a black screen, and the raw data is all zeros. I have run "adb root" to get root permission, and tried "chmod 777 fb0" and "cat fb0 > /sdcard/fb0". I have also tried code using mmap and memcpy to get the data, but all of it fails. I have
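For reference, a minimal sketch of the mmap/memcpy approach the asker describes, using the standard Linux framebuffer API (error handling trimmed; on stock Linux the node is /dev/fb0 rather than /dev/graphics/fb0):

#include <fcntl.h>
#include <linux/fb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/graphics/fb0", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Query the display geometry; a fuller version would also read
       fb_fix_screeninfo.line_length to handle padded rows. */
    struct fb_var_screeninfo vinfo;
    if (ioctl(fd, FBIOGET_VSCREENINFO, &vinfo) < 0) { perror("ioctl"); return 1; }

    size_t size = (size_t)vinfo.xres * vinfo.yres * (vinfo.bits_per_pixel / 8);
    unsigned char *fb = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED) { perror("mmap"); return 1; }

    /* Copy one frame out of the mapping; all-zero bytes here
       reproduce the black-screen symptom from the question. */
    unsigned char *frame = malloc(size);
    memcpy(frame, fb, size);
    printf("%ux%u @ %u bpp, first byte: %u\n",
           vinfo.xres, vinfo.yres, vinfo.bits_per_pixel, frame[0]);

    munmap(fb, size);
    free(frame);
    close(fd);
    return 0;
}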

Branch predication on GPU

Posted by 早过忘川 on 2019-12-17 16:08:03
Question: I have a question about branch predication in GPUs. As far as I know, GPUs handle branches with predication. For example, I have code like this: if (C) A else B. If A takes 40 cycles and B takes 50 cycles to finish execution, and assuming both A and B are executed for one warp, does it take 90 cycles in total to finish this branch? Or do they overlap A and B, i.e. some instructions of A are executed, then there is a wait on a memory request, then some instructions of B are executed, then
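A minimal sketch of the kind of divergent branch the question is asking about (names are illustrative): when lanes of one warp disagree on the condition, the hardware serializes the two paths, masking off the inactive lanes on each one.

__global__ void divergent(const int *c, float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (c[i]) {
        out[i] = out[i] * 2.0f + 1.0f;   /* path A, taken by lanes with c[i] != 0 */
    } else {
        out[i] = out[i] * 0.5f - 1.0f;   /* path B, taken by the remaining lanes */
    }
    /* If both paths occur within one warp, path A runs with B's lanes
       disabled and then path B runs with A's lanes disabled; in the
       question's numbers that is roughly 40 + 50 cycles for this warp,
       although the scheduler can still switch to other warps while
       this one stalls on memory. */
}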

How does CUDA assign device IDs to GPUs?

Posted by 余生颓废 on 2019-12-17 07:30:53
Question: When a computer has multiple CUDA-capable GPUs, each GPU is assigned a device ID. By default, CUDA kernels execute on device ID 0. You can use cudaSetDevice(int device) to select a different device. Let's say I have two GPUs in my machine: a GTX 480 and a GTX 670. How does CUDA decide which GPU is device ID 0 and which is device ID 1? Ideas for how CUDA might assign device IDs (just brainstorming): descending order of compute capability, PCI slot number, date/time when the device was
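One way to inspect the assignment on a particular machine is to enumerate the devices with the runtime API (a small sketch):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int id = 0; id < count; ++id) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, id);
        // Print the name, compute capability, and PCI location per ID.
        printf("device %d: %s (compute %d.%d, PCI bus %d, device %d)\n",
               id, prop.name, prop.major, prop.minor,
               prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}

In later CUDA versions the CUDA_DEVICE_ORDER environment variable chooses between the default FASTEST_FIRST ordering and PCI_BUS_ID ordering, so the output above can differ between the two settings.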

GPU Emulator for CUDA programming without the hardware [closed]

Posted by 三世轮回 on 2019-12-17 03:22:44
Question: Is there an emulator for a GeForce card that would allow me to program and test CUDA without having the actual hardware? Info: I'm looking to speed up a few of my simulations in CUDA, but my problem is that I'm not always around my desktop to do this development. I would like to do some work on

(CUDA C) Why is it not printing out the value copied from device memory?

Posted by ☆樱花仙子☆ on 2019-12-14 04:26:57
Question: I'm learning CUDA right now through the training slides provided by NVIDIA. They have a sample program that shows how you can add two integers. The code is below:

#include <stdio.h>

__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

int main(void) {
    int a, b, c;            // Host copies of a, b, c
    int *d_a, *d_b, *d_c;   // Device copies of a, b, c
    size_t size = sizeof(int);

    // Allocate space for device copies of a, b, c
    cudaMalloc((void**)&d_a, size);
    cudaMalloc((void**)&d_b, size);
    cudaMalloc
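The excerpt is cut off mid-allocation. For context, a sketch of how the slide example usually continues (a reconstruction of the well-known NVIDIA sample, not necessarily the asker's exact code):

    cudaMalloc((void**)&d_c, size);

    a = 2; b = 7;

    // Copy the inputs to the device
    cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

    // Launch add() on the device with one block of one thread
    add<<<1, 1>>>(d_a, d_b, d_c);

    // Copy the result back; this cudaMemcpy also waits for the kernel
    cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);
    printf("c = %d\n", c);   // Expected output: c = 9

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}

A common reason the printed value comes out wrong in this exercise is a missing or misordered cudaMemcpy, so checking the return codes of the CUDA calls is a good first step.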

Using low GPU priority for background rendering

Posted by 我与影子孤独终老i on 2019-12-14 03:44:51
Question: How do I debug the "Using low GPU priority for background rendering." message I see on the console of an app using AVFoundation on iOS 8 beta 4? I suppose I'm doing some unneeded work that I could skip, saving battery and eliminating the message I've tripped. Answer 1: According to Apple's documentation, iOS does not give GPU access to a background application, for the obvious reason that it is not in the foreground. Source: https://stackoverflow.com/questions/25039363/using-low-gpu-priority-for-background-rendering

Porting a program to CUDA - kernel inside another kernel? [closed]

Posted by 佐手、 on 2019-12-14 03:35:25
Question: I am trying to parallelize a function that contains several procedures. The function goes:

void _myfunction(M1, M2) {
    for (a = 0; a < A; a++) {
        Amatrix = procedure1(M1);  /* contains for loops */
        Bmatrix = procedure2(M1);  /* contains for loops */
        ...
        for (z = 1; z < Z; z++) {
            calculations with Amatrix
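The title asks about a kernel inside another kernel. On devices of compute capability 3.5 and later, CUDA's dynamic parallelism does allow a kernel to launch child kernels; a minimal sketch under that assumption (names are hypothetical, and the code must be compiled with nvcc -arch=sm_35 -rdc=true):

__global__ void procedure1_kernel(const float *M1, float *Amatrix, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        Amatrix[i] = M1[i] * 2.0f;   /* placeholder for procedure1's loop body */
}

__global__ void myfunction_kernel(const float *M1, float *Amatrix, int n) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        /* Parent kernel launches a child grid (dynamic parallelism). */
        procedure1_kernel<<<(n + 255) / 256, 256>>>(M1, Amatrix, n);
        cudaDeviceSynchronize();  /* device-side sync with the child grid
                                     (deprecated in newer CUDA releases) */
    }
}

The more common alternative, and usually the simpler one, is to flatten the structure: launch procedure1 and procedure2 as ordinary kernels from the host inside the outer loop.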

TensorFlow can find the right cuDNN in one Python file but fails in another

Posted by 时光毁灭记忆、已成空白 on 2019-12-14 02:35:04
Question: I am trying to use the TensorFlow GPU version to train and test my deep learning model. When I train my model in one Python file, things go well and tensorflow-gpu is used properly. Then I save my model as a pretrained graph in .pb format and try to reuse it in another Python file, and I get the following error message: E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN library: 7.1.4 but source was compiled with: 7.2.1. CuDNN library major

CUDA device function pointers in structure without static pointers or symbol copies

Posted by 被刻印的时光 ゝ on 2019-12-14 02:33:53
Question: My intended program flow would look like the following if it were possible:

typedef struct structure_t {
    [...]
    /* device function pointer. */
    __device__ float (*function_pointer)(float, float, float[]);
    [...]
} structure;

[...]

/* function to be assigned. */
__device__ float my_function(float a, float b, float c[]) {
    /* do some stuff on the device. */
    [...]
}

void some_structure_initialization_function(structure *st) {
    /* assign. */
    st->function_pointer = my_function;
    [...]
}

This is not
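For comparison, a sketch of one workaround (not necessarily the answer this question received): a __device__ function's address is only meaningful in device code, so the struct member can be filled in by a small init kernel instead of by host code or symbol copies.

struct structure {
    float (*function_pointer)(float, float, float *);
};

__device__ float my_function(float a, float b, float *c) {
    return a + b + c[0];
}

__global__ void init_structure(structure *st) {
    /* Taking the address of a __device__ function is valid here. */
    st->function_pointer = my_function;
}

__global__ void use_structure(const structure *st, float *out) {
    float c[1] = { 1.0f };
    out[0] = st->function_pointer(2.0f, 3.0f, c);
}

Usage would be: cudaMalloc the structure, launch init_structure<<<1,1>>>, then launch kernels that call through the pointer. Indirect calls like this require compute capability 2.0 or later.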

Questions about resident warps in CUDA

Posted by  ̄綄美尐妖づ on 2019-12-14 02:27:46
Question: I have been using CUDA for a month, and now I'm trying to make clear how many warps/blocks are needed to hide the latency of memory accesses. I think it is related to the maximum number of resident warps on a multiprocessor. According to Table 13 in the CUDA C Programming Guide (v7.5), the maximum number of resident warps per multiprocessor is 64. So my question is: what is a resident warp? Does it refer to those warps whose data has been read from GPU memory and that are ready to be processed by the SPs? Or
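The runtime's occupancy API reports how many blocks of a given kernel can be resident on one multiprocessor at once, from which the resident-warp count follows; a small sketch (the kernel is a stand-in):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void my_kernel(float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    x[i] += 1.0f;
}

int main() {
    const int blockSize = 256;
    int blocksPerSM = 0;
    // How many blocks of my_kernel fit on one SM at this block size,
    // given its register and shared-memory usage.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, my_kernel,
                                                  blockSize, 0);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int residentWarps = blocksPerSM * blockSize / prop.warpSize;
    printf("resident blocks/SM: %d, resident warps/SM: %d (hardware cap: %d threads/SM)\n",
           blocksPerSM, residentWarps, prop.maxThreadsPerMultiProcessor);
    return 0;
}

The 64-warp figure in the question corresponds to a 2048-thread-per-SM limit (2048 / 32 = 64).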