gpu

How can I keep the gpu running for more than 12 hours in google colaboratory?

Submitted by 妖精的绣舞 on 2019-12-23 01:38:09
Question: I am trying to train a model on Google Colaboratory, but the GPU gets disconnected after 12 hours, so I am unable to train my model beyond a certain point. Is there a way to keep the GPU connected for longer?

Source: https://stackoverflow.com/questions/49469697/how-can-i-keep-the-gpu-running-for-more-than-12-hours-in-google-colaboratory
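
One practical workaround is to checkpoint to Google Drive so training can resume after the 12-hour reset. A minimal sketch, assuming a Keras setup; `model`, `x_train`, and `y_train` are placeholders for the asker's own objects:

    # Hedged sketch: persist checkpoints to Drive so a reset doesn't lose progress.
    from google.colab import drive
    from tensorflow import keras

    drive.mount('/content/drive')  # prompts for authorization in Colab

    # Save the full model after every epoch; in a fresh session, reload it
    # with keras.models.load_model() and continue training from there.
    ckpt = keras.callbacks.ModelCheckpoint('/content/drive/My Drive/model.h5')
    model.fit(x_train, y_train, epochs=100, callbacks=[ckpt])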

OpenCV Big difference between StereoSGBM and gpu::StereoBM_GPU

Submitted by Deadly on 2019-12-23 01:13:23
Question: I am trying to generate the disparity of a stereo image pair using OpenCV, and to optimize performance using the GPU, but the results of the two differ.

StereoSGBM initialization:

    StereoSGBM sbm;
    sbm.SADWindowSize = 3;
    sbm.numberOfDisparities = 144;
    sbm.preFilterCap = 63;
    sbm.minDisparity = -39;
    sbm.uniquenessRatio = 10;
    sbm.speckleWindowSize = 100;
    sbm.speckleRange = 32;
    sbm.disp12MaxDiff = 1;
    sbm.fullDP = false;
    sbm.P1 = 216;
    sbm.P2 = 864;
    sbm(grayLeftCurrentFrameCPU, grayRightCurrentFrameCPU, …
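
Some divergence between the two is expected: StereoSGBM is semi-global matching, while gpu::StereoBM_GPU is plain block matching. For reference, a hedged sketch of the same SGBM parameters with the modern cv2 Python API; the image file names are placeholders:

    # Hedged sketch: same SGBM parameters via the modern Python bindings.
    # 'left.png' / 'right.png' stand in for the asker's stereo pair.
    import cv2

    left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)
    right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

    sgbm = cv2.StereoSGBM_create(
        minDisparity=-39, numDisparities=144, blockSize=3,
        P1=216, P2=864, disp12MaxDiff=1, preFilterCap=63,
        uniquenessRatio=10, speckleWindowSize=100, speckleRange=32)

    disp = sgbm.compute(left, right)  # int16 disparity, scaled by 16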

Android Emulator running into loading problems

Submitted by 旧街凉风 on 2019-12-22 17:37:15
Question: Up until 2 days ago, everything had been working perfectly. I was using the Nexus 5X and 4 emulators with no issues apart from a few crashes on the 5X due to virtual memory. Now, every time I try to load any of my emulators, I am greeted with a "GPU driver issue" error box; the emulator loads but is very laggy and virtually unusable, and the UI of the app I'm testing keeps bugging out. I tried updating my driver; in fact, this is the error message I got after updating to …

keras with tensorflow on GPU machine - some parts are very slow

Submitted by 孤街醉人 on 2019-12-22 16:54:47
Question: I'm trying to train a model using Keras / TensorFlow (1.4) on a p3.2xlarge AWS machine (which has an NVIDIA Tesla V100 GPU). Two parts of the initialisation are very slow when using a GPU, but run in a reasonable time on CPU. The first part is "calling" an embedding layer during model setup: network = embedding(input). This embedding layer is used several times, but only the first call is slow; it appears that this is the phase in which the weights are copied to the GPU, and it takes a few minutes (~5) …
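
A hedged way to confirm that only the first call pays the device-transfer cost is to time successive calls to the layer, sketched here with a current tf.keras API rather than the 1.4 setup in the question; the vocabulary and batch sizes are illustrative:

    # Hedged sketch: time the first vs. second call of an embedding layer.
    import time
    import numpy as np
    from tensorflow import keras

    emb = keras.layers.Embedding(input_dim=500000, output_dim=300)
    tokens = np.random.randint(0, 500000, size=(32, 50))

    t0 = time.time()
    _ = emb(tokens)  # first call builds the layer and moves weights to the device
    print('first call: %.2fs' % (time.time() - t0))

    t0 = time.time()
    _ = emb(tokens)  # later calls reuse the same device copy
    print('second call: %.2fs' % (time.time() - t0))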

How does persistent mapped buffers work in OpenGL?

Submitted by 依然范特西╮ on 2019-12-22 14:51:45
Question: OpenGL 4.4 added a nice extension: ARB_buffer_storage. How do persistent mapped buffers work (or how might they work) when MAP_COHERENT_BIT and MAP_PERSISTENT_BIT are set? Is there some special intermediate buffer? Why can this usually be faster than normal mapped buffers?

Source: https://stackoverflow.com/questions/27777368/how-does-persistent-mapped-buffers-work-in-opengl
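
For concreteness, a minimal sketch of creating and persistently mapping a buffer with PyOpenGL, assuming a GL 4.4 context has already been created and made current; SIZE and vbo are illustrative names:

    # Hedged sketch: immutable storage mapped once and kept mapped.
    from OpenGL.GL import (
        glGenBuffers, glBindBuffer, glBufferStorage, glMapBufferRange,
        GL_ARRAY_BUFFER, GL_MAP_WRITE_BIT, GL_MAP_PERSISTENT_BIT,
        GL_MAP_COHERENT_BIT)

    SIZE = 1 << 20  # 1 MiB, illustrative
    flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT

    vbo = glGenBuffers(1)
    glBindBuffer(GL_ARRAY_BUFFER, vbo)
    glBufferStorage(GL_ARRAY_BUFFER, SIZE, None, flags)  # immutable storage
    # The returned pointer stays valid for the buffer's lifetime, so there is
    # no per-frame map/unmap; with COHERENT set, CPU writes become visible to
    # the GPU without an explicit flush.
    ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, SIZE, flags)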

CPU->GPU transfer vs GPU->CPU transfer

Submitted by 帅比萌擦擦* on 2019-12-22 14:02:53
Question: I have been doing some experiments measuring the latency of data transfer from CPU->GPU and GPU->CPU. I found that the CPU->GPU transfer rate is almost twice the GPU->CPU transfer rate for a particular message size. Can anybody explain why this is so?

Answer 1: Since I don't know the details of your experiment, such as which CPU/GPU were used or how the transfer rate was measured, I can only guess: data transfer from CPU->GPU is normally done through DMA; each time it can transfer a …
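
A hedged sketch of one way to measure both directions, here with CuPy; the array size is illustrative, and pinned host memory would usually be needed for a fair comparison:

    # Hedged sketch: rough H2D vs. D2H bandwidth measurement with CuPy.
    import time
    import numpy as np
    import cupy as cp

    n_bytes = 256 * 1024 * 1024
    host = np.empty(n_bytes, dtype=np.uint8)

    t0 = time.time()
    dev = cp.asarray(host)              # host -> device copy
    cp.cuda.Stream.null.synchronize()   # wait for the copy to finish
    h2d = n_bytes / (time.time() - t0) / 1e9

    t0 = time.time()
    back = cp.asnumpy(dev)              # device -> host copy
    cp.cuda.Stream.null.synchronize()
    d2h = n_bytes / (time.time() - t0) / 1e9

    print('H2D %.2f GB/s, D2H %.2f GB/s' % (h2d, d2h))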

The impact of goto instruction at intra-warp divergence in CUDA code

Submitted by ぐ巨炮叔叔 on 2019-12-22 13:58:16
Question: For simple intra-warp thread divergence in CUDA, what I know is that the SM selects a re-convergence point (a PC address) and executes the instructions on both/multiple paths while disabling the effects of execution for the threads that did not take a given path. For example, in the piece of code below:

    if (threadIdx.x < 16) {
    A:  // do something.
    } else {
    B:  // do something else.
    }
    C:  // rest of code.

C is the re-convergence point; the warp scheduler schedules instructions at both A and B while disabling …

How to run Python on AMD GPU?

Submitted by 混江龙づ霸主 on 2019-12-22 13:53:05
Question: We are currently trying to optimize a system that has at least 12 variables; the total number of combinations of these variables is over 1 billion. This is not deep learning, machine learning, or TensorFlow, but arbitrary calculations on time-series data. We have implemented our code in Python and run it successfully on CPU. We also tried multiprocessing, which works well, but we need faster computation since the calculation takes weeks. We have a GPU system consisting of 6 AMD GPUs. …
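
One route to AMD GPUs from Python is OpenCL via PyOpenCL. A minimal hedged sketch; the kernel is an illustrative placeholder, not the asker's actual computation:

    # Hedged sketch: run an elementwise kernel on whatever OpenCL device is found.
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()      # picks an available device (e.g. an AMD GPU)
    queue = cl.CommandQueue(ctx)

    src = """
    __kernel void scale(__global const float *x, __global float *y) {
        int i = get_global_id(0);
        y[i] = 2.0f * x[i];             /* placeholder computation */
    }
    """
    prog = cl.Program(ctx, src).build()

    x = np.random.rand(1000000).astype(np.float32)
    mf = cl.mem_flags
    x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
    y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)

    prog.scale(queue, x.shape, None, x_buf, y_buf)
    y = np.empty_like(x)
    cl.enqueue_copy(queue, y, y_buf)    # read the result back to the host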

Mxnet - slow array copy to GPU

Submitted by 。_饼干妹妹 on 2019-12-22 13:52:20
Question: My problem: how should I perform fast matrix multiplication in mxnet? My concrete problem: array copy to the GPU is slow. What can be done about it? I create random arrays, copy them to the context, and then multiply:

    import mxnet as mx
    import mxnet.ndarray as nd
    from mxnet import profiler

    profiler.set_config(aggregate_stats=True)
    ctx = mx.cpu()  # create arrays on CPU
    profiler.set_state('run')
    a = nd.random.uniform(-1, 1, shape=(10000, 10000), ctx=mx.cpu())
    b = nd.random.uniform(-1, 1, shape=(10000, 10000), ctx=mx.cpu())  # completed to mirror `a`
    …
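
A hedged sketch of timing the multiplication on the GPU directly; it assumes a CUDA build of mxnet, creates the arrays on the device to avoid the host-to-device copy, and uses nd.waitall() because mxnet executes asynchronously:

    # Hedged sketch: create data on the GPU and synchronize before/after timing.
    import time
    import mxnet as mx
    import mxnet.ndarray as nd

    gpu = mx.gpu(0)                     # assumes a CUDA-enabled build
    a = nd.random.uniform(-1, 1, shape=(10000, 10000), ctx=gpu)
    b = nd.random.uniform(-1, 1, shape=(10000, 10000), ctx=gpu)
    nd.waitall()                        # ensure the random fills have completed

    t0 = time.time()
    c = nd.dot(a, b)
    nd.waitall()                        # block until the matmul actually finishes
    print('dot: %.3fs' % (time.time() - t0))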