gpu

Custom reduction on GPU vs CPU yields different results

ε祈祈猫儿з submitted on 2021-02-20 03:43:34
Question: Why am I seeing a different result on the GPU compared to a sequential CPU reduction?

import numpy
from numba import cuda
from functools import reduce

A = numpy.arange(100, dtype=numpy.float64) + 1
cuda.reduce(lambda a, b: a + b * 20)(A)  # result 12952749821.0
reduce(lambda a, b: a + b * 20, A)       # result 100981.0

import numba
numba.__version__  # '0.34.0+5.g1762237'

Similar behavior happens when using the Java Stream API to parallelize a reduction on the CPU: int n = 10; float inputArray[] = new float[n]; ArrayList<Float
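
The folding order is the key difference: functools.reduce applies the operator strictly left to right, while cuda.reduce combines partial results in a tree, which only matches when the operator is associative and commutative; lambda a, b: a + b * 20 is neither. A minimal sketch, assuming a CUDA-capable GPU and a recent numba, showing that an associative operator does agree, and that the sequential semantics of this particular fold can still be recovered from an ordinary sum:

import numpy
from numba import cuda
from functools import reduce

A = numpy.arange(100, dtype=numpy.float64) + 1

# Plain addition is associative and commutative, so both reductions agree.
gpu_sum = cuda.reduce(lambda a, b: a + b)
print(gpu_sum(A))                     # 5050.0
print(reduce(lambda a, b: a + b, A))  # 5050.0

# The sequential left fold of (a, b) -> a + b*20 is equivalent to
# A[0] + 20 * sum(A[1:]), so an associative sum reduction still recovers it.
print(A[0] + 20 * A[1:].sum())        # 100981.0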

No speedup using XGBClassifier with GPU support

删除回忆录丶 submitted on 2021-02-19 08:19:06
Question: In the following code, I try to search over different hyperparameters of xgboost.

param_test1 = {
    'max_depth': list(range(3, 10, 2)),
    'min_child_weight': list(range(1, 6, 2))
}
predictors = [x for x in train_data.columns if x not in ['target', 'id']]
gsearch1 = GridSearchCV(estimator=XGBClassifier(learning_rate=0.1, n_estimators=100, max_depth=5, min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8, objective='binary:logistic', n_jobs=4, scale_pos_weight=1, seed=27, kvargs={'tree
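
The excerpt is cut off at the kvargs dict, but in the scikit-learn wrapper the GPU tree builder is normally selected with a tree_method keyword passed directly to XGBClassifier rather than through a kvargs argument; left at its default, XGBoost trains on the CPU and no speedup should be expected. A hedged sketch, assuming an XGBoost build compiled with GPU support (roughly the 0.8-1.x API); train_data and predictors are the question's own variables:

from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

param_test1 = {
    'max_depth': list(range(3, 10, 2)),
    'min_child_weight': list(range(1, 6, 2)),
}

clf = XGBClassifier(
    learning_rate=0.1,
    n_estimators=100,
    objective='binary:logistic',
    tree_method='gpu_hist',  # GPU histogram tree builder; needs a GPU-enabled build
    n_jobs=1,                # one process per GPU avoids contention
)

gsearch1 = GridSearchCV(estimator=clf, param_grid=param_test1,
                        scoring='roc_auc', cv=3, n_jobs=1)
# gsearch1.fit(train_data[predictors], train_data['target'])

Keeping GridSearchCV's n_jobs low is deliberate: several CPU workers competing for a single GPU tend to erase whatever speedup gpu_hist provides.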

how does nvidia-smi work?

孤者浪人 submitted on 2021-02-19 06:15:34
Question: What is the internal operation that allows nvidia-smi to fetch hardware-level details? The tool runs even when some process is already running on the GPU device, and it reports the utilization details, the name and id of the process, and so on. Is it possible to develop such a tool at the user level? How is NVML related?

Answer 1: nvidia-smi is a thin wrapper around NVML. You can code against NVML with the help of the SDK contained in the Tesla Deployment Kit. Everything that can be done with nvidia-smi can be queried
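
A minimal sketch of querying the same information at user level through the Python NVML bindings (the pynvml module, installable as nvidia-ml-py); it assumes an NVIDIA driver that exposes NVML is installed:

import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory, in percent
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # .total / .used / .free, in bytes
        print(i, name, util.gpu, util.memory, mem.used)
        # per-process view, comparable to the process table printed by nvidia-smi
        for p in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
            print("  pid", p.pid, "uses", p.usedGpuMemory, "bytes")
finally:
    pynvml.nvmlShutdown()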

numba cuda does not produce correct result with += (gpu reduction needed?)

白昼怎懂夜的黑 submitted on 2021-02-18 19:38:17
Question: I am using numba cuda to calculate a function. The code simply adds all the values up into one result, but numba cuda gives me a different result from numpy.

numba code:

import math
def numba_example(number_of_maximum_loop, gs, ts, bs):
    from numba import cuda
    result = cuda.device_array([3,])

    @cuda.jit(device=True)
    def BesselJ0(x):
        return math.sqrt(2/math.pi/x)

    @cuda.jit
    def cuda_kernel(number_of_maximum_loop, result, gs, ts, bs):
        i = cuda.grid(1)
        if i < number_of_maximum_loop:
            result[0] +=
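
The excerpt ends right at the result[0] += line, but that pattern alone is the problem: many GPU threads read, modify and write the same element with no synchronization, so updates are lost and the total differs from numpy. A minimal sketch, assuming a CUDA-capable GPU, that serializes the updates with cuda.atomic.add (the accumulated term is a placeholder, since the original expression is cut off in the excerpt):

import math
import numpy as np
from numba import cuda

@cuda.jit(device=True)
def BesselJ0(x):
    return math.sqrt(2 / math.pi / x)  # large-argument approximation, as in the question

@cuda.jit
def cuda_kernel(number_of_maximum_loop, result, gs, ts, bs):
    i = cuda.grid(1)
    if i < number_of_maximum_loop:
        # atomic add avoids the lost-update race on result[0]
        cuda.atomic.add(result, 0, BesselJ0((i + 1) * gs) * ts * bs)  # placeholder term

n = 1000
result = cuda.to_device(np.zeros(3))
threads_per_block = 128
blocks = (n + threads_per_block - 1) // threads_per_block
cuda_kernel[blocks, threads_per_block](n, result, 0.5, 1.0, 1.0)
print(result.copy_to_host()[0])

For large inputs a proper reduction (for example numba's cuda.reduce, or a block-wise partial-sum kernel) usually scales better than funnelling every thread through a single atomic counter.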

How to set slurm/salloc for 1 gpu per task but let job use multiple gpus?

我与影子孤独终老i submitted on 2021-02-18 18:13:36
Question: We are looking for some advice on slurm salloc GPU allocations. Currently, given:

% salloc -n 4 -c 2 -gres=gpu:1
% srun env | grep CUDA
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0

However, we want more than just device 0 to be used. Is there a way to specify an salloc with srun/mpirun to get the following?

CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=1
CUDA_VISIBLE_DEVICES=2
CUDA_VISIBLE_DEVICES=3

This is desired such that each task gets 1
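
One common workaround, sketched here as an assumption rather than the posted answer: request all the GPUs at the job level (for example --gres=gpu:4) so every task can see them, then have each task pin itself to one device from its node-local rank before any CUDA library is loaded:

import os

# SLURM_LOCALID is the task's rank on its node (0 .. tasks-per-node - 1)
local_rank = int(os.environ.get("SLURM_LOCALID", "0"))
os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)

# Import CUDA-using libraries (numba.cuda, torch, cupy, ...) only after this point;
# each task then sees exactly one GPU and addresses it as device 0.

Newer Slurm releases also provide a --gpus-per-task option that binds one GPU to each task directly, where the site's Slurm version supports it.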

GPU computing for bootstrapping using “boot” package

寵の児 submitted on 2021-02-18 17:11:32
Question: I would like to do a large analysis using bootstrapping. I saw that bootstrapping can be sped up with parallel computing, as in the following code:

Parallel computing

# detect number of cpu
library(parallel)
detectCores()
library(boot)
# boot function --> mean
bt.mean <- function(dat, d){
  x <- dat[d]
  m <- mean(x)
  return(m)
}
# obtain confidence intervals
# use parallel computing with 4 cpus
x <- mtcars$mpg
bt <- boot(x, bt.mean, R = 1000, parallel = "snow", ncpus = 4)
quantile(bt$t