reduction

Custom reduction on GPU vs CPU yields different results

Submitted by ε祈祈猫儿з on 2021-02-20 03:43:34
Question: Why am I seeing a different result on the GPU compared to the sequential CPU?

```python
import numpy
from numba import cuda
from functools import reduce

A = numpy.arange(100, dtype=numpy.float64) + 1
cuda.reduce(lambda a, b: a + b * 20)(A)   # result: 12952749821.0
reduce(lambda a, b: a + b * 20, A)        # result: 100981.0

import numba
numba.__version__                          # '0.34.0+5.g1762237'
```

Similar behavior happens when using the Java Stream API to parallelize a reduction on the CPU:

```java
int n = 10;
float inputArray[] = new float[n];
ArrayList<Float…
```
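The discrepancy comes from reduction order: `lambda a, b: a + b * 20` is not associative, so a parallel (tree-shaped) reduction on the GPU combines elements in a different grouping than the strict left fold of `functools.reduce`, and legitimately produces a different value. A minimal CPU-only sketch of the effect (the four-element list and the pairwise grouping are illustrative, not numba's actual schedule):

```python
from functools import reduce

def f(a, b):
    return a + b * 20   # not associative: f(f(x, y), z) != f(x, f(y, z))

xs = [1, 2, 3, 4]

# Sequential left fold, exactly as functools.reduce evaluates it:
left_fold = reduce(f, xs)    # ((1 + 2*20) + 3*20) + 4*20 = 181

# Pairwise (tree) grouping, the shape a GPU reduction typically uses:
tree = f(f(xs[0], xs[1]), f(xs[2], xs[3]))   # f(41, 83) = 41 + 83*20 = 1701

print(left_fold, tree)       # 181 1701 -- same input, different grouping
```

`cuda.reduce` assumes the operator is associative (and commutative); with plain `+` both groupings agree, which is why ordinary sums reduce correctly in parallel.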

Initialize variable for omp reduction

Submitted by 混江龙づ霸主 on 2021-02-19 08:40:24
Question: The OpenMP standard specifies an initial value for a reduction variable. So do I have to initialize the variable, and how would I do that in the following case?

```cpp
int sum;
//...
for (int it = 0; it < maxIt; it++) {
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < ct; i++)
            arrayX[i] = arrayY[i];
        sum = 0;
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < ct; i++)
            sum += arrayZ[i];
    }
    // Use sum
}
```

Note that I use only one parallel region to minimize overhead and to allow the nowait in the first loop. Using …
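On the initialization placement: inside the parallel region every thread executes `sum = 0;`, and because of the preceding `nowait` one thread can reset `sum` after another has already started accumulating. The safe pattern is to initialize the accumulator once, outside the parallel work (or under `#pragma omp single`), and let `reduction(+:sum)` manage the per-thread copies. A rough CPU-side sketch of the "initialize once, combine partials" shape, using Python threads in place of OpenMP (the data and chunking are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

arrayZ = list(range(1, 101))                      # stand-in data, sums to 5050
chunks = [arrayZ[i:i + 25] for i in range(0, len(arrayZ), 25)]

total = 0   # initialized ONCE before the parallel work, like sum=0 before the pragma
with ThreadPoolExecutor(max_workers=4) as ex:
    for partial in ex.map(sum, chunks):           # each worker reduces its own chunk
        total += partial                          # partials combined in one thread only
print(total)                                      # 5050
```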

numba cuda does not produce correct result with += (gpu reduction needed?)

Submitted by 白昼怎懂夜的黑 on 2021-02-18 19:38:17
Question: I am using numba.cuda to calculate a function. The code simply adds up all the values into one result, but numba.cuda gives me a different result from NumPy.

numba code:

```python
import math

def numba_example(number_of_maximum_loop, gs, ts, bs):
    from numba import cuda
    result = cuda.device_array([3,])

    @cuda.jit(device=True)
    def BesselJ0(x):
        return math.sqrt(2 / math.pi / x)

    @cuda.jit
    def cuda_kernel(number_of_maximum_loop, result, gs, ts, bs):
        i = cuda.grid(1)
        if i < number_of_maximum_loop:
            result[0] += …
```
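`result[0] += term` executed by many GPU threads is a data race: each thread reads the old value, adds, and writes back, so concurrent updates are lost. In numba the usual fixes are `cuda.atomic.add` or a proper per-block reduction. The sketch below is a plain CPU reference for the sum the kernel intends to produce (the helper name `besselj0_large_x` and `n = 1000` are illustrative); the atomic fix appears only in a comment, since running it needs a CUDA device:

```python
import math

def besselj0_large_x(x):
    # same leading-order amplitude as the question's device function
    return math.sqrt(2 / math.pi / x)

n = 1000
# The value the kernel is *supposed* to accumulate into result[0]:
expected = sum(besselj0_large_x(i + 1) for i in range(n))

# Inside the numba kernel, the race-free version of `result[0] += term` is:
#     cuda.atomic.add(result, 0, term)
# or, for performance, a shared-memory reduction of per-block partial sums.
print(expected)
```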

Numpy index of the maximum with reduction - numpy.argmax.reduceat

Submitted by 给你一囗甜甜゛ on 2021-02-16 18:43:33
Question: I have a flat array a:

```python
a = numpy.array([0, 1, 1, 2, 3, 1, 2])
```

And an array b of indices marking the start of each "chunk":

```python
b = numpy.array([0, 4])
```

I know I can find the maximum in each "chunk" using a reduction:

```python
m = numpy.maximum.reduceat(a, b)
# array([2, 3], dtype=int32)
```

But... is there a way to find the index of the maximum within a chunk (like numpy.argmax), with vectorized operations (no lists, loops)?

Answer 1: Borrowing the idea from this post. Steps involved: Offset all …
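The answer is truncated, but the offset idea it starts from can be completed like this: shift each chunk's values into its own disjoint value band, argsort the whole array once, and read the index of each chunk's maximum from the last position of that chunk's sorted segment. A sketch (returns global indices; on ties it picks the last occurrence of the maximum, whereas numpy.argmax would pick the first):

```python
import numpy as np

def argmax_reduceat(a, b):
    # chunk id for every element (0,0,...,1,1,...) via cumsum of boundary markers
    chunk_id = np.zeros(a.size, dtype=np.int64)
    chunk_id[b[1:]] = 1
    chunk_id = chunk_id.cumsum()
    # offset each chunk into its own value band so one argsort sorts per-chunk
    shifted = a + chunk_id * (np.ptp(a) + 1)
    order = shifted.argsort(kind="stable")
    ends = np.append(b[1:], a.size) - 1   # last sorted position of each chunk
    return order[ends]                    # global index of each chunk's maximum

a = np.array([0, 1, 1, 2, 3, 1, 2])
b = np.array([0, 4])
res = argmax_reduceat(a, b)
print(res)        # global argmax indices [3 4]
print(a[res])     # [2 3], matching numpy.maximum.reduceat(a, b)
```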

OpenMP reduction on container elements

Submitted by 泄露秘密 on 2021-01-29 13:32:42
Question: I have a nested loop with few outer and many inner iterations. In the inner loop I need to calculate a sum, so I want to use an OpenMP reduction. The outer loop is over a container, so the reduction is supposed to happen on an element of that container. Here's a minimal contrived example:

```cpp
#include <omp.h>
#include <vector>
#include <iostream>

int main(){
    constexpr int n { 128 };
    std::vector<int> vec (4, 0);
    for (unsigned int i {0}; i < vec.size(); ++i){
        /* this does not work */
        //#pragma omp …
```

How to change a property of an object in Vue?

Submitted by 半城伤御伤魂 on 2021-01-29 06:20:21
Question: I am trying to modify data in Vue for Chart.js. The basic data are purchase orders with supplier_id and supplier_name. I would like to know how many orders each supplier has:

```javascript
methods: {
    render() {
        let suppliers = []
        _.each(this.orders, function (value, key) {
            if (!_.find(suppliers, {name: value.supplier_name})) {
                suppliers.push({ name: value.supplier_name, num: 1 });
            } else {
                let supplier = _.find(suppliers, {name: value.supplier_name})
                let data = { name: value.supplier_name, num: supplier.num + 1 }
                Vue…
```

Cuda - Multiple sums in each vector element

Submitted by 拈花ヽ惹草 on 2020-03-20 12:01:08
Question: The product of two series of Chebyshev polynomials with coefficients a and b can be represented by a formula (the formula image is not included in this excerpt). The problem is to parallelize this as much as possible. I have managed to use CUDA to parallelize the formula above by simply applying one thread per vector element, so that one thread performs the sums/multiplications.

```cpp
#include <stdio.h>
#include <iostream>
#include <cuda.h>
#include <time.h>

__global__ void chebyprod(int n, float *a, float *b, float *c){
    int i = blockIdx.x * blockDim.x …
```
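Since the formula image is missing, here as a stand-in is the standard Chebyshev series product, which has the same "every output element is an independent sum" structure the kernel exploits: from the identity T_i·T_j = (T_{i+j} + T_{|i-j|}) / 2, each pair of input coefficients contributes to exactly two output coefficients, and each output c[k] can be gathered independently, which is why a CUDA port can assign one thread per output index. A NumPy sketch, checked against numpy.polynomial.chebyshev.chebmul:

```python
import numpy as np

def cheb_product(a, b):
    # Chebyshev identity: T_i * T_j = (T_{i+j} + T_{|i-j|}) / 2,
    # so each pair (i, j) contributes half of a_i*b_j to two coefficients.
    c = np.zeros(a.size + b.size - 1)
    for i in range(a.size):
        for j in range(b.size):
            t = 0.5 * a[i] * b[j]
            c[i + j] += t
            c[abs(i - j)] += t
    return c

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
print(cheb_product(a, b))                     # coefficients [7, 10, 4] of T0, T1, T2
print(np.polynomial.chebyshev.chebmul(a, b))  # same coefficients
```

Because no c[k] depends on any other, the one-thread-per-element CUDA version needs no inter-thread synchronization; further parallelism (as the question asks) means splitting each element's inner sum across threads, which does require a per-element reduction.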