numba

Numba on nested Numpy arrays

Submitted by 不打扰是莪最后的温柔 on 2019-12-13 15:27:29
Question: Setup: I have the following two implementations of a matrix calculation. The first implementation uses a matrix of shape (n, m), and the calculation is repeated in a for-loop `repetition` times:

    import numpy as np
    from numba import jit

    @jit
    def foo():
        for i in range(1, n):
            for j in range(1, m):
                _deleteA = (
                    matrix[i, j] +
                    # some constants added here
                )
                _deleteB = (
                    matrix[i, j-1] +
                    # some constants added here
                )
                matrix[i, j] = min(_deleteA, _deleteB)
        return matrix

    repetition = 3
    for x in range…
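The truncated kernel follows a standard dynamic-programming pattern: each cell takes the minimum of two candidate costs. A minimal pure-Python sketch of that inner loop (without numba, and with a hypothetical `cost=1` standing in for the elided "some constants added here") might look like this:

```python
def min_update(matrix, n, m, cost=1):
    """In-place two-candidate minimum update over a list-of-lists matrix.

    `cost` is a placeholder for the constants elided in the question.
    """
    for i in range(1, n):
        for j in range(1, m):
            delete_a = matrix[i][j] + cost      # candidate from the current cell
            delete_b = matrix[i][j - 1] + cost  # candidate from the left neighbour
            matrix[i][j] = min(delete_a, delete_b)
    return matrix

grid = [[0, 5, 9], [1, 7, 3]]
print(min_update(grid, 2, 3))  # [[0, 5, 9], [1, 2, 3]]
```

Passing `matrix`, `n`, and `m` as explicit arguments, rather than reading them as globals like the snippet in the question does, is also what lets numba type the function cleanly.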

Getting wrong results when calculating on GPU (Python 3.5 + numba + CUDA 8.0)

Submitted by 你离开我真会死。 on 2019-12-13 10:09:30
Question: I want to get the sum of different parts of an array. I ran my code and found two problems from what was printed. Problem 1: described in detail here; it has been solved, so maybe it is not a real problem. Problem 2: in my code I gave different values to sbuf[0,2], sbuf[1,2], sbuf[2,2] and sbuf[0,3], sbuf[1,3], sbuf[2,3], but I find that after cuda.syncthreads() the values became the same between sbuf[0,2] and sbuf[0,3], sbuf[1,2] and sbuf[1,3], sbuf[2,2] and sbuf[2,3]. This directly leads to the values of Xi_s, …
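Before debugging shared memory on the device, it helps to pin down the expected result on the CPU. A sketch of "the sum of different parts of an array" with plain Python slices (the part boundaries here are invented for illustration):

```python
def partial_sums(data, boundaries):
    """Sum each part of `data` delimited by consecutive boundary indices."""
    return [sum(data[lo:hi]) for lo, hi in zip(boundaries, boundaries[1:])]

values = list(range(10))                     # 0..9
print(partial_sums(values, [0, 3, 7, 10]))   # sums of [0:3], [3:7], [7:10] -> [3, 18, 24]
```

On the device, each part would typically be reduced into its own shared-memory slot, with `cuda.syncthreads()` separating the writes from the reads; slots "becoming the same" after the barrier often points at two threads writing to one slot rather than at the barrier itself.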

CUDA Parallelize Kernel

Submitted by 落爺英雄遲暮 on 2019-12-13 09:03:54
Question: I'm trying to parallelize a simple update loop of a simulation on the GPU. Basically there are a bunch of "creatures", represented by circles, that move in each update loop, after which there is a check for whether any of them intersect. radii holds the radius of each creature type.

    import numpy as np
    import math
    from numba import cuda

    @cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
    def update(p_x, p_y, radii, …

numba.jit doesn't allow the use of np.argsort in nopython mode

Submitted by £可爱£侵袭症+ on 2019-12-13 07:48:55
Question: I am receiving this error message:

    Failed at nopython (nopython frontend)
    Invalid usage of Function(<function argsort at 0x0000000002A67840>) with parameters (array(float64, 2d, C), axis=int64)
     * parameterized
    In definition 0:

while using this code:

    def rankbids(bids, shifts, groupPeriod, period):
        rowsSize = bids.shape[0]
        finaltable = np.zeros((rowsSize, groupPeriod), dtype=np.float64)
        for i in range(0, period):  # for 0 to 99; CONSTANT, UPDATE WHEN NEEDED
            for worker in range(rowsSize):
                …
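A common workaround when `np.argsort` with an `axis` argument is rejected in nopython mode is to argsort each row yourself; the per-row logic is simple enough to jit. A pure-Python sketch of the row-wise case (function names are illustrative, not from the question):

```python
def argsort_row(row):
    """Return the indices that would sort `row`, like np.argsort on 1-D input."""
    return sorted(range(len(row)), key=lambda i: row[i])

def argsort_rows(table):
    """Row-wise argsort: the 2-D, axis=1 case the error message is about."""
    return [argsort_row(row) for row in table]

print(argsort_rows([[3.0, 1.0, 2.0], [0.5, 2.5, 1.5]]))  # [[1, 2, 0], [0, 2, 1]]
```

Inside a jitted function one would instead call `np.argsort` on each 1-D row in a loop, which nopython mode does support (the unsupported part is the `axis` keyword on a 2-D array).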

Numba `nogil=True` + ThreadPoolExecutor results in a smaller speed-up than expected

Submitted by 梦想的初衷 on 2019-12-13 02:49:16
Question: This is a follow-up to my previous question. I'm trying to use Numba and Dask to speed up a slow computation that is similar to calculating the kernel density estimate of a huge collection of points. My plan was to write the computationally expensive logic in a jit-compiled function and then split the work among the CPU cores using dask. I wanted to use the nogil feature of numba.jit so that I could use the dask threading backend and avoid unnecessary memory copies of the input data…
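The threading plan itself is independent of numba: split the input into chunks and hand each chunk to a pool thread. A stdlib sketch of that scaffolding, where `kernel` is a stand-in for the expensive `jit(nopython=True, nogil=True)` function:

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(chunk):
    # placeholder for the jitted, nogil=True function from the question
    return sum(x * x for x in chunk)

def threaded_map(data, n_workers=4):
    """Split `data` into roughly one chunk per worker and reduce the partial results."""
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(kernel, chunks))

print(threaded_map(list(range(8))))  # 140, the sum of squares 0..7
```

Note that a pure-Python `kernel` still holds the GIL, so this only scales once the real work runs in nogil-compiled code; time spent outside the nogil region is one common reason the observed speed-up is smaller than expected.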

Creating `NumPy` arrays inside a function decorated with `numba`'s `@jit(nopython=True)`?

Submitted by 浪子不回头ぞ on 2019-12-13 02:16:39
Question: I would like to create a numpy array inside a function decorated with numba's @jit(nopython=True). For example:

    import numpy as np
    import numba

    @numba.jit(nopython=True)
    def funny_func():
        zero_array = np.zeros(10)
        sum_result = 0
        for elem in zero_array:
            sum_result += elem
        return sum_result

    print funny_func()

Compiling this script produces the following error:

    UntypedAttributeError: Unknown attribute "zeros" of type Module(<module 'numpy' from 'A:\Anaconda\lib\site-packages\numpy\__init__.pyc'…
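For what it's worth, the loop only accumulates the elements of a zero array, so the expected result is 0.0; a pure-Python equivalent (no numba, a list standing in for `np.zeros(10)`) makes that easy to check:

```python
def funny_func_plain():
    """Pure-Python equivalent of the jitted function in the question."""
    zero_array = [0.0] * 10   # stands in for np.zeros(10)
    sum_result = 0.0
    for elem in zero_array:
        sum_result += elem
    return sum_result

print(funny_func_plain())  # 0.0
```

The `UntypedAttributeError` shown is typical of very old numba releases; current releases support `np.zeros` in nopython mode, and the usual fixes are upgrading numba or allocating the array outside the jitted function and passing it in as an argument.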

Improving runtime of Python NumPy code

Submitted by 冷暖自知 on 2019-12-12 19:13:52
Question: I have code that reassigns bins in a large numpy array. Basically, the elements of the large array have been sampled at different frequencies, and the final goal is to rebin the entire array at fixed bins freq_bins. The code is rather slow for the array I have. Is there a good way to improve its runtime? A factor of a few would do for now; maybe some numba magic would help.

    import numpy as np
    import time

    division = 90
    freq_division = 50
    cd = 3000
    boost_factor = np.random.rand…
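The core of rebinning is mapping each sample frequency to its target bin and accumulating. The index lookup can be sketched with the stdlib `bisect` module (the bin edges and values here are invented for illustration):

```python
from bisect import bisect_right

def rebin(freqs, values, edges):
    """Accumulate `values` into bins delimited by the sorted `edges`."""
    totals = [0.0] * (len(edges) - 1)
    for f, v in zip(freqs, values):
        idx = bisect_right(edges, f) - 1      # index of the bin containing frequency f
        if 0 <= idx < len(totals):            # drop samples outside the bin range
            totals[idx] += v
    return totals

print(rebin([0.5, 1.5, 1.7, 3.2], [1.0, 2.0, 3.0, 4.0], [0, 1, 2, 4]))
# [1.0, 5.0, 4.0]
```

A vectorized NumPy version replaces the loop with `idx = np.digitize(freqs, edges) - 1` followed by `np.add.at(totals, idx, values)`, which is usually the first speed win to try before reaching for numba.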

numba - guvectorize barely faster than jit

Submitted by £可爱£侵袭症+ on 2019-12-12 08:27:32
Question: I was trying to parallelize a Monte Carlo simulation that operates on many independent datasets. I found that numba's parallel guvectorize implementation was barely 30-40% faster than the numba jit implementation. I found these (1, 2) comparable topics on Stack Overflow, but they do not really answer my question: in the first case, the implementation is slowed down by a fall-back to object mode, and in the second case the original poster did not use guvectorize properly. Neither of these…
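The shape of the problem (many independent datasets, one simulation each) can be sketched without numba as a per-dataset loop; the simulation body below is a made-up stand-in, not the asker's code:

```python
import random

def simulate_one(dataset, n_draws=1000, seed=0):
    """Toy Monte Carlo: estimate the mean of `dataset` by random resampling."""
    rng = random.Random(seed)                 # per-dataset RNG keeps runs independent
    total = sum(rng.choice(dataset) for _ in range(n_draws))
    return total / n_draws

def simulate_all(datasets):
    # this loop is what guvectorize (or numba's prange) would parallelize
    return [simulate_one(d, seed=i) for i, d in enumerate(datasets)]

results = simulate_all([[1.0, 2.0, 3.0], [10.0, 20.0]])
print(results)  # one resampled mean per dataset
```

When the per-dataset work is small, thread launch and memory-layout overhead eat into the parallel gain, which is one common reason a `guvectorize(..., target='parallel')` version lands well below the ideal core-count speed-up.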

CUDA Function Won't Execute For Loop on Python with Numba

Submitted by 送分小仙女□ on 2019-12-12 06:39:24
Question: I'm trying to run a simple update loop of a simulation on the GPU. Basically there are a bunch of "creatures", represented by circles, that move in each update loop, after which there is a check for whether any of them intersect.

    import numpy as np
    import math
    from numba import cuda

    @cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
    def update(p_x, p_y, radii, types, velocities, max_velocities, acceleration, num_creatures, …

How to use Python Class with Numba

Submitted by 风格不统一 on 2019-12-12 04:17:03
Question: I have Numba 0.24 and it supports classes. When I try to build the simplest class I can imagine, I get an error! What's happening?

    from numba import jitclass

    @jitclass
    class foo:
        x = 2

    bar = foo()

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-2-3e0fd8d4bd2b> in <module>()
          3 class foo:
          4     x = 2
    ----> 5 bar = foo()

    TypeError: wrap() missing 1 required positional argument: 'cls'

Am I missing something here?
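The error comes from applying `@jitclass` without a spec: `jitclass` takes a list of (field name, numba type) pairs, and fields must be assigned in `__init__` rather than declared as class attributes. A sketch of the corrected pattern (with a plain-class fallback so it also runs where numba is not installed):

```python
try:
    from numba import int32
    from numba.experimental import jitclass  # `from numba import jitclass` on old releases
except ImportError:
    int32 = "int32"                          # fallback: behave as a plain Python class
    def jitclass(spec):
        return lambda cls: cls

spec = [("x", int32)]                        # declare every field and its type

@jitclass(spec)
class Foo:
    def __init__(self):
        self.x = 2                           # fields are set in __init__, not at class level

bar = Foo()
print(bar.x)  # 2
```

With the spec supplied and the attribute moved into `__init__`, instantiation works the same way under numba and under the plain-class fallback.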