numba

Huge errors trying numba

I'm running into a large batch of errors when using numba. Ironically, the correct result is printed after the errors. I'm using the newest Anaconda Python and installed numba with conda install numba, once on 64-bit Ubuntu 13 with 64-bit Anaconda, and once on 64-bit Windows with a 32-bit Anaconda. The script I'm trying to execute is:

    # -*- coding: utf-8 -*-
    import math
    from numba import autojit

    pi = math.pi

    @autojit
    def sinc(x):
        if x == 0.0:
            return 1.0
        else:
            return math.sin(x*pi)/(pi*x)

    if __name__ == '__main__':
        a = 4.5
        print sinc(a)

and the errors I get start with:

    DEBUG -- translate:361 …
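
autojit was deprecated and later removed from Numba; as a point of comparison, a minimal sketch of the same function under the current @njit decorator (Python 3 and a recent Numba release assumed):

    import math
    from numba import njit  # autojit no longer exists in recent Numba

    PI = math.pi

    @njit
    def sinc(x):
        # normalized sinc: handle the removable singularity at x == 0 explicitly
        if x == 0.0:
            return 1.0
        return math.sin(x * PI) / (PI * x)

    print(sinc(4.5))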

Filtering a NumPy Array: what is the best approach?

Suppose I have a NumPy array arr that I want to filter element-wise, e.g. I want to get only the values below a certain threshold k. There are a couple of approaches, e.g.:

- using generators: np.fromiter((x for x in arr if x < k), dtype=arr.dtype)
- using boolean mask slicing: arr[arr < k]
- using np.where(): arr[np.where(arr < k)]
- using np.nonzero(): arr[np.nonzero(arr < k)]
- using a Cython-based custom implementation(s)
- using a Numba-based custom implementation(s)

Which is the fastest? What about memory efficiency? (EDITED: added np.nonzero() based on @ShadowRanger's comment)

Definitions

Using …
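
For the Numba-based option, a minimal sketch of one possible single-pass implementation (the function name and the preallocate-then-trim strategy are illustrative, not taken from the question):

    import numpy as np
    import numba as nb

    @nb.njit
    def numba_filter(arr, k):
        out = np.empty_like(arr)   # worst case: every element passes
        n = 0
        for x in arr:
            if x < k:
                out[n] = x
                n += 1
        return out[:n].copy()      # trim to the filled prefix

    arr = np.random.rand(1000)
    assert np.array_equal(numba_filter(arr, 0.5), arr[arr < 0.5])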

Numba autojit error on comparing numpy arrays

When I compare two NumPy arrays inside my function, I get an error saying that only length-1 arrays can be converted to Python scalars:

    from numpy.random import rand
    from numba import autojit

    @autojit
    def myFun():
        a = rand(10,1)
        b = rand(10,1)
        idx = a > b
        return idx

    myFun()

The error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-7-f7b68c0872a3> in <module>()
    ----> 1 myFun()

    /Users/Guest/Library/Enthought…
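
Under the current nopython compiler, elementwise comparison of arrays is supported; a minimal sketch, generating the arrays outside the jitted function and passing them in:

    import numpy as np
    from numba import njit

    @njit
    def my_fun(a, b):
        # elementwise comparison yields a boolean array in nopython mode
        return a > b

    a = np.random.rand(10, 1)
    b = np.random.rand(10, 1)
    idx = my_fun(a, b)
    print(idx.shape, idx.dtype)  # (10, 1) bool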

cProfile adds significant overhead when calling numba jit functions

Compare a pure-Python no-op function with a no-op function decorated with @numba.jit, that is:

    import numba

    @numba.njit
    def boring_numba():
        pass

    def call_numba(x):
        for t in range(x):
            boring_numba()

    def boring_normal():
        pass

    def call_normal(x):
        for t in range(x):
            boring_normal()

If we time this with %timeit, we get the following:

    %timeit call_numba(int(1e7))
    792 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit call_normal(int(1e7))
    737 ms ± 2.7 ms per loop (mean ± std. …
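
A self-contained sketch of one way to reproduce the comparison under cProfile outside IPython (file names and iteration counts are arbitrary):

    import cProfile
    import pstats
    import numba

    @numba.njit
    def boring_numba():
        pass

    def boring_normal():
        pass

    def loop(fn, n):
        for _ in range(n):
            fn()

    boring_numba()  # warm-up call so compilation happens outside the profiled region

    cProfile.run("loop(boring_numba, 10**6)", "numba.prof")
    cProfile.run("loop(boring_normal, 10**6)", "normal.prof")

    for name in ("numba.prof", "normal.prof"):
        pstats.Stats(name).sort_stats("cumulative").print_stats(5)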

Need help vectorizing code or optimizing

I am trying to do a double integral by first interpolating the data to make a surface. I am using numba to try to speed this process up, but it's just taking too long. Here is my code, with the images needed to run it located here and here.

From an answer: Noting that your code has a quadruple-nested set of for loops, I focused on optimizing the inner pair. Here's the old code:

    for i in xrange(K.shape[0]):
        for j in xrange(K.shape[1]):
            print(i,j)
            '''create an r vector '''
            r=(i*distX,j*distY,z)
            for x in xrange(img.shape[0]):
                for y in xrange(img.shape[1]):
                    '''create a ksi vector, then calculate its …
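
As a rough illustration of the vectorization pattern only (the actual ksi computation is cut off in the excerpt, so a placeholder kernel stands in for it; img, distX, distY and r follow the names used above):

    import numpy as np

    def inner_sum(img, r, distX, distY):
        x = np.arange(img.shape[0])[:, None] * distX   # (nx, 1) x-coordinates
        y = np.arange(img.shape[1])[None, :] * distY   # (1, ny) y-coordinates
        kernel = np.cos(x * r[0] + y * r[1])           # placeholder for the ksi term
        return np.sum(kernel * img)                    # one array op replaces both inner loops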

Coroutines in numba

I'm working on something that requires fast coroutines, and I believe numba could speed up my code. Here's a silly example: a function that squares its input and adds to it the number of times it has been called.

    def make_square_plus_count():
        i = 0
        def square_plus_count(x):
            nonlocal i
            i += 1
            return x**2 + i
        return square_plus_count

You can't JIT this even with nopython=False, presumably because of the nonlocal keyword. But you don't need nonlocal if you use a class instead:

    def make_square_plus_count():
        @numba.jitclass({'i': numba.uint64})
        class State:
            def __init__(self):
                self.i = 0
        state = State()
        @numba…
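
The snippet is cut off; a sketch of one way the jitclass approach might be completed, passing the state object into an njit function rather than closing over it (int64 is used here instead of the question's uint64 to keep the mixed-integer arithmetic simple; in recent releases jitclass lives in numba.experimental):

    import numba
    from numba.experimental import jitclass  # plain numba.jitclass in older releases

    @jitclass([('i', numba.int64)])
    class State:
        def __init__(self):
            self.i = 0

    @numba.njit
    def _square_plus_count(state, x):
        state.i += 1            # mutate the jitclass field instead of a nonlocal
        return x**2 + state.i

    def make_square_plus_count():
        state = State()
        return lambda x: _square_plus_count(state, x)

    f = make_square_plus_count()
    print(f(2), f(2))  # 5 then 6: the counter advances on each call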

How can you implement a C callable from Numba for efficient integration with nquad?

I need to do a 6D numerical integration in Python. Because the scipy.integrate.nquad function is slow, I am currently trying to speed things up by defining the integrand as a scipy.LowLevelCallable with Numba. I was able to do this in 1D with scipy.integrate.quad by replicating the example given here:

    import numpy as np
    from numba import cfunc
    from scipy import integrate

    def integrand(t):
        return np.exp(-t) / t**2

    nb_integrand = cfunc("float64(float64)")(integrand)

    # regular integration
    %timeit integrate.quad(integrand, 1, np.inf)
    10000 loops, best of 3: 128 µs per loop

    # integration …
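
For the multi-dimensional case, nquad accepts a LowLevelCallable with the C signature double f(int n, double *xx); a sketch using Numba's carray to view the argument pointer, with an illustrative Gaussian integrand standing in for the real one:

    import numpy as np
    from numba import cfunc, types, carray
    from scipy import integrate, LowLevelCallable

    # nquad's low-level signature: double f(int n, double *xx)
    c_sig = types.double(types.intc, types.CPointer(types.double))

    @cfunc(c_sig)
    def integrand6d(n, xx):
        x = carray(xx, (n,))           # view the C pointer as a length-n array
        return np.exp(-np.sum(x**2))   # illustrative 6D Gaussian integrand

    llc = LowLevelCallable(integrand6d.ctypes)
    result, err = integrate.nquad(llc, [(0, 1)] * 6)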

How to determine if numba's prange actually works correctly?

In another Q+A ( Can I perform dynamic cumsum of rows in pandas? ) I made a comment regarding the correctness of using prange in this code (from this answer):

    from numba import njit, prange

    @njit
    def dynamic_cumsum(seq, index, max_value):
        cumsum = []
        running = 0
        for i in prange(len(seq)):
            if running > max_value:
                cumsum.append([index[i], running])
                running = 0
            running += seq[i]
        cumsum.append([index[-1], running])
        return cumsum

The comment was: "I wouldn't recommend parallelizing a loop that isn't pure. In this case the running variable makes it impure." There are 4 possible outcomes: (1) numba …
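
For contrast, a minimal sketch of a loop-carried pattern that prange does handle correctly: a pure reduction, which Numba recognizes and parallelizes safely, unlike the reset-style running variable above:

    import numpy as np
    from numba import njit, prange

    @njit(parallel=True)
    def total(seq):
        s = 0.0
        for i in prange(len(seq)):
            s += seq[i]   # recognized as a reduction, so it is safe under prange
        return s

    x = np.random.rand(1_000_000)
    print(np.isclose(total(x), x.sum()))  # True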

Optimize Double loop in python

I am trying to optimize the following loop:

    def numpy(nx, nz, c, rho):
        for ix in range(2, nx-3):
            for iz in range(2, nz-3):
                a[ix, iz] = sum(c*rho[ix-1:ix+3, iz])
                b[ix, iz] = sum(c*rho[ix-2:ix+2, iz])
        return a, b

I tried different solutions and found that using numba to calculate the sum of the products gives the best performance:

    import numpy as np
    import numba as nb
    import time

    @nb.autojit
    def sum_opt(arr1, arr2):
        s = arr1[0]*arr2[0]
        for i in range(1, len(arr1)):
            s += arr1[i]*arr2[i]
        return s

    def numba1(nx, nz, c, rho):
        for ix in range(2, nx-3):
            for iz in range(2, nz-3):
                a[ix, iz] = sum_opt(c, rho…
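
A sketch of an alternative: jit the entire double loop in nopython mode rather than only the inner dot product (a and b are allocated inside instead of coming from globals, and c is assumed to have length 4, matching the 4-element slices above):

    import numpy as np
    import numba as nb

    @nb.njit
    def numba_full(nx, nz, c, rho):
        a = np.zeros((nx, nz))
        b = np.zeros((nx, nz))
        for ix in range(2, nx - 3):
            for iz in range(2, nz - 3):
                sa = 0.0
                sb = 0.0
                for k in range(4):
                    sa += c[k] * rho[ix - 1 + k, iz]
                    sb += c[k] * rho[ix - 2 + k, iz]
                a[ix, iz] = sa
                b[ix, iz] = sb
        return a, b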

Negative Speed Gain Using Numba Vectorize target='cuda'

I am trying to test the effectiveness of the Python Numba module's @vectorize decorator for speeding up a code snippet relevant to my actual code. I'm using a code snippet provided in CUDAcast #10, available here and shown below:

    import numpy as np
    from timeit import default_timer as timer
    from numba import vectorize

    @vectorize(["float32(float32, float32)"], target='cpu')
    def VectorAdd(a,b):
        return a + b

    def main():
        N = 32000000
        A = np.ones(N, dtype=np.float32)
        B = np.ones(N, dtype=np.float32)
        C = np.zeros(N, dtype=np.float32)

        start = timer()
        C = VectorAdd(A, B)
        vectoradd_time = …
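
With target='cuda', host-device transfers usually dominate a kernel as trivial as a + b. A sketch of timing the kernel alone by staging the inputs as device arrays first (assumes a CUDA-capable GPU; to_device, synchronize and copy_to_host are standard Numba CUDA calls):

    import numpy as np
    from timeit import default_timer as timer
    from numba import vectorize, cuda

    @vectorize(["float32(float32, float32)"], target='cuda')
    def VectorAdd(a, b):
        return a + b

    N = 32000000
    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)

    d_A = cuda.to_device(A)        # pay the host->device copies up front
    d_B = cuda.to_device(B)

    start = timer()
    d_C = VectorAdd(d_A, d_B)      # device inputs: kernel launch only, no copies
    cuda.synchronize()             # wait for the kernel before reading the clock
    print("gpu compute:", timer() - start)

    C = d_C.copy_to_host()         # the device->host copy, paid once and separately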