numba

Can I perform dynamic cumsum of rows in pandas?

Submitted by 和自甴很熟 on 2019-11-28 00:27:02
If I have the following dataframe, derived like so:

    df = pd.DataFrame(np.random.randint(0, 10, size=(10, 1)))

       0
    0  0
    1  2
    2  8
    3  1
    4  0
    5  0
    6  7
    7  0
    8  2
    9  2

is there an efficient way to cumsum rows with a limit and, each time this limit is reached, to start a new cumsum? After each limit is reached (however many rows that takes), a row is created with the total cumsum. Below I have created an example of a function that does this, but it's very slow, especially when the dataframe becomes very large. I don't like that my function is looping, and I am looking for a way to make it faster (I guess a way without a loop).
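
Not the asker's function, but a hedged sketch of the direction such answers usually take: keep the loop (each step depends on the previous running total, so it cannot easily be vectorised) and compile it with Numba's @njit. The limit value of 5 and the output column name are assumptions for illustration.

    import numpy as np
    import pandas as pd
    from numba import njit

    @njit
    def cumsum_with_limit(values, limit):
        # Running total that resets each time it reaches `limit`; records the
        # total at every reset row, zero elsewhere.
        out = np.zeros(values.shape[0], dtype=np.int64)
        hit = np.zeros(values.shape[0], dtype=np.bool_)
        total = 0
        for i in range(values.shape[0]):
            total += values[i]
            if total >= limit:
                out[i] = total
                hit[i] = True
                total = 0
        return out, hit

    df = pd.DataFrame(np.random.randint(0, 10, size=(10, 1)))
    totals, hit = cumsum_with_limit(df[0].to_numpy(), 5)  # limit=5 is an arbitrary example
    df['reset_total'] = np.where(hit, totals, 0)          # column name is illustrative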

Comparing Python, Numpy, Numba and C++ for matrix multiplication

Submitted by 风流意气都作罢 on 2019-11-27 23:17:42
Question: In a program I am working on, I need to multiply two matrices repeatedly. Because of the size of one of the matrices, this operation takes some time, and I wanted to see which method would be the most efficient. The matrices have dimensions (m x n)*(n x p), where m = n = 3 and 10^5 < p < 10^6. With the exception of NumPy, which I assume works with an optimized algorithm, every test consists of a simple implementation of the matrix multiplication. Below are my various implementations: Python
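
As context for the comparison, here is a hedged sketch (not the asker's code, which is cut off above) of what such a simple implementation typically looks like when compiled with Numba, checked against NumPy's optimized routine; the matrix sizes follow the question.

    import numpy as np
    from numba import njit

    @njit
    def matmul_naive(A, B):
        # Textbook triple-loop (m x n)*(n x p) product.
        m, n = A.shape
        p = B.shape[1]
        C = np.zeros((m, p))
        for i in range(m):
            for k in range(n):
                for j in range(p):
                    C[i, j] += A[i, k] * B[k, j]
        return C

    A = np.random.rand(3, 3)
    B = np.random.rand(3, 10**5)
    assert np.allclose(matmul_naive(A, B), A @ B)  # A @ B is NumPy's optimized routine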

numba guvectorize target='parallel' slower than target='cpu'

Submitted by 杀马特。学长 韩版系。学妹 on 2019-11-27 16:56:44
Question: I've been attempting to optimize a piece of Python code that involves large multi-dimensional array calculations, and I am getting counterintuitive results with Numba. I am running on an MBP (mid 2015, 2.5 GHz quad-core i7), OS X 10.10.5, Python 2.7.11. Consider the following:

    import numpy as np
    from numba import jit, vectorize, guvectorize
    import numexpr as ne
    import timeit

    def add_two_2ds_naive(A, B, res):
        for i in range(A.shape[0]):
            for j in range(B.shape[1]):
                res[i, j] = A[i, j] + B[i, j]

    @jit
    def add
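
To make the comparison concrete, here is a hedged sketch of the two guvectorize variants for the element-wise addition above; the signatures, names and array sizes are assumptions. One known pitfall: when the core dimensions cover the whole arrays, the parallel target has nothing left to split across threads, which is one way it can end up slower than target='cpu'.

    import numpy as np
    from numba import guvectorize

    def _add_core(A, B, res):
        # Same kernel body as add_two_2ds_naive above.
        for i in range(A.shape[0]):
            for j in range(A.shape[1]):
                res[i, j] = A[i, j] + B[i, j]

    sig = ['void(float64[:,:], float64[:,:], float64[:,:])']
    layout = '(m,n),(m,n)->(m,n)'
    add_cpu = guvectorize(sig, layout, target='cpu')(_add_core)
    add_par = guvectorize(sig, layout, target='parallel')(_add_core)

    A = np.random.rand(1000, 1000)
    B = np.random.rand(1000, 1000)
    out = np.empty_like(A)
    add_cpu(A, B, out)  # time these two calls separately to reproduce the comparison
    add_par(A, B, out)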

why can't I get the right sum of 1D array with numba (cuda python)?

Submitted by 扶醉桌前 on 2019-11-27 16:28:59
I am trying to use CUDA Python with Numba. The code calculates the sum of a 1D array as follows, but I don't know how to get a single value result rather than three values. Python 3.5 with Numba + CUDA 8.0:

    import os, sys, time
    import pandas as pd
    import numpy as np
    from numba import cuda, float32

    os.environ['NUMBAPRO_NVVM'] = r'D:\NVIDIA GPU Computing Toolkit\CUDA\v8.0\nvvm\bin\nvvm64_31_0.dll'
    os.environ['NUMBAPRO_LIBDEVICE'] = r'D:\NVIDIA GPU Computing Toolkit\CUDA\v8.0\nvvm\libdevice'

    bpg = (1, 1)
    tpb = (1, 3)

    @cuda.jit
    def calcu_sum(D, T):
        ty = cuda.threadIdx.y
        bh = cuda.blockDim.y
        index_i = ty
        L = len(D)
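
Not the asker's kernel, but a hedged sketch of one common way to get a single value out of a reduction: each thread accumulates a strided partial sum and combines it into one output slot with an atomic add. It assumes a CUDA-capable GPU is available.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def sum_1d(D, T):
        # Each thread sums a strided slice of D, then atomically adds its
        # partial result into T[0], so only one value is produced.
        i = cuda.grid(1)
        stride = cuda.gridsize(1)
        partial = 0.0
        for k in range(i, D.shape[0], stride):
            partial += D[k]
        cuda.atomic.add(T, 0, partial)

    D = np.arange(9, dtype=np.float64)
    T = np.zeros(1, dtype=np.float64)
    sum_1d[1, 32](D, T)   # one block of 32 threads is plenty for this tiny input
    print(T[0])           # 36.0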

How to parallelize this Python for loop when using Numba

Submitted by 自闭症网瘾萝莉.ら on 2019-11-27 16:00:28
Question: I'm using the Anaconda distribution of Python, together with Numba, and I've written the following Python function that multiplies a sparse matrix A (stored in CSR format) by a dense vector x:

    @jit
    def csrMult(x, Adata, Aindices, Aindptr, Ashape):
        numRowsA = Ashape[0]
        Ax = numpy.zeros(numRowsA)
        for i in range(numRowsA):
            Ax_i = 0.0
            for dataIdx in range(Aindptr[i], Aindptr[i+1]):
                j = Aindices[dataIdx]
                Ax_i += Adata[dataIdx] * x[j]
            Ax[i] = Ax_i
        return Ax

Here A is a large scipy sparse
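
Not the accepted answer verbatim, but a hedged sketch of the direction such questions usually go: since each CSR row is computed independently, the outer loop can be distributed across threads with numba.prange under @njit(parallel=True).

    import numpy as np
    from numba import njit, prange

    @njit(parallel=True)
    def csrMult_parallel(x, Adata, Aindices, Aindptr, Ashape):
        # Rows of a CSR matrix are independent, so prange can split the
        # outer loop across threads.
        numRowsA = Ashape[0]
        Ax = np.zeros(numRowsA)
        for i in prange(numRowsA):
            Ax_i = 0.0
            for dataIdx in range(Aindptr[i], Aindptr[i + 1]):
                Ax_i += Adata[dataIdx] * x[Aindices[dataIdx]]
            Ax[i] = Ax_i
        return Ax

    # Usage with a scipy.sparse CSR matrix A and dense vector x:
    #   y = csrMult_parallel(x, A.data, A.indices, A.indptr, A.shape)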

Getting python Numba working on Ubuntu 14.10 or Fedora 21 with python 2.7

Submitted by ̄綄美尐妖づ on 2019-11-27 14:26:35
Question: Recently, I have had a frustrating time getting Python Numba working on Ubuntu or Fedora Linux. The main problem has been the compilation of llvmlite. What do I need to install for it to compile properly?

Answer 1: The versions I got working in the end were numba-0.17.0 (also 0.18.2) and llvmlite-0.2.2 (also 0.4.0). Here are the relevant dependencies and configuration options on Ubuntu and Fedora. For Ubuntu 14.04 (Trusty):

    sudo apt-get install zlib1g zlib1g-dev libedit libedit-dev llvm-3.8
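
Assuming the packages above build and numba/llvmlite install, a minimal Python smoke test confirms the JIT is actually wired up to LLVM (the function itself is arbitrary):

    from numba import jit

    @jit(nopython=True)
    def smoke_test(n):
        # Trivial loop; if this compiles in nopython mode, llvmlite works.
        total = 0
        for i in range(n):
            total += i
        return total

    assert smoke_test(10) == 45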

Achieving Numba's performance with Cython

Submitted by 无人久伴 on 2019-11-27 08:10:11
Question: Usually I'm able to match Numba's performance when using Cython. However, in this example I have failed to do so: Numba is about 4 times faster than my Cython version. Here is the Cython version:

    %%cython -c=-march=native -c=-O3
    cimport numpy as np
    import numpy as np
    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def cy_where(double[::1] df):
        cdef int i
        cdef int n = len(df)
        cdef np.ndarray[dtype=double] output = np.empty(n, dtype=np.float64)
        for i in range(n):
            if df[i]>0
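
For reference, a hedged sketch of the Numba counterpart being benchmarked against; the branch bodies are assumptions, since the excerpt above is cut off at the if statement.

    import numpy as np
    from numba import njit

    @njit
    def nb_where(df):
        # Same loop shape as cy_where above; the two branch values are
        # placeholders for whatever the original question computed.
        n = len(df)
        output = np.empty(n, dtype=np.float64)
        for i in range(n):
            if df[i] > 0:
                output[i] = 2.0 * df[i]
            else:
                output[i] = -df[i]
        return output

    data = np.random.randn(10000)
    nb_where(data)  # first call compiles; time subsequent calls for the comparison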

Matrix inversion without Numpy

Submitted by 杀马特。学长 韩版系。学妹 on 2019-11-27 01:49:35
Question: I want to invert a matrix without using numpy.linalg.inv. The reason is that I am using Numba to speed up the code, but numpy.linalg.inv is not supported, so I am wondering whether I can invert a matrix with 'classic' Python code. With numpy.linalg.inv, an example would look like this:

    import numpy as np
    M = np.array([[1,0,0],[0,1,0],[0,0,1]])
    Minv = np.linalg.inv(M)

Answer 1: Here is a more elegant and scalable solution, imo. It'll work for any nxn matrix, and you may find use for the other
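
Not the answer's code (it is cut off above), but a hedged sketch of a 'classic' Gauss-Jordan inversion written with plain loops so that @njit can compile it:

    import numpy as np
    from numba import njit

    @njit
    def invert(M):
        # Gauss-Jordan elimination with partial pivoting on an augmented
        # [M | I] matrix; no numpy.linalg call, so it compiles in nopython mode.
        n = M.shape[0]
        aug = np.zeros((n, 2 * n))
        for i in range(n):
            for j in range(n):
                aug[i, j] = M[i, j]
            aug[i, n + i] = 1.0
        for col in range(n):
            # Pick the row with the largest entry in this column as the pivot.
            pivot = col
            for r in range(col + 1, n):
                if abs(aug[r, col]) > abs(aug[pivot, col]):
                    pivot = r
            if aug[pivot, col] == 0.0:
                raise ValueError("matrix is singular")
            if pivot != col:
                for j in range(2 * n):
                    aug[col, j], aug[pivot, j] = aug[pivot, j], aug[col, j]
            # Normalise the pivot row, then eliminate the column elsewhere.
            p = aug[col, col]
            for j in range(2 * n):
                aug[col, j] /= p
            for r in range(n):
                if r != col:
                    factor = aug[r, col]
                    for j in range(2 * n):
                        aug[r, j] -= factor * aug[col, j]
        return aug[:, n:]

    M = np.array([[1.0, 2.0], [3.0, 4.0]])
    assert np.allclose(invert(M) @ M, np.eye(2))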

Python numpy: cannot convert datetime64[ns] to datetime64[D] (to use with Numba)

Submitted by 夙愿已清 on 2019-11-27 01:39:25
Question: I want to pass a datetime array to a Numba function (which cannot be vectorised and would otherwise be very slow). I understand Numba supports numpy.datetime64; however, it seems to support datetime64[D] (day precision) but not datetime64[ns] (nanosecond precision). I learnt this the hard way: is it documented? I tried to convert from datetime64[ns] to datetime64[D], but can't seem to find a way! Any ideas? I have summarised my problem with the minimal code below. If you run testdf
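
A hedged sketch of the conversion that usually resolves this: cast the underlying NumPy array, rather than the pandas Series, to day precision before handing it to the Numba function. The example frame below is hypothetical, since the question's testdf is cut off above.

    import numpy as np
    import pandas as pd

    # Hypothetical stand-in for the question's testdf.
    testdf = pd.DataFrame({'ts': pd.date_range('2015-01-01', periods=5, freq='D')})

    ns_array = testdf['ts'].values                 # numpy array, dtype datetime64[ns]
    day_array = ns_array.astype('datetime64[D]')   # dtype datetime64[D], usable from Numba
    print(day_array.dtype)                         # datetime64[D]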