numba

Tips for processing large data with pandas

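Anonymous (unverified) submitted on 2019-12-03 00:39:02
References: https://yq.aliyun.com/articles/530060?spm=a2c4e.11153940.blogcont181452.16.413f2ef21NKngz http://www.datayuan.cn/article/6737.htm https://yq.aliyun.com/articles/210393?spm=a2c4e.11153940.blogcont381482.21.77131127S0t3io
Reading and writing large text data: Sometimes we receive very large text files. Reading one completely into memory is slow, may fail outright, or may succeed but leave no room for further computation. If the computation is not very complex, use the chunksize or iterator parameter of read_csv to read the file in parts, process each part, and then append each partial result to the output file with to_csv's mode='a'.
Choosing between to_csv and to_excel: When writing results you usually have to choose an output format. The most common are .csv, .xls and .xlsx; the latter two are the Excel 2003 and Excel 2007 formats. In my experience csv > xls > xlsx: writing a large file as csv is much faster than writing Excel, xls supports only about 65,000 rows (65,536), and xlsx supports more rows but, if the content includes Chinese, often suffers strange content loss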
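The chunked read-and-append workflow described above can be sketched as follows. This is a minimal illustration, not the author's code: the function name process_in_chunks, the "value" column and the filter applied to each chunk are all hypothetical, and the demonstration file is generated on the spot.

```python
import os
import tempfile

import pandas as pd


def process_in_chunks(src_path, dst_path, chunksize=100_000):
    """Read src_path piece by piece, transform each piece, append it to dst_path."""
    first = True
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        # Hypothetical per-chunk work: keep rows whose "value" column is positive.
        result = chunk[chunk["value"] > 0]
        result.to_csv(
            dst_path,
            mode="w" if first else "a",  # overwrite once, then append
            header=first,                # write the header row only once
            index=False,
        )
        first = False


# Small demonstration on a temporary file (made-up data).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "input.csv")
dst = os.path.join(tmp_dir, "output.csv")
pd.DataFrame({"id": range(10), "value": [v - 4 for v in range(10)]}).to_csv(src, index=False)

process_in_chunks(src, dst, chunksize=3)
out = pd.read_csv(dst)
print(out)
```

Writing the header only with the first chunk keeps the appended file a valid single CSV. Only one chunk is held in memory at a time, so peak memory depends on chunksize, not on the file size.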

An explanation of Python numba

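Anonymous (unverified) submitted on 2019-12-02 22:51:30
1. Compute the hyperbolic tangent of the values in a numpy array.
(1) Import numpy, numba and its jit decorator
import numpy as np
import numba
from numba import jit
(2) Apply the numba decorator jit and write the function
# nopython=True requires the function to be compiled completely (so that all Python interpreter calls are removed); otherwise an exception is raised
@jit(nopython=True)  # jit is one of numba's decorators
def go_fast2(a):  # the function is compiled to machine code on its first call
    trace = 0.0  # the input is assumed to be a numpy array
    for i in range(a.shape[0]):  # Numba is good at loops
        trace += np.tanh(a[i, i])  # numba likes numpy functions
    return a + trace  # numba likes numpy broadcasting
(3) Pass an argument to the function
# the function requires a numpy array as its argument
x = np.arange(100).reshape(10, 10)
# run the function
go_fast2(x)
(4) Execution time of the numba-accelerated function
%timeit go_fast2(x)
(5) Output
About 40 times faster.
2. Speeding up for loops with numba
(1) Code
# ordinary function
def go_fast1():  # on the first call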
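Because %timeit only works inside IPython, the effect of first-call compilation can also be seen with the standard library. The sketch below is an illustration under assumptions: the function name add_diagonal_tanh is hypothetical, and the try/except fallback (which turns the decorator into a no-op) is added only so the script still runs where Numba is not installed.

```python
import time

import numpy as np

try:
    from numba import njit  # njit is shorthand for jit(nopython=True)
except ImportError:  # assumption: run as plain Python if Numba is absent
    def njit(func):
        return func


@njit
def add_diagonal_tanh(a):
    # Sum tanh over the diagonal, then broadcast the sum onto the array.
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace


x = np.arange(100).reshape(10, 10)

start = time.perf_counter()
first = add_diagonal_tanh(x)   # includes compilation when Numba is installed
first_call = time.perf_counter() - start

start = time.perf_counter()
second = add_diagonal_tanh(x)  # reuses the already compiled machine code
second_call = time.perf_counter() - start

print(f"first call:  {first_call:.6f} s")
print(f"second call: {second_call:.6f} s")
```

With Numba installed, the first call is normally much slower than the second because it includes compilation, so benchmarks should exclude it.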

Numba code slower than pure python

时光总嘲笑我的痴心妄想 submitted on 2019-12-02 18:23:17
I've been working on speeding up a resampling calculation for a particle filter. As Python has many ways to speed it up, I thought I'd try them all. Unfortunately, the Numba version is incredibly slow. As Numba should result in a speed-up, I assume this is an error on my part. I tried 4 different versions: Numba, Python, NumPy and Cython. The code for each is below:
import numpy as np
import scipy as sp
import numba as nb
from cython_resample import cython_resample
@nb.autojit
def numba_resample(qs, xs, rands):
    n = qs.shape[0]
    lookup = np.cumsum(qs)
    results = np.empty(n)
    for j in range(n):
        for i in
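The excerpt cuts off before the loop body and before the NumPy version, so the following is only a sketch of one common vectorised way to do a cumulative-sum lookup. It assumes that each random number selects the first particle whose cumulative weight exceeds it. The name numpy_resample and the sample data are hypothetical and are not the poster's code.

```python
import numpy as np


def numpy_resample(qs, xs, rands):
    """Pick one entry of xs per random number, with probability given by qs."""
    lookup = np.cumsum(qs)  # cumulative weights, last entry is about 1.0
    # Index of the first cumulative weight strictly greater than each random number.
    idx = np.searchsorted(lookup, rands, side="right")
    # Guard against rounding that leaves the last cumulative weight just below 1.0.
    idx = np.minimum(idx, len(xs) - 1)
    return xs[idx]


qs = np.array([0.1, 0.2, 0.3, 0.4])         # made-up normalised weights
xs = np.array([10.0, 20.0, 30.0, 40.0])     # made-up particle states
rands = np.array([0.05, 0.15, 0.5, 0.95])   # fixed numbers in place of random draws

resampled = numpy_resample(qs, xs, rands)
print(resampled)
```

np.searchsorted replaces the inner search loop with a binary search per random number, which is usually the baseline a Numba version has to beat.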

Python: rewrite a looping numpy math function to run on GPU

浪尽此生 submitted on 2019-12-02 17:36:30
Can someone help me rewrite this one function (the doTheMath function) to do the calculations on the GPU? I have spent a good few days trying to get my head around it, with no result. Perhaps somebody can help me rewrite this function in whatever way they see fit, as long as it gives the same result at the end. I tried to use @jit from numba, but for some reason it is actually much slower than running the code as usual. With a huge sample size, the goal is to decrease the execution time considerably, so naturally I believe the GPU is the fastest way to do it. I'll explain a little what is

Muting the LLVM IR debug output when using Numba?

谁说胖子不能爱 submitted on 2019-12-02 06:20:35
Question: I want to use Numba in one of our in-house client libraries, but there is a debug dump of the LLVM IR code every time my code JITs something. Is there a setting in Numba or in LLVM that I can change to mute this output: http://i.imgur.com/Vkankxe.png ? Thank you.
Answer 1: If you want to stay with the release version of numba 0.11, and you can't control the Python optimization level, this will work (just tried it myself):
import logging
def disableNumbaLogging():
    import numba.codegen

Is it expected for numba's efficient square euclidean distance code to be slower than numpy's efficient counterpart?

痞子三分冷 submitted on 2019-12-02 05:47:17
I modified the most efficient code from ( Why this numba code is 6x slower than numpy code? ) so that it can handle x1 being (n, m):
@nb.njit(fastmath=True,parallel=True)
def euclidean_distance_square_numba_v5(x1, x2):
    res = np.empty((x1.shape[0], x2.shape[0]), dtype=x2.dtype)
    for a_idx in nb.prange(x1.shape[0]):
        for o_idx in range(x2.shape[0]):
            val = 0.
            for i_idx in range(x2.shape[1]):
                tmp = x1[a_idx, i_idx] - x2[o_idx, i_idx]
                val += tmp * tmp
            res[a_idx, o_idx] = val
    return res
However, it is still not more efficient than the more efficient NumPy version:
def euclidean_distance_square_einsum
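The excerpt names an einsum-based NumPy version but cuts off before its body. The sketch below shows one common einsum formulation, based on the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b. It is an assumption about the general approach, not the poster's code. The names squared_euclidean_einsum and squared_euclidean_loops are hypothetical.

```python
import numpy as np


def squared_euclidean_einsum(x1, x2):
    """Squared distances between every row of x1 (n, d) and every row of x2 (m, d)."""
    x1_sq = np.einsum("ij,ij->i", x1, x1)  # squared norm of each row of x1, shape (n,)
    x2_sq = np.einsum("ij,ij->i", x2, x2)  # squared norm of each row of x2, shape (m,)
    cross = x1 @ x2.T                      # pairwise dot products, shape (n, m)
    dist = x1_sq[:, None] + x2_sq[None, :] - 2.0 * cross
    return np.maximum(dist, 0.0)           # clip tiny negatives caused by rounding


def squared_euclidean_loops(x1, x2):
    """Plain-Python reference with the same triple loop as the Numba function above."""
    res = np.empty((x1.shape[0], x2.shape[0]), dtype=x2.dtype)
    for a in range(x1.shape[0]):
        for o in range(x2.shape[0]):
            val = 0.0
            for i in range(x2.shape[1]):
                tmp = x1[a, i] - x2[o, i]
                val += tmp * tmp
            res[a, o] = val
    return res


rng = np.random.default_rng(0)
a = rng.normal(size=(5, 3))
b = rng.normal(size=(4, 3))
fast = squared_euclidean_einsum(a, b)
slow = squared_euclidean_loops(a, b)
print(np.allclose(fast, slow))
```

The expanded form hands most of the work to a single matrix product, which NumPy passes to an optimised BLAS routine. That is one plausible reason a hand-written loop, even when compiled, may not beat it.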

The Anaconda prompt freezes when I run code with numba's “jit” decorator

痞子三分冷 submitted on 2019-12-02 04:27:23
I have this python code that should run just fine. I'm running it on Anaconda's Spyder Ipython console, or on the Anaconda terminal itself, because that is the only way I can use the "numba" library and its "jit" decorator. However, either one always "freezes" or "hangs" just about whenever I run it. There is nothing wrong with the code itself, or else I'd get an error. Sometimes, the code runs all the way through perfectly fine, sometimes it just prints the first line from the first function, and sometimes the code stops anywhere in the middle. I've tried seeing under which conditions the

Muting the LLVM IR debug output when using Numba?

谁说胖子不能爱 submitted on 2019-12-02 01:22:50
I want to use Numba in one of our in-house client libraries, but there is a debug dump of the LLVM IR code every time my code JITs something. Is there a setting in Numba or in LLVM that I can change to mute this output: http://i.imgur.com/Vkankxe.png ? Thank you.
If you want to stay with the release version of numba 0.11, and you can't control the Python optimization level, this will work (just tried it myself):
import logging
def disableNumbaLogging():
    import numba.codegen.debug
    llvmlogger = logging.getLogger('numba.codegen.debug')
    llvmlogger.setLevel(logging.INFO)
Try invoking
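The answer relies on how the standard logging module filters records by level. The sketch below shows that mechanism using only the standard library. The logger name is copied from the answer, which refers to numba 0.11, and may not exist in other releases. The messages are stand-ins, and Numba itself is not imported.

```python
import io
import logging

# Logger name taken from the answer above (numba 0.11); treat it as an assumption elsewhere.
LOGGER_NAME = "numba.codegen.debug"

stream = io.StringIO()
handler = logging.StreamHandler(stream)  # capture output so it can be inspected
handler.setLevel(logging.DEBUG)

logger = logging.getLogger(LOGGER_NAME)
logger.addHandler(handler)
logger.propagate = False                 # keep the demo output out of the root logger

logger.setLevel(logging.DEBUG)
logger.debug("LLVM IR dump (stand-in message)")  # emitted: DEBUG is enabled

logger.setLevel(logging.INFO)
logger.debug("LLVM IR dump (stand-in message)")  # dropped: below the INFO threshold
logger.info("still visible")                     # emitted

captured = stream.getvalue().splitlines()
print(captured)
```

Raising the level to INFO discards DEBUG records at the logger itself, before any handler sees them. That is why the answer needs no handler changes.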

How to Solve Numba Lowering error?

会有一股神秘感。 submitted on 2019-12-02 00:34:16
Question: I have a function that I am trying to speed up using the @jit decorator from the Numba module. It is essential for me to speed this up as much as possible, because my main code calls this function millions of times. Here is my function:
from numba import jit, types
import Sweep  # My own module, works fine
@jit(types.Tuple((types.complex128[:], types.float64[:]))(types.complex128[:], types.complex128[:], types.float64[:], types.float64[:], types.float64))
def MultiModeSL(Ef, Ef2, Nf, u,

Python: can numba work with arrays of strings in nopython mode?

五迷三道 submitted on 2019-12-01 17:58:36
I am using pandas 0.16.2, numpy 1.9.2 and numba 0.20. Is there any way to get numba to support arrays of strings in nopython mode? Alternatively, could I somehow convert strings to numbers which numba would recognise? I have to run certain loops on an array of strings (a column from a pandas dataframe); if I could use numba the code would be substantially faster. I have come up with this minimal example to show what I mean:
import numpy as np
import numba
x=np.array(['some','text','this','is'])
@numba.jit(nopython=True)
def numba_str(txt):
    x=0
    for i in xrange(txt.size):
        if txt[i]=='text':
            x +=
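One way to approach the "convert strings to numbers" idea from the question is to replace each distinct string with an integer code before entering the loop. The sketch below does this with np.unique and uses only NumPy. The helper count_code is hypothetical, and the loop is left as plain Python so the example runs without Numba.

```python
import numpy as np

words = np.array(["some", "text", "this", "is"])  # array from the question

# vocab holds the sorted distinct strings; codes[i] is the position of words[i] in vocab.
vocab, codes = np.unique(words, return_inverse=True)


def count_code(int_codes, target):
    """Count how often the integer code `target` occurs (a purely numeric loop)."""
    n = 0
    for i in range(int_codes.size):
        if int_codes[i] == target:
            n += 1
    return n


# Look up the integer code for the string of interest once, outside the loop.
target = int(np.flatnonzero(vocab == "text")[0])
count = count_code(codes, target)
print(vocab, codes, count)
```

After the encoding step, the loop touches only an integer array and an integer scalar, which are types that nopython mode is built to handle. The strings can be recovered afterwards with vocab[codes].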