numpy

Numpy Convert String Representation of Boolean Array To Boolean Array

Submitted by 故事扮演 on 2021-02-07 12:58:25
Question: Is there a native NumPy way to convert an array of string representations of booleans, e.g. ['True', 'False', 'True', 'False'], to an actual boolean array I can use for masking/indexing? I could do a for loop going through and rebuilding the array, but for large arrays this is slow.

Answer 1: You should be able to do a boolean comparison, IIUC, whether the dtype is a string or object:

>>> a = np.array(['True', 'False', 'True', 'False'])
>>> a
array(['True', 'False', 'True', 'False'], dtype='|S5')
>>> a …
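The answer excerpt is cut off; the comparison it is building toward can be sketched as follows (a minimal example, not the answer's verbatim code):

```python
import numpy as np

# Comparing the string array against 'True' yields a boolean mask
# directly, with no Python-level loop.
a = np.array(['True', 'False', 'True', 'False'])
mask = a == 'True'
print(mask)          # [ True False  True False]

# The mask can then index another array of the same length:
values = np.array([10, 20, 30, 40])
print(values[mask])  # [10 30]
```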

How to get the two smallest values from a numpy array

Submitted by 烈酒焚心 on 2021-02-07 12:57:18
Question: I would like to take the two smallest values from an array x. But when I use np.where:

A, B = np.where(x == x.min())[0:1]

I get this error:

ValueError: need more than 1 value to unpack

How can I fix this error? And do I need to arrange the numbers in ascending order in the array?

Answer 1: You can use numpy.partition to get the lowest k+1 items:

A, B = np.partition(x, 1)[0:2]  # k=1, so the first two are the smallest items

In Python 3.x you could also use:

A, B, *_ = np.partition(x, 1)

For example: import …
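The worked example is truncated; a minimal sketch of the partition approach with sample data:

```python
import numpy as np

# np.partition(x, 1) rearranges x so its two smallest elements occupy
# positions 0 and 1, in O(n) time rather than a full O(n log n) sort.
x = np.array([5, 3, 8, 1, 9, 2])
A, B = np.partition(x, 1)[:2]
print(A, B)  # the two smallest values, 1 and 2
```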

pandas get unique values from column of lists

Submitted by 一曲冷凌霜 on 2021-02-07 12:37:41
Question: How do I get the unique values of a column of lists in pandas or NumPy, such that the second column would result in 'action', 'crime', 'drama'? The closest (but non-functional) solutions I could come up with were:

genres = data['Genre'].unique()

But this predictably results in a TypeError saying that lists aren't hashable:

TypeError: unhashable type: 'list'

A set seemed like a good idea, so I tried:

genres = data.apply(set(), columns=['Genre'], axis=1)

but this also results in a TypeError: set() takes no …
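One way this can be solved (a sketch using a hypothetical frame shaped like the question's data, not code from the original answer) is to flatten the lists with Series.explode, after which unique() works on plain strings:

```python
import pandas as pd

# Hypothetical frame mirroring the question: a 'Genre' column of lists.
data = pd.DataFrame({'Genre': [['crime', 'drama'],
                               ['action', 'crime'],
                               ['drama']]})

# explode() emits one row per list element, so the values become
# hashable strings and unique() no longer raises TypeError.
genres = data['Genre'].explode().unique()
print(sorted(genres))  # ['action', 'crime', 'drama']
```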

In Python 3.6, why does a negative number to the power of a fraction return nan when in a numpy array?

Submitted by 不打扰是莪最后的温柔 on 2021-02-07 12:32:33
Question: I have started learning Python recently and I've been going through the official NumPy quickstart guide, which includes this example for iterating:

>>> a
array([-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729])
>>> for i in a:
...     print(i**(1/3.))
...
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0

However, if I just try to raise -1000 to the power of (1/3.) outside of the loop, it returns a value:

>>> -1000**(1/3.)
-9.999999999999998

With parentheses around -1000 it also returns a value:

>>> ( …
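The excerpt cuts off before the answer, but the two behaviors can be demonstrated side by side (a sketch of the usual explanation, not the original answer's code):

```python
import numpy as np

# -1000**(1/3.) parses as -(1000**(1/3.)) because ** binds tighter than
# unary minus, which is why it appears to "return a value".
print(-1000 ** (1 / 3.))   # -9.999999999999998

# Each array element is a NumPy scalar. A negative base raised to a
# fractional power has no real-valued result, and NumPy stays within
# real floats, so it produces nan (plain Python 3 would instead return
# a complex number for (-1000) ** (1 / 3.)).
a = np.array([-1000, 27])
with np.errstate(invalid='ignore'):
    r = a ** (1 / 3.)
print(r)                   # [nan  3.]

# For real cube roots of negative numbers, use np.cbrt:
print(np.cbrt(a))          # [-10.   3.]
```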

Scipy Sparse Cumsum

Submitted by 谁都会走 on 2021-02-07 11:51:47
Question: Suppose I have a scipy.sparse.csr_matrix representing the values below:

[[0 0 1 2 0 3 0 4]
 [1 0 0 2 0 3 4 0]]

I want to calculate the cumulative sum of non-zero values in-place, which would change the array to:

[[0 0 1 3 0 6 0 10]
 [1 0 0 3 0 6 10 0]]

The actual values are not 1, 2, 3, …, and the number of non-zero values in each row is unlikely to be the same. How can this be done fast? Current program:

import scipy.sparse
import numpy as np

# sparse data
a = scipy.sparse.csr_matrix(
    [[0,0,1,2,0,3,0,4 …
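One vectorized approach (a sketch assuming the matrix has no empty rows, not the original answer's code) exploits the CSR layout: each row's non-zeros are contiguous in a.data, with a.indptr marking the segment boundaries, so a global cumsum followed by subtracting each row's running offset gives a per-row cumulative sum without ever touching the zeros:

```python
import numpy as np
import scipy.sparse

a = scipy.sparse.csr_matrix([[0, 0, 1, 2, 0, 3, 0, 4],
                             [1, 0, 0, 2, 0, 3, 4, 0]])

# Global cumulative sum over all stored non-zeros (row-major order).
out = np.cumsum(a.data)
# For every row after the first, subtract the cumulative total of all
# preceding rows, repeated once per non-zero in that row.
row_offsets = np.repeat(out[a.indptr[1:-1] - 1], np.diff(a.indptr)[1:])
out[a.indptr[1]:] -= row_offsets
a.data = out

print(a.toarray())
# [[ 0  0  1  3  0  6  0 10]
#  [ 1  0  0  3  0  6 10  0]]
```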

numpy.core.multiarray failed to import

Submitted by 那年仲夏 on 2021-02-07 11:25:23
Question: I used the following command to find out which NumPy version I am using:

pip show numpy

Output:

---
Name: numpy
Version: 1.8.2
Location: /usr/lib/python2.7/dist-packages
Requires:

However, when I run matplotlib, I get an error:

RuntimeError: module compiled against API version a but this version of numpy is 9

from matplotlib import pyplot as plt
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/pyplot.py", line 27, in <module>
    import matplotlib.colorbar
  File "/usr/local/lib …
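The usual reading of this error (an interpretation, since the answer excerpt is missing) is that matplotlib's compiled extension was built against a newer NumPy C-API (version 0xa) than the one the installed NumPy 1.8.2 exposes (version 9), i.e. the two packages were installed against different NumPy builds:

```python
# Check the NumPy that Python actually imports; a mismatch with the
# version matplotlib was built against triggers the RuntimeError.
import numpy as np
print(np.__version__)

# Upgrading NumPy to be at least as new as the version matplotlib was
# compiled against (e.g. `pip install --upgrade numpy`) resolves it.
```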

python: fastest way to compute euclidean distance of a vector to every row of a matrix?

Submitted by 左心房为你撑大大i on 2021-02-07 10:53:47
Question: Consider this Python code, where I try to compute the Euclidean distance of a vector to every row of a matrix. It's very slow compared to the best Julia version I can find using Tullio.jl: the Python version takes 30 s, but the Julia version only takes 75 ms. I am sure I am not doing the best in Python. Are there faster solutions? Numba and NumPy solutions welcome.

import numpy as np

# generate
a = np.random.rand(4000000, 128)
b = np.random.rand(128)
print(a.shape)
print(b.shape)

def lin_norm …
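The question's own code is truncated, but a fully vectorized NumPy version can be sketched (using a smaller array than the question's 4,000,000 rows so it runs quickly):

```python
import numpy as np

a = np.random.rand(1000, 128)
b = np.random.rand(128)

# Broadcasting subtracts the vector from every row at once; the norm is
# then taken along axis 1, one distance per row.
d = np.linalg.norm(a - b, axis=1)
print(d.shape)  # (1000,)

# A variant that avoids the large (n, 128) temporary uses the identity
# ||a_i - b||^2 = ||a_i||^2 - 2 a_i.b + ||b||^2 (clipped at 0 to guard
# against tiny negative values from floating-point rounding):
d2 = np.sqrt(np.maximum((a * a).sum(axis=1) - 2 * (a @ b) + b @ b, 0))
```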

numpy.tile did not work as Matlab repmat

Submitted by 故事扮演 on 2021-02-07 10:48:29
Question: Following "What is the equivalent of MATLAB's repmat in NumPy", I tried to build a 3x3x5 array from a 3x3 array in Python. In MATLAB this works as I expected:

a = [1,1,1;1,2,1;1,1,1];
a_ = repmat(a,[1,1,5]);
size(a_) = 3 3 5

But for numpy.tile:

b = numpy.array([[1,1,1],[1,2,1],[1,1,1]])
b_ = numpy.tile(b, [1,1,5])
b_.shape = (1, 3, 15)

If I want to generate the same array as in MATLAB, what is the equivalent?

Edit 1: The output I would expect to get is:

b_(:,:,1) =
1 1 1
1 2 1
1 1 1
b_(:,:,2) =
1 …
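The mismatch can be sketched as follows (an illustration, not the original answer's code): np.tile pads the leading dimensions, so tile(b, [1, 1, 5]) treats the 3x3 array as shape (1, 3, 3) and tiles its last axis, giving (1, 3, 15). To replicate along a new trailing axis like MATLAB's repmat(a, [1,1,5]), add that axis explicitly first:

```python
import numpy as np

b = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 1]])

# b[:, :, np.newaxis] has shape (3, 3, 1); tiling the new trailing axis
# 5 times yields the (3, 3, 5) result repmat produces in MATLAB.
b_ = np.tile(b[:, :, np.newaxis], (1, 1, 5))
print(b_.shape)     # (3, 3, 5)
print(b_[:, :, 0])  # every slice along axis 2 equals the original 3x3
```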

Filter numpy array of strings

Submitted by 允我心安 on 2021-02-07 10:45:26
Question: I have a very large data set gotten from Twitter. I am trying to figure out how to do the equivalent of the Python filtering below in NumPy. (The environment is the Python interpreter.)

>>> tweets = [['buhari si good'], ['atiku is great'], ['buhari nfd sdfa atiku'], ['is nice man that buhari']]
>>> filter(lambda x: 'buhari' in x[0].lower(), tweets)
[['buhari si good'], ['buhari nfd sdfa atiku'], ['is nice man that buhari']]

I tried boolean indexing like the below, but the array turned up empty …
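The question's own attempt is cut off, but substring filtering on a NumPy string array can be sketched with np.char (an illustration using the question's data, not the original answer's code):

```python
import numpy as np

tweets = np.array(['buhari si good', 'atiku is great',
                   'buhari nfd sdfa atiku', 'is nice man that buhari'])

# np.char.find returns the index of the substring in each element
# (-1 when absent), so comparing against 0 yields a boolean mask
# that can index the array directly.
mask = np.char.find(np.char.lower(tweets), 'buhari') >= 0
print(tweets[mask])
```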