numpy

Find most frequent row or mode of a matrix of vectors - Python / NumPy

懵懂的女人 submitted on 2021-02-20 19:32:53
Question: I have a numpy array of shape (?,n) that represents a vector of n-dimensional vectors. I want to find the most frequent row. So far it seems that the best way is to just iterate over all the entries and store a count, but it seems obscene that numpy or scipy wouldn't have something built in to perform this task. Answer 1: Here's an approach using NumPy views, which should be pretty efficient: def mode_rows(a): a = np.ascontiguousarray(a) void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a
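The answer's view-based code is cut off above, so the following is not a reconstruction of it. As a minimal sketch of the same task, assuming NumPy 1.13 or later, `np.unique` with `axis=0` and `return_counts=True` can count identical rows directly. The function name `mode_rows` is reused from the snippet only for readability.

```python
import numpy as np

def mode_rows(a):
    """Return the most frequent row of a 2-D array and how often it occurs."""
    a = np.asarray(a)
    # Unique rows (sorted lexicographically) and the count of each one.
    rows, counts = np.unique(a, axis=0, return_counts=True)
    # argmax picks the first maximum, so ties resolve to the smallest row.
    return rows[np.argmax(counts)], int(counts.max())

a = np.array([[1, 2], [3, 4], [1, 2], [5, 6]])
row, count = mode_rows(a)
```

This sorts the rows internally, so it may be slower than a hashing or view-based approach on very large inputs.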

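Handling missing values

佐手、 submitted on 2021-02-20 16:46:40
1. Missing data comes in two forms: missing row records and missing column records. 2. Different data stores and environments represent missing values differently. For example, databases use Null, Python uses None, and Pandas or NumPy use NaN. 3. There are generally four ways to handle missing values: (1) Dropping. This method is not suitable in two situations: when the proportion of incomplete data is large (over 10%), or when the missing values follow a clear distribution pattern or characteristic. (2) Imputation. Common imputation methods: Statistical: for numeric data, fill with the mean, weighted mean, median, and so on; for categorical data, fill with the mode (the most frequent category). Model-based: treat the field with missing values as the target variable and predict it from the other available fields to obtain a plausible fill value. If the column with missing values is a numeric variable, use a regression model; if it is a categorical variable, use a classification model. Expert imputation: for a small number of records that carry significant meaning, having an expert fill them in is also an important approach. Other methods: random imputation, special-value imputation, multiple imputation, and so on. (3) Truth-value conversion. (4) No handling. Common models that can handle missing values automatically include KNN, decision trees and random forests, neural networks and naive Bayes, and DBSCAN (density-based spatial clustering of applications with noise). Their approaches: Ignore: missing values do not take part in the distance calculation, e.g. KNN. Treat the missing value as one state of the distribution and include it in the modeling process, e.g. decision trees and their variants. Do not compute on distances, so the influence of value-based distance calculations is removed, e.g. DBSCAN. 4. For handling missing values, the main tools used together are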
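The post breaks off before naming its tools, so the sketch below is only an illustration of the "dropping" and "statistical imputation" options from item 3, written with plain NumPy. The array `data` is made-up example data, and NaN is assumed to mark missing entries.

```python
import numpy as np

# Hypothetical example data: two numeric columns, NaN marks a missing value.
data = np.array([[1.0, 10.0],
                 [np.nan, 20.0],
                 [3.0, np.nan],
                 [5.0, 30.0]])

mask = np.isnan(data)

# Share of missing cells; useful for judging whether dropping is acceptable.
missing_ratio = mask.mean()

# (1) Dropping: keep only rows with no missing value.
complete_rows = data[~mask.any(axis=1)]

# (2) Statistical imputation: replace each NaN with its column mean,
# computed while ignoring the NaNs themselves.
col_means = np.nanmean(data, axis=0)
imputed = np.where(mask, col_means, data)
```

Swapping `np.nanmean` for `np.nanmedian` gives median imputation with the same structure.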

How to resize the window obtained from cv2.imshow()?

懵懂的女人 submitted on 2021-02-20 04:15:26
Question: I started learning OpenCV today and I wrote a short piece of code to upload (I don't know if that's the right term) a random image: [image 1] It works fine and I can open the image, but what I get is a big window, and I can't see the full image unless I scroll it: [image 2] So I'd like to know how I can see the whole image in a smaller window. Answer 1: You can resize the image while keeping the aspect ratio the same, and then display it. #Display image def display(img, frameName="OpenCV Image"): h, w = img.shape
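The answer's `display` function is truncated above, so the following is a separate sketch of the same idea rather than its continuation. The helper names `fit_within` and `show_fitted` and the 1280x720 limits are assumptions chosen for illustration; `cv2.resize`, `cv2.imshow`, `cv2.waitKey`, and `cv2.destroyAllWindows` are standard OpenCV calls.

```python
def fit_within(width, height, max_width=1280, max_height=720):
    """Return a (width, height) that fits the limits and keeps the aspect ratio."""
    # Never upscale: cap the scale factor at 1.0.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))

def show_fitted(img, window_name="OpenCV Image"):
    """Shrink img if needed, then show it until a key is pressed."""
    import cv2  # imported here so fit_within stays usable without OpenCV

    h, w = img.shape[:2]  # works for grayscale and colour images
    new_w, new_h = fit_within(w, h)
    # cv2.resize expects the target size as (width, height).
    resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_AREA)
    cv2.imshow(window_name, resized)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

size = fit_within(4000, 3000)
```

Another option is to create the window with `cv2.namedWindow(name, cv2.WINDOW_NORMAL)` and call `cv2.resizeWindow`, which resizes the window instead of the image data.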

Python Error: Using matplotlib: Truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

我与影子孤独终老i submitted on 2021-02-20 02:36:45
Question: So I am trying to plot a graph of 3 of my functions for a beam on one graph using the matplotlib module and am getting value errors when attempting to do so. The main bulk of code is: class beam(object): '''This class models the deflection of a simply supported beam under multiple point loads, following Euler-Bernoulli theory and the principle of superposition ''' def __init__(self, E, I, L): '''The class constructor ''' self.E = E # Young's modulus of the beam in N/m^2 self.I = I # Second
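The question's code is cut off before the failing line, so the cause in this particular case is not shown. A common trigger for this error, though, is passing a NumPy array of plot positions into a function that uses a scalar `if` comparison. The sketch below reproduces that situation with hypothetical functions (`step_scalar_only`, `step_vectorized`) and shows `np.where` as one way around it.

```python
import numpy as np

def step_scalar_only(x, a):
    """Works for a single number, fails for an array."""
    if x > a:  # for an array, x > a is an array of booleans, not one bool
        return x - a
    return 0.0

def step_vectorized(x, a):
    """Same piecewise rule, evaluated elementwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > a, x - a, 0.0)

x = np.linspace(0.0, 4.0, 5)  # [0, 1, 2, 3, 4]

message = None
try:
    step_scalar_only(x, 2.0)
except ValueError as exc:
    message = str(exc)  # the "truth value ... is ambiguous" error

values = step_vectorized(x, 2.0)
```

`np.vectorize` or an explicit loop over the positions would also work, at the cost of speed.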

Python Error: Using matplotlib: Truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

我们两清 submitted on 2021-02-20 02:35:13
Question: So I am trying to plot a graph of 3 of my functions for a beam on one graph using the matplotlib module and am getting value errors when attempting to do so. The main bulk of code is: class beam(object): '''This class models the deflection of a simply supported beam under multiple point loads, following Euler-Bernoulli theory and the principle of superposition ''' def __init__(self, E, I, L): '''The class constructor ''' self.E = E # Young's modulus of the beam in N/m^2 self.I = I # Second

Interpolation to evenly space trajectory data for different curves

五迷三道 submitted on 2021-02-20 02:28:16
Question: I am using the following code (adapted from Resample or normalize trajectory data so points are evenly spaced) to interpolate 2D X & Y positional data (with no time index) so that the points are evenly spaced. From my understanding, the answer to that question assumed that the x values follow a certain curve or pattern (e.g. an exponential curve), but that isn't the case for all my trajectories. I believe I need to interpolate X and Y separately. However, this code does not seem to produce
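The asker's code is not included in the snippet, so this is an independent sketch of the idea they describe: interpolate X and Y separately against cumulative arc length, which makes no assumption about the shape of the curve. The function name `resample_evenly` is hypothetical, and linear interpolation between the original points is assumed.

```python
import numpy as np

def resample_evenly(x, y, n_points):
    """Resample a 2-D polyline at n_points equally spaced along its length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Length of each segment, then cumulative distance from the first point.
    seg = np.hypot(np.diff(x), np.diff(y))
    s = np.concatenate(([0.0], np.cumsum(seg)))
    # Target distances, evenly spaced from start to end of the path.
    s_new = np.linspace(0.0, s[-1], n_points)
    # Interpolate each coordinate separately as a function of distance.
    return np.interp(s_new, s, x), np.interp(s_new, s, y)

# An L-shaped path of total length 4, resampled at unit spacing.
xs, ys = resample_evenly([0, 1, 1], [0, 0, 3], 5)
```

The spacing is even along the original polyline; after linear resampling, straight-line distances between new points can be slightly shorter around corners.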

python numpy arange: strange behavior [duplicate]

爷,独闯天下 submitted on 2021-02-20 01:34:46
Question: This question already has answers here: Is floating point math broken? (31 answers) python numpy arange unexpected results (4 answers) Closed 2 years ago. Python 2.7.9 (default, Jun 29 2016, 13:08:31) IPython 5.6.0 -- An enhanced Interactive Python. In [1]: import numpy as np In [2]: np.__version__ Out[2]: '1.14.3' In [3]: np.arange(1.1, 1.12, 0.01) Out[3]: array([1.1 , 1.11, 1.12]) In [4]: np.arange(1.1, 1.13, 0.01) Out[4]: array([1.1 , 1.11, 1.12]) In both cases, the array gets to 1.12...
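The behaviour in the session above comes from floating-point rounding: `np.arange` derives its length from `(stop - start) / step`, and that quotient is not exact for these values. A minimal sketch of two ways to avoid depending on it, assuming the caller knows how many points are wanted:

```python
import numpy as np

start, stop, step = 1.1, 1.12, 0.01

# Decide the number of points explicitly instead of letting rounding decide.
n = int(round((stop - start) / step)) + 1  # 3 points: 1.10, 1.11, 1.12

# Closed interval: linspace includes both endpoints by construction.
grid = np.linspace(start, stop, n)

# Half-open interval [start, stop): build it from an integer range.
half_open = start + step * np.arange(n - 1)
```

Both forms fix the element count with integer arithmetic, so the endpoint can no longer appear or disappear because of rounding.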

Python: Sample from multivariate normal with N means and same covariance matrix

白昼怎懂夜的黑 submitted on 2021-02-19 08:14:11
Question: Suppose I want to sample 10 times from multiple normal distributions with the same covariance matrix (identity) but different means, which are stored as rows of the following matrix: means = np.array([[1, 5, 2], [6, 2, 7], [1, 8, 2]]) How can I do that in the most efficient way possible (i.e. avoiding loops)? I tried this: scipy.stats.multivariate_normal(means, np.eye(2)).rvs(10) and np.random.multivariate_normal(means, np.eye(2)) But they throw an error saying mean should be 1D. Slow
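The answers are not part of the snippet, so this is only a sketch of one loop-free approach. It relies on the fact that a normal vector with identity covariance is its mean plus independent standard normal noise, so broadcasting does the rest. The seed and the use of `np.random.default_rng` (NumPy 1.17+) are choices made for this example.

```python
import numpy as np

means = np.array([[1, 5, 2], [6, 2, 7], [1, 8, 2]], dtype=float)
n_samples = 10

rng = np.random.default_rng(0)
# Noise has shape (n_samples, n_means, dim); means broadcasts over axis 0.
noise = rng.standard_normal((n_samples,) + means.shape)
samples = means + noise  # samples[i, j] is draw i from the j-th distribution
```

For a shared covariance other than the identity, the same pattern should work with `noise @ L.T`, where `L = np.linalg.cholesky(cov)`. Note that the covariance has to be 3x3 to match the three-dimensional means in the question.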

How can I shuffle a very large list stored in a file in Python?

纵然是瞬间 submitted on 2021-02-19 08:11:48
Question: I need to deterministically generate a randomized list containing the numbers from 0 to 2^32-1. This would be the naive (and totally nonfunctional) way of doing it, just so it's clear what I want. import random numbers = range(2**32) random.seed(0) random.shuffle(numbers) I've tried making the list with numpy.arange() and using pycrypto's random.shuffle() to shuffle it. Making the list ate up about 8 GB of RAM, then shuffling raised that to around 25 GB. I only have 32 GB to give. But that
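The question is cut off and no answer is shown, so the following is a swapped-in technique, not the thread's solution. One way to avoid holding 2^32 numbers in memory is to compute the permutation on demand with a small Feistel network, which is a bijection on fixed-width integers by construction. The function names, the SHA-256 round function, and the round count are all assumptions for illustration, and the result is a deterministic scramble rather than a statistically uniform shuffle.

```python
import hashlib

def _round_value(value, key, round_no, half_bits):
    """Keyed pseudo-random function used inside each Feistel round."""
    data = f"{key}:{round_no}:{value}".encode()
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << half_bits) - 1)

def feistel_permute(index, key=0, half_bits=16, rounds=4):
    """Map index to a unique value in [0, 2**(2*half_bits)); half_bits <= 32."""
    mask = (1 << half_bits) - 1
    left, right = (index >> half_bits) & mask, index & mask
    for r in range(rounds):
        # Each round is invertible, so the whole mapping is a permutation.
        left, right = right, left ^ _round_value(right, key, r, half_bits)
    return (left << half_bits) | right

# With half_bits=16 the domain is 0 .. 2**32 - 1; element i of the
# "shuffled list" is feistel_permute(i), computed without storing the list.
first_five = [feistel_permute(i, key=0) for i in range(5)]
```

Writing the values to a file in order of `i` yields the whole permuted sequence in constant memory, although a pure-Python loop over 2^32 entries would be slow.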

Why I get different result for inbuilt and defined FFT in python?

不想你离开。 submitted on 2021-02-19 07:58:07
Question: I have the code below for fft2 performed by numpy and a 2D FFT performed by direct code. Can anyone point out why they are different? My input matrix is rA. def DFT_matrix(N): i, j = np.meshgrid(np.arange(N), np.arange(N)) omega = np.exp( - 2 * math.pi * 1J / N ) W = np.power( omega, i * j ) / np.sqrt(N) return W sizeM=40 rA=np.random.rand(sizeM,sizeM) rAfft=np.fft.fft2(rA) rAfftabs=np.abs(rAfft)+1e-9 dftMtx=DFT_matrix(sizeM) dftR=dftMtx.conj().T mA=dftMtx*rA*dftR mAabs=np.abs(mA)+1e-9 print
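No answer is included in the snippet, but three differences are visible in the code shown: `*` on NumPy arrays multiplies elementwise rather than as matrices, the right-hand factor is the conjugate transpose (the inverse transform) rather than the DFT matrix itself, and the matrix is scaled by 1/sqrt(N) while `np.fft.fft2` is unscaled by default. The sketch below checks this numerically. The names are rewritten for the example, and the seed is arbitrary.

```python
import numpy as np

def dft_matrix(n):
    """Unitary DFT matrix: W[k, m] = exp(-2j*pi*k*m/n) / sqrt(n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

size = 40
rng = np.random.default_rng(0)
a = rng.random((size, size))
w = dft_matrix(size)

# 2-D DFT as matrix products; W is symmetric, so W @ A @ W.T == W @ A @ W.
manual = w @ a @ w

# fft2 with norm="ortho" uses the same 1/sqrt(N) scaling per axis.
reference = np.fft.fft2(a, norm="ortho")

# What the question's expression computes: an elementwise product.
elementwise = w * a * w.conj().T
```

With the default normalization, the relation is `np.fft.fft2(a) == size * (w @ a @ w)` up to rounding.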