numpy

Diagonal stacking in numpy?

风流意气都作罢 submitted on 2021-02-08 15:30:19
Question: NumPy has some convenience functions for combining several arrays into one, e.g. hstack and vstack. Is there something similar for stacking the component arrays diagonally? Say I have N arrays of shape (n_i, m_i), and I want to combine them into a single array of shape (sum_{i=1..N} n_i, sum_{i=1..N} m_i) such that the component arrays form blocks on the diagonal of the result. And yes, I know how to solve it manually, e.g. with the approach described in How to "embed" a
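
A minimal sketch of diagonal stacking, assuming SciPy is available alongside NumPy: scipy.linalg.block_diag places each input on the diagonal and fills the rest with zeros. The arrays a, b and c are made-up inputs for illustration.

```python
import numpy as np
from scipy.linalg import block_diag

# Three hypothetical blocks of different shapes (n_i, m_i).
a = np.ones((2, 3), dtype=int)
b = np.full((1, 2), 2)
c = np.full((3, 1), 3)

# The result has shape (sum of n_i, sum of m_i); off-diagonal entries are zero.
stacked = block_diag(a, b, c)
print(stacked.shape)
print(stacked)
```

The same layout can be built by hand by allocating np.zeros of the target shape and assigning each block to its slice, which avoids the SciPy dependency.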

Numpy Where Changing Timestamps/Datetime to Integers

谁都会走 submitted on 2021-02-08 15:22:22
Question: Not so much a question as something that puzzles me. I have a column of dates that looks something like this:

    0 NaT
    1 1996-04-01
    2 2000-03-01
    3 NaT
    4 NaT
    5 NaT
    6 NaT
    7 NaT
    8 NaT

I'd like to convert the NaTs to a static value. (Assume I imported pandas as pd and numpy as np.) If I do:

    mydata['mynewdate'] = mydata.mydate.replace(np.NaN, pd.datetime(1994,6,30,0,0))

all is well, and I get:

    0 1994-06-30
    1 1996-04-01
    2 2000-03-01
    3 1994-06-30
    4 1994-06-30
    5 1994-06-30
    6 1994-06-30
    7 1994-06-30
    8 1994
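
A hedged sketch of two ways to fill NaT values while staying inside pandas, so the column keeps its datetime dtype. The frame below is a shortened, made-up version of the column in the question, and pd.Timestamp stands in for the older pd.datetime alias.

```python
import pandas as pd

# Shortened stand-in for the column in the question (None becomes NaT).
mydata = pd.DataFrame(
    {"mydate": pd.to_datetime([None, "1996-04-01", "2000-03-01", None])}
)
fill_value = pd.Timestamp(1994, 6, 30)

# Option 1: fillna replaces missing timestamps directly.
mydata["mynewdate"] = mydata["mydate"].fillna(fill_value)

# Option 2: Series.where keeps values where the condition holds
# and substitutes fill_value elsewhere.
mydata["mynewdate_where"] = mydata["mydate"].where(
    mydata["mydate"].notna(), fill_value
)

print(mydata.dtypes)
print(mydata)
```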

get_dummies(), Exception: Data must be 1-dimensional

◇◆丶佛笑我妖孽 submitted on 2021-02-08 15:11:19
Question: I have this data, and I am trying to apply this:

    one_hot = pd.get_dummies(df)

But I get this error: Here is my code up to that point:

    # Import modules
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import tree

    df = pd.read_csv('AllMSAData.csv')
    df.head()
    corr_matrix = df.corr()
    corr_matrix
    df.describe()

    # Get features and targets
    labels = np.array(df['CurAV'])

    # Remove the labels from the features
    # axis 1 refers to the columns
    df = df.drop('CurAV', axis = 1)
    #
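
A small sketch of the same pipeline on made-up data, since the original CSV is not shown. Only the CurAV column name comes from the question; MSA and Pop are hypothetical. The duplicate-name check is included because selecting a duplicated column name returns a DataFrame rather than a one-dimensional Series; whether that is what happened in the question is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for AllMSAData.csv.
df = pd.DataFrame(
    {
        "CurAV": [100.0, 200.0, 150.0],
        "MSA": ["A", "B", "A"],
        "Pop": [1, 2, 3],
    }
)

# Separate the target from the features.
labels = df["CurAV"].to_numpy()
features = df.drop(columns="CurAV")

# Sanity check: duplicated column names make df[name] two-dimensional.
has_duplicate_names = features.columns.duplicated().any()

# One-hot encode the non-numeric columns; numeric columns pass through.
one_hot = pd.get_dummies(features)
print(list(one_hot.columns))
```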

Preventing a multiplication expression evaluating in Sympy

我是研究僧i submitted on 2021-02-08 15:07:51
Question: I am generating an expression with two fractions and want to pretty-print it as a whole expression with LaTeX, to then put on a worksheet, e.g. in the form (5/7) * (3/4). However, when I do the following:

    fract1 = sympy.sympify(Fraction(5,7))
    fract2 = sympy.sympify(Fraction(3,4))
    expression = sympy.Mul(fract1,fract2,evaluate=False)

it returns 5*3/(7*4). Clearly it is combining the fractions without actually evaluating them, but I want to be able to produce it in a format suitable as a question for a
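
One possible workaround, sketched under the assumption that a LaTeX string is all the worksheet needs: render each fraction separately with sympy.latex and join the pieces, so the Mul printer never gets the chance to merge them. The variable names and the choice of \times as the separator are illustrative.

```python
import sympy

# Keep the factors as separate exact rationals.
factors = [sympy.Rational(5, 7), sympy.Rational(3, 4)]

# Render each factor on its own, then join with a multiplication sign.
expression_latex = r" \times ".join(sympy.latex(f) for f in factors)
print(expression_latex)

# The numeric answer is still available when needed for an answer key.
answer = sympy.Mul(*factors)
print(answer)
```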

How to interleave numpy.ndarrays?

一世执手 submitted on 2021-02-08 14:29:10
Question: I am currently looking for a method to interleave two numpy.ndarrays, such that

    >>> a = np.random.rand(5,5)
    >>> print a
    [[ 0.83367208 0.29507876 0.41849799 0.58342521 0.81810562]
     [ 0.31363351 0.69468009 0.14960363 0.7685722 0.56240711]
     [ 0.49368821 0.46409791 0.09042236 0.68706312 0.98430387]
     [ 0.21816242 0.87907115 0.49534121 0.60453302 0.75152033]
     [ 0.10510938 0.55387841 0.37992348 0.6754701 0.27095986]]
    >>> b = np.random.rand(5,5)
    >>> print b
    [[ 0.52237011 0.75242666 0.39895415 0
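
The excerpt is cut off before the desired output, so the following is only a sketch under the assumption that "interleave" means alternating rows (a[0], b[0], a[1], b[1], ...). Small integer arrays replace the random ones so the result is easy to read.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)   # rows [0,1], [2,3], [4,5]
b = a + 10                       # rows [10,11], [12,13], [14,15]

# Option 1: preallocate and fill even and odd rows with strided slices.
interleaved = np.empty((a.shape[0] + b.shape[0], a.shape[1]), dtype=a.dtype)
interleaved[0::2] = a
interleaved[1::2] = b

# Option 2: stack along a new middle axis, then flatten the first two axes.
interleaved_alt = np.stack((a, b), axis=1).reshape(-1, a.shape[1])

print(interleaved)
```

Interleaving columns instead would use the same idea on the second axis, e.g. assigning to interleaved[:, 0::2] and interleaved[:, 1::2].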

How do you find and save duplicated rows in a numpy array?

喜你入骨 submitted on 2021-02-08 14:10:18
Question: I have an array, e.g.

    Array = [[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5],[1,1,1],[2,2,2]]

and I would like something that would output the following:

    Repeated = [[1,1,1],[2,2,2]]

Preserving the number of repeated rows would work too, e.g.

    Repeated = [[1,1,1],[1,1,1],[2,2,2],[2,2,2]]

I thought the solution might include numpy.unique, but I can't get it to work. Is there a native Python / NumPy function?

Answer 1: Using the new axis functionality of np.unique along with return_counts=True that gives us
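
The answer text is truncated, so this is a sketch of the approach it names (np.unique with axis=0 and return_counts=True) rather than the answerer's exact code. The variable names are illustrative.

```python
import numpy as np

array = np.array(
    [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5], [1, 1, 1], [2, 2, 2]]
)

# Unique rows (sorted) and how many times each one occurs.
unique_rows, counts = np.unique(array, axis=0, return_counts=True)

# Rows that occur more than once, listed a single time each.
repeated = unique_rows[counts > 1]

# The same rows, repeated as often as they appear in the input.
repeated_with_counts = np.repeat(repeated, counts[counts > 1], axis=0)

print(repeated)
print(repeated_with_counts)
```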

Getting the parameter names of scipy.stats distributions

无人久伴 submitted on 2021-02-08 14:07:44
Question: I am writing a script to find the best-fitting distribution over a dataset using scipy.stats. I first have a list of distribution names, over which I iterate:

    dists = ['alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'norm']
    for d in dists:
        dist = getattr(scipy.stats, d)
        ps = dist.fit(selected_data)
        errors.loc[d,['D-Value','P-Value']] = kstest(selected.tolist(), d, args=ps)
        errors.loc[d,'Params'] = ps

Now, after this loop, I select the minimum D-Value in order to get the best
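
A minimal sketch of one way to recover parameter names for continuous distributions: each one exposes a shapes attribute (a comma-separated string, or None when there are no shape parameters), and fit returns the shape values followed by loc and scale. The helper name param_names is hypothetical.

```python
import scipy.stats


def param_names(dist_name):
    """Return parameter names in the order dist.fit() reports them.

    Intended for continuous distributions: shape parameters first,
    then 'loc' and 'scale'.
    """
    dist = getattr(scipy.stats, dist_name)
    shapes = [s.strip() for s in dist.shapes.split(",")] if dist.shapes else []
    return shapes + ["loc", "scale"]


for name in ["alpha", "beta", "norm"]:
    print(name, param_names(name))
```

Pairing these names with the fitted tuple, e.g. dict(zip(param_names(d), ps)), gives a labelled record of the parameters for each candidate distribution.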