numpy | 易学教程

Diagonal stacking in numpy?

Read more about Diagonal stacking in numpy?

Question: NumPy has some convenience functions for combining several arrays into one, e.g. hstack and vstack. I'm wondering if there's something similar for stacking the component arrays diagonally. Say I have N arrays of shape (n_i, m_i), and I want to combine them into a single array of shape (sum_{i=1..N} n_i, sum_{i=1..N} m_i) such that the component arrays form blocks on the diagonal of the result array. And yes, I know how to solve it manually, e.g. with the approach described in How to "embed" a
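For reference, SciPy ships `scipy.linalg.block_diag`, which builds exactly this kind of block-diagonal array. The sketch below shows the same idea with NumPy alone; the helper name `diag_stack` is hypothetical and not part of any library.

```python
import numpy as np

def diag_stack(*blocks):
    """Place 2-D blocks along the diagonal of a zero-filled array (hypothetical helper)."""
    blocks = [np.atleast_2d(b) for b in blocks]
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols), dtype=np.result_type(*blocks))
    r = c = 0
    for b in blocks:
        # Copy the block, then move the write position down and to the right.
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

a = np.ones((2, 3), dtype=int)
b = 2 * np.ones((1, 2), dtype=int)
stacked = diag_stack(a, b)  # shape (2 + 1, 3 + 2)
```

With SciPy installed, `scipy.linalg.block_diag(a, b)` should give the same result for these inputs.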

Numpy Where Changing Timestamps/Datetime to Integers

Read more about Numpy Where Changing Timestamps/Datetime to Integers

Question: Not so much a question as something that is puzzling me. I have a column of dates that looks something like this: 0 NaT 1 1996-04-01 2 2000-03-01 3 NaT 4 NaT 5 NaT 6 NaT 7 NaT 8 NaT I'd like to convert the NaTs to a static value. (Assume I imported pandas as pd and numpy as np). If I do: mydata['mynewdate'] = mydata.mydate.replace( np.NaN, pd.datetime(1994,6,30,0,0)) All is well, I get: 0 1994-06-30 1 1996-04-01 2 2000-03-01 3 1994-06-30 4 1994-06-30 5 1994-06-30 6 1994-06-30 7 1994-06-30 8 1994
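A minimal sketch of the same replacement done with `Series.fillna`, which stays inside pandas and so keeps the datetime dtype. The sample values are shortened from the excerpt, and `pd.Timestamp` is used here in place of `pd.datetime`, which is an assumption about what a current pandas version expects.

```python
import pandas as pd

# A short datetime column with missing values (NaT), modeled on the excerpt.
mydate = pd.Series(pd.to_datetime([None, "1996-04-01", "2000-03-01", None]))

# Fill the NaT entries with a fixed date; the result is still a datetime column.
filled = mydate.fillna(pd.Timestamp("1994-06-30"))
```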

get_dummies(), Exception: Data must be 1-dimensional

Read more about get_dummies(), Exception: Data must be 1-dimensional

Question: I have this data and I am trying to apply this: one_hot = pd.get_dummies(df) But I get this error: Here is my code up until then: # Import modules import pandas as pd import numpy as np import matplotlib.pyplot as plt from sklearn import tree df = pd.read_csv('AllMSAData.csv') df.head() corr_matrix = df.corr() corr_matrix df.describe() # Get features and targets labels = np.array(df['CurAV']) # Remove the labels from the features # axis 1 refers to the columns df = df.drop('CurAV', axis = 1) #
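The excerpt ends before the error message or its cause appears, so the following only sketches the call in question on a tiny made-up frame. The column names `value` and `city` are hypothetical and not taken from AllMSAData.csv.

```python
import pandas as pd

# Hypothetical data: one numeric column and one categorical column.
df = pd.DataFrame({"value": [1.0, 2.0, 3.0], "city": ["A", "B", "A"]})

# One-hot encode the categorical column; numeric columns pass through unchanged.
one_hot = pd.get_dummies(df, columns=["city"])
```

Naming the columns to encode explicitly makes it easier to see which input column a failure comes from.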

Preventing a multiplication expression evaluating in Sympy

Read more about Preventing a multiplication expression evaluating in Sympy

Question: I am generating an expression with two fractions, and want to pretty-print it as a whole expression with LaTeX, to then put on a worksheet, e.g. in the form: (5/7) * (3/4). However, when I do the following: fract1 = sympy.sympify(Fraction(5,7)) fract2 = sympy.sympify(Fraction(3,4)) expression = sympy.Mul(fract1,fract2,evaluate=False) It returns 5*3/(7*4) Clearly it is combining the fractions but not actually evaluating, but I want to be able to produce it in a format suitable as a question for a
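One way to sidestep the combining behavior is to format each fraction separately and join the pieces as a string. This is only a sketch using the standard library; the helper `frac_latex` and the choice of `\times` as the separator are assumptions, and it is not the approach SymPy itself takes.

```python
from fractions import Fraction

def frac_latex(f):
    """Render a Fraction as a LaTeX \\frac (hypothetical helper)."""
    return rf"\frac{{{f.numerator}}}{{{f.denominator}}}"

fract1 = Fraction(5, 7)
fract2 = Fraction(3, 4)

# Each factor is rendered on its own, so the two fractions are never merged.
expression = rf"{frac_latex(fract1)} \times {frac_latex(fract2)}"
```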

How to interleave numpy.ndarrays?

Read more about How to interleave numpy.ndarrays?

Question: I am currently looking for a method with which I can interleave two numpy.ndarrays, such that >>> a = np.random.rand(5,5) >>> print a [[ 0.83367208 0.29507876 0.41849799 0.58342521 0.81810562] [ 0.31363351 0.69468009 0.14960363 0.7685722 0.56240711] [ 0.49368821 0.46409791 0.09042236 0.68706312 0.98430387] [ 0.21816242 0.87907115 0.49534121 0.60453302 0.75152033] [ 0.10510938 0.55387841 0.37992348 0.6754701 0.27095986]] >>> b = np.random.rand(5,5) >>> print b [[ 0.52237011 0.75242666 0.39895415 0
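The excerpt is cut off before the desired output is shown. Assuming the goal is to alternate rows (a[0], b[0], a[1], b[1], ...), a minimal sketch using strided assignment could look like this; small integer arrays stand in for the random data.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)         # rows 0..2 of a
b = np.arange(6).reshape(3, 2) + 100   # rows 0..2 of b, offset so they are easy to spot

# Allocate the result, then fill even rows from a and odd rows from b.
interleaved = np.empty((a.shape[0] + b.shape[0], a.shape[1]), dtype=a.dtype)
interleaved[0::2] = a
interleaved[1::2] = b
```

This assumes both arrays have the same shape and dtype.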

how do you find and save duplicated rows in a numpy array?

Read more about how do you find and save duplicated rows in a numpy array?

Question: I have an array, e.g. Array = [[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5],[1,1,1],[2,2,2]] and I would like something that would output the following: Repeated = [[1,1,1],[2,2,2]] Preserving the number of repeated rows would work too, e.g. Repeated = [[1,1,1],[1,1,1],[2,2,2],[2,2,2]] I thought the solution might include numpy.unique, but I can't get it to work. Is there a native Python / NumPy function? Answer 1: Using the new axis functionality of np.unique along with return_counts=True that gives us
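A minimal sketch of the approach the answer names, `np.unique` with `axis=0` and `return_counts=True`, applied to the array from the question. The variable names are illustrative, and the answer's own code is cut off in the excerpt, so this may differ from it in detail.

```python
import numpy as np

arr = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4],
                [5, 5, 5], [1, 1, 1], [2, 2, 2]])

# Unique rows (sorted) and how often each one occurs.
unique_rows, counts = np.unique(arr, axis=0, return_counts=True)

# Rows that occur more than once, listed once each.
repeated = unique_rows[counts > 1]

# The same rows, repeated as many times as they occur in arr.
repeated_with_copies = np.repeat(repeated, counts[counts > 1], axis=0)
```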

Getting the parameter names of scipy.stats distributions

Read more about Getting the parameter names of scipy.stats distributions

Question: I am writing a script to find the best-fitting distribution over a dataset using scipy.stats. I first have a list of distribution names, over which I iterate: dists = ['alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'norm'] for d in dists: dist = getattr(scipy.stats, d) ps = dist.fit(selected_data) errors.loc[d,['D-Value','P-Value']] = kstest(selected.tolist(), d, args=ps) errors.loc[d,'Params'] = ps Now, after this loop, I select the minimum D-Value in order to get the best
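For the continuous distributions in that list, the shape parameter names are available as a comma-separated string in the `shapes` attribute (`None` when there are none), and `fit` returns the shape values followed by `loc` and `scale`. A sketch that builds the names in that order; the helper `param_names` is hypothetical.

```python
import scipy.stats

def param_names(dist):
    """Names matching the tuple returned by dist.fit for a continuous distribution
    (hypothetical helper)."""
    shapes = [s.strip() for s in dist.shapes.split(",")] if dist.shapes else []
    return shapes + ["loc", "scale"]

beta_names = param_names(scipy.stats.beta)
norm_names = param_names(scipy.stats.norm)
```

Pairing the result with the fitted values, e.g. `dict(zip(param_names(dist), ps))`, would label each entry of `ps` from the loop.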