numpy

Iterate each row by updating values from 1st dataframe to 2nd dataframe based on unique value w/ different index, otherwise append and assign new ID

六月ゝ 毕业季﹏ posted on 2021-02-10 07:33:17
Question: I am trying to update each row of df2 from df1 when a unique value matches. If there is no match, append the row to df2 and assign it a new ID.

df1 (no ID column):

      unique_value  Status  Price
    0       xyz123     bad   6.67
    1       eff987     bad   1.75
    2       efg125    okay   5.77

df2:

      unique_value     Status  Price    ID
    0       xyz123       good   1.25  1000
    1       xyz123       good   1.25  1000
    2       xyz123       good   1.25  1000
    3       xyz123       good   1.25  1000
    4       xyz985        bad   1.31  1001
    5       abc987       okay   4.56  1002
    6       eff987       good   9.85  1003
    7       asd541  excellent   8.85  1004

Desired output for updated df2:
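The desired output is cut off in the snippet above, so the following is only a sketch of one plausible reading: matching rows in df2 take Status and Price from df1, and unmatched df1 rows are appended with IDs continuing from the current maximum. The "max + 1" ID rule and the assumption that unique_value is unique within df1 are mine, not the asker's.

```python
import pandas as pd

df1 = pd.DataFrame({
    "unique_value": ["xyz123", "eff987", "efg125"],
    "Status": ["bad", "bad", "okay"],
    "Price": [6.67, 1.75, 5.77],
})
df2 = pd.DataFrame({
    "unique_value": ["xyz123"] * 4 + ["xyz985", "abc987", "eff987", "asd541"],
    "Status": ["good"] * 4 + ["bad", "okay", "good", "excellent"],
    "Price": [1.25] * 4 + [1.31, 4.56, 9.85, 8.85],
    "ID": [1000] * 4 + [1001, 1002, 1003, 1004],
})

# Look-up table keyed by unique_value (assumes df1 has no duplicate keys).
updates = df1.set_index("unique_value")
matched = df2["unique_value"].isin(updates.index)

# Overwrite Status and Price on every df2 row whose key appears in df1.
updated = df2.copy()
for col in ["Status", "Price"]:
    updated.loc[matched, col] = updated.loc[matched, "unique_value"].map(updates[col])

# Rows of df1 with no match in df2 get fresh IDs (assumed rule: max ID + 1, + 2, ...).
new_rows = df1[~df1["unique_value"].isin(df2["unique_value"])].copy()
next_id = int(df2["ID"].max()) + 1
new_rows["ID"] = list(range(next_id, next_id + len(new_rows)))

updated = pd.concat([updated, new_rows], ignore_index=True)
```

Mapping through an indexed look-up table avoids an explicit Python loop over rows, which tends to matter once df2 grows large.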

Difference between numpy var() and pandas var()

旧城冷巷雨未停 posted on 2021-02-10 07:32:51
Question: I recently noticed that numpy.var() and pandas.DataFrame.var() or pandas.Series.var() give different values. Is there any difference between them? Here is my dataset:

      Country    GDP   Area      Continent
    0   India   2.79  3.287           Asia
    1     USA  20.54  9.840  North America
    2   China  13.61  9.590           Asia

Here is my code:

    from sklearn.preprocessing import StandardScaler
    ss = StandardScaler()
    catDf.iloc[:,1:-1] = ss.fit_transform(catDf.iloc[:,1:-1])

Now checking
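For context, the usual source of this discrepancy is the default ddof: numpy.var() divides by n (ddof=0), while the pandas var() methods divide by n - 1 (ddof=1). A minimal sketch using the GDP column from the question:

```python
import numpy as np
import pandas as pd

gdp = pd.Series([2.79, 20.54, 13.61], name="GDP")

# NumPy default: population variance, divides by n (ddof=0).
population_var = np.var(gdp.to_numpy())

# pandas default: sample variance, divides by n - 1 (ddof=1).
sample_var = gdp.var()

# Passing ddof explicitly makes the two libraries agree.
numpy_sample_var = np.var(gdp.to_numpy(), ddof=1)
pandas_population_var = gdp.var(ddof=0)
```

Stating ddof explicitly on both sides is the simplest way to keep results comparable.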

Increasing performance of nearest neighbors of rows in Pandas

白昼怎懂夜的黑 posted on 2021-02-10 07:26:29
Question: I am given an 8000x3 data set similar to this one:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(np.random.rand(8000,3), columns=list('XYZ'))

So for a visual reference, df.head(5) looks like this:

              X         Y         Z
    0  0.462433  0.559442  0.016778
    1  0.663771  0.092044  0.636519
    2  0.111489  0.676621  0.839845
    3  0.244361  0.599264  0.505175
    4  0.115844  0.888622  0.766014

I'm trying to implement a method that, when given an index from the dataset, will return similar items from the dataset (in some reasonable
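The question is cut off before it defines "similar", so the sketch below assumes plain Euclidean distance and a fixed number of neighbours k. The function name nearest_rows is hypothetical. It computes all distances in one vectorized pass and uses np.argpartition to avoid fully sorting 8000 values.

```python
import numpy as np
import pandas as pd

def nearest_rows(df, idx, k=5):
    """Return the k rows closest (Euclidean) to the row labelled idx, excluding itself."""
    points = df.to_numpy()
    pos = df.index.get_loc(idx)
    # Distance from the query row to every row, computed in one vectorized step.
    dists = np.linalg.norm(points - points[pos], axis=1)
    dists[pos] = np.inf  # never return the query row itself
    # argpartition finds the k smallest distances without a full sort (needs k < len(df)).
    nearest = np.argpartition(dists, k)[:k]
    # Order just those k candidates from closest to farthest.
    nearest = nearest[np.argsort(dists[nearest])]
    return df.iloc[nearest]

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((8000, 3)), columns=list("XYZ"))
neighbors = nearest_rows(df, 0, k=5)
```

For many repeated queries, a spatial index such as a k-d tree would likely be faster than recomputing every distance each time.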

Finding start time and end time in a column

99封情书 posted on 2021-02-10 07:11:12
Question: I have a data set that has employees clocking in and out. It looks like this (note two entries per employee):

    Employee  Date    Time
    Emp1      1/1/16  06:00
    Emp1      1/1/16  13:00
    Emp2      1/1/16  09:00
    Emp2      1/1/16  17:00
    Emp3      1/1/16  11:00
    Emp3      1/1/16  18:00

I want to get the data to look like this:

    Employee  Date    Start  End
    Emp1      1/1/16  06:00  13:00
    Emp2      1/1/16  09:00  17:00
    Emp3      1/1/16  11:00  18:00

I would like to get it into a data frame format so that I can do some calculations. I currently have tried df['start'] =
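One way to get the desired shape, sketched under the assumption that each employee's earliest time on a date is the clock-in and the latest is the clock-out, is a groupby with named aggregation:

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["Emp1", "Emp1", "Emp2", "Emp2", "Emp3", "Emp3"],
    "Date": ["1/1/16"] * 6,
    "Time": ["06:00", "13:00", "09:00", "17:00", "11:00", "18:00"],
})

# Parse the times so min/max compare chronologically rather than as text.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M")

# One row per employee and date: earliest time is Start, latest is End.
result = (
    df.groupby(["Employee", "Date"], as_index=False)
      .agg(Start=("Time", "min"), End=("Time", "max"))
)

# Convert back to HH:MM strings for display.
result["Start"] = result["Start"].dt.strftime("%H:%M")
result["End"] = result["End"].dt.strftime("%H:%M")
```

Before the final formatting step, Start and End are datetimes, so a calculation such as shift length can be done by subtracting them.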

Removing “nan” values from a numpy array

孤人 提交于 2021-02-10 06:55:15
问题 I have a numpy array that has certain rows filled exclusively with "nan", i.e.: print(ar2[1560]) [ nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
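A minimal sketch of dropping such rows with a boolean mask, assuming ar2 is a 2D float array (the small array here is made up for illustration):

```python
import numpy as np

ar2 = np.array([
    [1.0, 2.0, 3.0],
    [np.nan, np.nan, np.nan],   # filled exclusively with nan: should be dropped
    [4.0, np.nan, 6.0],         # only partly nan: kept
])

# True for rows in which every entry is nan.
all_nan_rows = np.isnan(ar2).all(axis=1)

# Keep the remaining rows.
cleaned = ar2[~all_nan_rows]
```

Swapping .all(axis=1) for .any(axis=1) would instead drop every row that contains at least one nan.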

A more compact __repr__ for my numpy array?

狂风中的少年 posted on 2021-02-10 06:39:45
Question: When I show an array, the default __repr__() method for ndarray objects is too big for what I would like to do:

    a = np.eye(32)
    b = {'hello':42, 'array':a}
    b

produces:

    {'array': array([[ 1., 0., 0., ..., 0., 0., 0.],
           [ 0., 1., 0., ..., 0., 0., 0.],
           [ 0., 0., 1., ..., 0., 0., 0.],
           ...,
           [ 0., 0., 0., ..., 1., 0., 0.],
           [ 0., 0., 0., ..., 0., 1., 0.],
           [ 0., 0., 0., ..., 0., 0., 1.]]), 'hello': 42}

I tried an ugly solution, reassigning __repr__:

    def wow():
        return "wow!"
    a.__repr__ = wow

which
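Python looks up special methods such as __repr__ on the type rather than the instance, so patching a single array cannot change what repr() prints. One possible workaround, sketched here with a hypothetical subclass name CompactArray, is to view the array as an ndarray subclass that overrides __repr__:

```python
import numpy as np

class CompactArray(np.ndarray):
    """Hypothetical ndarray subclass whose repr shows only shape and dtype."""

    def __repr__(self):
        return f"<array shape={self.shape} dtype={self.dtype}>"

# view() reinterprets the same data as the subclass without copying it.
a = np.eye(32).view(CompactArray)
b = {'hello': 42, 'array': a}
text = repr(b)
```

If only the amount of printed data matters, np.set_printoptions (for example its threshold and edgeitems parameters) is a lighter alternative that needs no subclass.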

Numpy symmetric 4D matrix construction

感情迁移 posted on 2021-02-10 06:38:15
Question: I would like to construct an array with the following structure:

    A[i,j,i,j] = B[i,j]

with all other entries 0:

    A[i,j,l,k] = 0   # (i,j) != (l,k)

I.e., if I have the matrix B constructed, how can I create the matrix A, preferably in a vectorized manner? Explicitly, let

    B = [[1,2],[3,4]]

Then:

    A[1,1,:,:]=[[1,0],[0,0]]
    A[1,2,:,:]=[[0,2],[0,0]]
    A[2,1,:,:]=[[0,0],[3,0]]
    A[2,2,:,:]=[[0,0],[0,4]]

Answer 1: We can use an open grid to assign to A, broadcasting the indexing arrays across the axes:

    B = np.array(
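The answer's code is cut off above, so the following is not the original answer but a sketch along the lines it describes, assuming np.ogrid is the "open grid" it refers to. Note that the question's example uses 1-based indices, while NumPy indexing is 0-based.

```python
import numpy as np

B = np.array([[1, 2], [3, 4]])
m, n = B.shape

# Start from zeros so every entry with (i, j) != (l, k) stays 0.
A = np.zeros((m, n, m, n), dtype=B.dtype)

# Open grid: i has shape (m, 1) and j has shape (1, n); they broadcast to (m, n).
i, j = np.ogrid[:m, :n]

# Advanced indexing with the broadcast index arrays sets A[i, j, i, j] = B[i, j].
A[i, j, i, j] = B
```

Here A[0, 1] is [[0, 2], [0, 0]], which corresponds to A[1,2,:,:] in the question's 1-based notation.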

ValueError: could not broadcast input array from shape (20,590) into shape (20)

一个人想着一个人 posted on 2021-02-10 06:37:07
Question: I am trying to extract features from .wav files by using the MFCCs of the sound files. I get an error when I try to convert my list of MFCCs to a numpy array. I am quite sure the error occurs because the list contains MFCC arrays with different shapes, but I am unsure how to solve the issue. I have looked at two other Stack Overflow posts, but they don't solve my problem because they are too specific to a certain task. ValueError: could not broadcast input array from
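One common way around ragged shapes, sketched here with random arrays standing in for real MFCC matrices, is to zero-pad every matrix along the frame axis to the longest length before stacking. The shapes other than (20, 590) are invented for illustration, and zero-padding is an assumption; truncating to the shortest length is another option.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for per-file MFCC matrices: 20 coefficients, varying frame counts.
mfccs = [rng.random((20, 590)), rng.random((20, 412)), rng.random((20, 633))]

# Pad each matrix with zeros on the right of the frame axis up to the longest one.
max_frames = max(m.shape[1] for m in mfccs)
padded = [
    np.pad(m, ((0, 0), (0, max_frames - m.shape[1])), mode="constant")
    for m in mfccs
]

# Equal shapes can now be stacked into a single (files, coefficients, frames) array.
features = np.stack(padded)
```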