pandas

Python Pandas - Only showing rows in DF for the MAX values of a column

Submitted by 六眼飞鱼酱① on 2021-02-05 08:54:48
Question: I searched for this but cannot find an answer. Say I have a dataframe (apologies for formatting):

a  Dave   $400
a  Dave   $400
a  Dave   $400
b  Fred   $220
c  James  $150
c  James  $150
d  Harry  $50

I want to filter the dataframe so it only shows the rows where the third column holds the MAXIMUM value. Could someone point me in the right direction? I.e. it would only show Dave's rows. All I can find are ways of showing the rows with the maximum value for each separate index (the indexes being A, B, C etc.).
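A minimal sketch of one way to do this, assuming the three columns are named index, name and amount (the question does not give column names) and that the dollar values are numeric:

import pandas as pd

df = pd.DataFrame({
    'index': ['a', 'a', 'a', 'b', 'c', 'c', 'd'],
    'name': ['Dave', 'Dave', 'Dave', 'Fred', 'James', 'James', 'Harry'],
    'amount': [400, 400, 400, 220, 150, 150, 50],
})

# keep only the rows whose 'amount' equals the column-wide maximum
print(df[df['amount'] == df['amount'].max()])   # the three Dave rows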

Why is pandas.melt messing with my dtypes?

Submitted by 风格不统一 on 2021-02-05 08:44:12
Question: I have some pivot code that is failing with the error pandas.core.base.DataError: No numeric types to aggregate. I have tracked the problem down to an earlier call to pandas.melt. Here are the dtypes before the melt:

frame.dtypes
user_id                         Int64
feature                         object
seconds_since_start_assigned    Int32
total                           float32
programme_ids                   object
q1                              Int32
q2                              Int32
q3                              Int32
q4                              Int32
q5                              Int32
q6                              Int32
q7                              Int32
q8                              Int32
q9                              Int32
week                            Int32

Now for the melt: frame1 = pd.melt( frame, id_vars=['user_id', 'week'], value
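For illustration, a sketch of the likely cause and a workaround: when pd.melt stacks columns of different (here nullable Int32) dtypes into a single value column, that column can come back as object, and a later aggregation then finds no numeric types. Assuming the melted columns are q1..q9 and that casting the value column back to a nullable integer is acceptable (both assumptions, since the question is truncated):

import pandas as pd

frame1 = pd.melt(
    frame,                                        # the original frame from the question
    id_vars=['user_id', 'week'],
    value_vars=[f'q{i}' for i in range(1, 10)],   # assumed value columns
    var_name='question',
    value_name='answer',
)

print(frame1.dtypes)                              # 'answer' may now be object
frame1['answer'] = frame1['answer'].astype('Int32')   # restore a numeric dtype before aggregating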

Selecting columns based on value given in other column using pandas in python

Submitted by 浪尽此生 on 2021-02-05 08:38:11
Question: I have a data frame such as: a b c d...... 1 1 3 3 3 5 4 1 1 4 6 1 0 I want to select a number of columns based on the value given in column "a". In this case, for the first row it would only select column b. How can I achieve something like: df.iloc[:, column b : number of columns corresponding to the value in column a]? My expected output would be: a b c d e 1 1 0 0 1 # 'e' contains the value in column b because column a = 1 3 3 3 5 335 # 'e' contains the values of columns b, c, d because column a 4 1 1 4 1 # = 3 1 0 NAN Answer 1: A
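A rough sketch of one way to approach this, assuming column 'a' says how many columns, starting at 'b', should be collected into a new column 'e' for that row (the exact expected layout is unclear from the excerpt, and the values below are hypothetical):

import pandas as pd

df = pd.DataFrame({'a': [1, 3, 1],
                   'b': [1, 5, 4],
                   'c': [3, 4, 6],
                   'd': [3, 1, 1]})

def take_first_n(row):
    n = int(row['a'])
    # take n values starting from column 'b' (position 1 onwards)
    return row.iloc[1:1 + n].tolist()

df['e'] = df.apply(take_first_n, axis=1)
print(df)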

Sum two rows if two cells are the same but in different order

Submitted by 给你一囗甜甜゛ on 2021-02-05 08:37:12
Question: Similar to the below:

Buyer  Seller  Amount
John   Mary    3
Mary   John    2
David  Bosco   2

I want to sum the John and Mary rows into one. Expected outcome:

Trade1  Trade2  Amount
John    Mary    5
David   Bosco   2

My dataframe has around 6000 rows. Thank you for your help.

Answer 1: First sort the values per row with numpy.sort, create a boolean mask with DataFrame.duplicated, and then aggregate the sum: df[['Buyer','Seller']] = pd.DataFrame(np.sort(df[['Buyer','Seller']], axis=1)) df2 = df.groupby(['Buyer','Seller'], as_index=False)['Amount'].sum()
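A self-contained sketch of the answer's approach using the sample data from the question (the rename to Trade1/Trade2 is an assumption based on the expected output, and the plain ndarray assignment replaces the pd.DataFrame wrapper for clarity):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Buyer': ['John', 'Mary', 'David'],
                   'Seller': ['Mary', 'John', 'Bosco'],
                   'Amount': [3, 2, 2]})

# sort each Buyer/Seller pair alphabetically so John/Mary and Mary/John become identical keys
df[['Buyer', 'Seller']] = np.sort(df[['Buyer', 'Seller']].to_numpy(), axis=1)

df2 = (df.groupby(['Buyer', 'Seller'], as_index=False)['Amount']
         .sum()
         .rename(columns={'Buyer': 'Trade1', 'Seller': 'Trade2'}))
print(df2)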

Applying weighted average function to column in pandas groupby object, but weights sum to zero

Submitted by 喜夏-厌秋 on 2021-02-05 08:21:11
Question: I am applying different functions to each column in a pandas groupby object. One of these functions is a weighted average, where the weights are the associated values in another column of the DataFrame. However, for a number of my groups the weights sum to zero. Because of this, I get a "Weights sum to zero, can't be normalized" error message when I run the code. Referring to the code below, for the group defined by col1 value x and col2 value y, the sum of the values in col3 in rows with
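The question is truncated, but a minimal sketch of guarding a weighted average against zero-sum weights inside a groupby (col1/col2/col3 come from the question; the value column 'val' and the NaN fallback are assumptions):

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x', 'x', 'z'],
                   'col2': ['y', 'y', 'y'],
                   'col3': [0, 0, 2],        # weights; they sum to zero for group (x, y)
                   'val': [10, 20, 30]})     # hypothetical value column

def safe_weighted_avg(group):
    weights = group['col3']
    if weights.sum() == 0:
        return np.nan                         # fall back instead of raising
    return np.average(group['val'], weights=weights)

result = df.groupby(['col1', 'col2']).apply(safe_weighted_avg)
print(result)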

Reshaping a dataframe in python into 3D

Submitted by ▼魔方 西西 on 2021-02-05 08:20:07
Question: I am trying to reshape a handwritten character dataset into 3D form so that it can be concatenated with a digit recognition dataset. I tried multiple times, but I couldn't figure out how it can be done. The actual digit recognition dataset has the shape (60000, 28, 28). The character recognition dataset has the shape (372450, 785), and the first column is the target variable. Since, excluding the first column, 28*28 = 784, it should be possible to convert it to 3D the same as the digit dataset. Please
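A minimal sketch of the reshape, assuming the character data sits in a DataFrame named char_df (a hypothetical name) with the label in the first column and the 784 pixel values in the remaining columns:

import numpy as np

arr = char_df.to_numpy()                   # shape (372450, 785)
labels = arr[:, 0]                         # first column is the target variable
images = arr[:, 1:].reshape(-1, 28, 28)    # shape (372450, 28, 28)

# the image array now matches the digit dataset's shape and the two can be concatenated
combined = np.concatenate([digit_images, images], axis=0)   # digit_images assumed to be the (60000, 28, 28) array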

Taking the mean value of N last days

Submitted by 泪湿孤枕 on 2021-02-05 08:19:47
Question: I have this data frame:

ID  Date      X  123_Var  456_Var  789_Var
A   16-07-19  3  777      250      810
A   17-07-19  9  637      121      529
A   20-07-19  2  295      272      490
A   21-07-19  3  778      600      544
A   22-07-19  6  741      792      907
A   25-07-19  6  435      416      820
A   26-07-19  8  590      455      342
A   27-07-19  6  763      476      753
A   02-08-19  6  717      211      454
A   03-08-19  6  152      442      475
A   05-08-19  6  564      340      302
A   07-08-19  6  105      929      633
A   08-08-19  6  948      366      586
B   07-08-19  4  509      690      406
B   08-08-19  2  413      725      414
B   12-08-19  2  170      702      912
B   13-08-19  3  851      616      477
B   14-08-19  9
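The question is cut off before the desired output, but a sketch of one common approach to "mean of the last N days" per ID, using a time-based rolling window (the 3-day window is purely an assumption for illustration):

import pandas as pd

df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')
df = df.sort_values(['ID', 'Date'])
value_cols = ['123_Var', '456_Var', '789_Var']

# rolling mean over the last 3 calendar days within each ID
rolled = (df.set_index('Date')
            .groupby('ID')[value_cols]
            .rolling('3D')
            .mean()
            .rename(columns=lambda c: c + '_mean')
            .reset_index())

out = df.merge(rolled, on=['ID', 'Date'])
print(out.head())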

Pandas dataframe convert wide to long multiple columns with name from column name

Submitted by 微笑、不失礼 on 2021-02-05 08:13:19
Question: Consider I have a Pandas DataFrame in the following format:

Date        Product  cost|us|2019  cost|us|2020  cost|us|2021  cost|de|2019  cost|de|2020  cost|de|2021
01/01/2020  prodA    10            12            14            12            13            15

How can we convert it into the following format?

Date        Product  Year  cost|us  cost|de
01/01/2020  ProdA    2019  10       12
01/01/2020  ProdA    2020  12       13
01/01/2020  ProdA    2021  14       15

Answer 1: Convert the non-year columns to a MultiIndex with DataFrame.set_index, then split the column labels with str.rsplit on the last |, set the new column name in
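The answer is truncated, but a sketch of the approach it describes (set_index on the non-year columns, rsplit the column labels on the last '|' into a MultiIndex, then stack the year level) might look like this:

import pandas as pd

df = pd.DataFrame({'Date': ['01/01/2020'], 'Product': ['prodA'],
                   'cost|us|2019': [10], 'cost|us|2020': [12], 'cost|us|2021': [14],
                   'cost|de|2019': [12], 'cost|de|2020': [13], 'cost|de|2021': [15]})

df1 = df.set_index(['Date', 'Product'])
# split 'cost|us|2019' into ('cost|us', '2019') and use the pieces as a column MultiIndex
df1.columns = df1.columns.str.rsplit('|', n=1, expand=True)
df1 = df1.rename_axis(columns=[None, 'Year'])

out = df1.stack('Year').reset_index()
print(out)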
