pandas-groupby

Aggregation in pandas

徘徊边缘 submitted on 2019-11-26 00:43:59
Question: How to perform aggregation with pandas? No DataFrame after aggregation! What happened? How to aggregate mainly string columns (to lists, tuples, strings with separator)? How to aggregate counts? How to create a new column filled by aggregated values? I've seen these recurring questions asking about various facets of the pandas aggregate functionality. Most of the information regarding aggregation and its various use cases today is fragmented across dozens of badly worded, unsearchable posts
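
The questions listed above can each be illustrated with a short sketch. The frame and its column names (`key`, `num`, `txt`) are hypothetical and are not taken from the original post; this is one possible approach, not the post's own answer.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "num": [1, 2, 3, 4, 5],
    "txt": ["x", "y", "z", "w", "v"],
})

# as_index=False keeps the group key as a regular column,
# so the result is a flat DataFrame rather than a Series.
sums = df.groupby("key", as_index=False)["num"].sum()

# Aggregate a string column to lists, or to one string with a separator.
lists = df.groupby("key")["txt"].agg(list)
joined = df.groupby("key")["txt"].agg(", ".join)

# Count rows per group.
sizes = df.groupby("key").size()

# New column filled with the aggregated value of each row's group.
df["num_total"] = df.groupby("key")["num"].transform("sum")
```

`transform` returns a result aligned to the original rows, which is why it can be assigned straight back as a column.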

Converting a Pandas GroupBy output from Series to DataFrame

若如初见. submitted on 2019-11-26 00:23:17
Question: I'm starting with input data like this:
    df1 = pandas.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    })
Which when printed appears as this:
           City     Name
    0   Seattle    Alice
    1   Seattle      Bob
    2  Portland  Mallory
    3   Seattle  Mallory
    4   Seattle      Bob
    5  Portland  Mallory
Grouping is simple enough:
    g1 = df1.groupby(["Name", "City"]).count()
and printing yields a GroupBy
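
A minimal sketch of one way to get a flat DataFrame of counts from this input. It swaps in `size()` for the question's `count()`, on the assumption that the goal is the number of rows per (Name, City) pair; the output column name `count` is my own choice.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() counts rows per (Name, City) pair and returns a Series
# whose index is a MultiIndex of the group keys.
counts = df1.groupby(["Name", "City"]).size()

# reset_index moves the index levels back into columns;
# name= labels the column holding the counts.
flat = counts.reset_index(name="count")
```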

How do I create a new column from the output of pandas groupby().sum()?

牧云@^-^@ submitted on 2019-11-26 00:04:28
Question: Trying to create a new column from the groupby calculation. In the code below, I get the correct calculated values for each date (see group below), but when I try to create a new column (df['Data4']) with it I get NaN. So I am trying to create a new column in the dataframe with the sum of Data3 for all dates and apply that to each date row. For example, 2015-05-08 is in 2 rows (total is 50+5 = 55) and in this new column I would like to have 55 in both of the rows. import pandas as pd
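
The question's data is cut off, so the sketch below uses a small hypothetical frame; the `Date` column name and the values other than 50 and 5 are assumptions. A plain `groupby().sum()` is indexed by date rather than by the original row labels, which is a likely cause of the NaN on assignment; `transform` avoids this by returning one value per original row.

```python
import pandas as pd

# Hypothetical data: "Date" is an assumed column name; 2015-05-08
# appears in two rows with Data3 values 50 and 5, as in the question.
df = pd.DataFrame({
    "Date": ["2015-05-08", "2015-05-07", "2015-05-08"],
    "Data3": [50, 10, 5],
})

# transform("sum") broadcasts each group's total back onto its rows,
# so the result has the same index as df.
df["Data4"] = df.groupby("Date")["Data3"].transform("sum")
```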

Pandas GroupBy.apply method duplicates first group

北战南征 submitted on 2019-11-25 23:41:11
Question: My first SO question: I am confused about this behavior of the apply method of groupby in pandas (0.12.0-4); it appears to apply the function TWICE to the first row of a data frame. For example:
    >>> from pandas import Series, DataFrame
    >>> import pandas as pd
    >>> df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
    >>> print(df)
      class  count
    0     A      1
    1     B      0
    2     C      2
I first check that the groupby function works ok, and it seems to be fine:
    >>> for group in df.groupby('class', group

GroupBy pandas DataFrame and select most common value

南楼画角 submitted on 2019-11-25 23:31:31
Question: I have a data frame with three string columns. I know that only one value in the 3rd column is valid for every combination of the first two. To clean the data I have to group the data frame by the first two columns and select the most common value of the third column for each combination. My code:
    import pandas as pd
    from scipy import stats
    source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'], 'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
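
A possible approach, sketched with pandas alone rather than the `scipy.stats` import from the question. The third column is truncated in the excerpt, so the column name `Label` and its values are hypothetical, as is the helper `most_common`.

```python
import pandas as pd

source = pd.DataFrame({
    "Country": ["USA", "USA", "Russia", "USA"],
    "City": ["New-York", "New-York", "Sankt-Petersburg", "New-York"],
    # Hypothetical third column; the original is cut off in the excerpt.
    "Label": ["NY", "New", "Spb", "NY"],
})

def most_common(values: pd.Series):
    """Return the most frequent value in a group (hypothetical helper)."""
    # Series.mode() returns every mode in sorted order;
    # taking the first one breaks ties deterministically.
    return values.mode().iloc[0]

result = (
    source.groupby(["Country", "City"])["Label"]
    .agg(most_common)
    .reset_index()
)
```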

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

纵饮孤独 submitted on 2019-11-25 22:56:48
Question: I have a data frame df and I use several columns from it to groupby: df[['col1','col2','col3','col4']].groupby(['col1','col2']).mean() In the above way I almost get the table (data frame) that I need. What is missing is an additional column that contains the number of rows in each group. In other words, I have the mean, but I would also like to know how many values were used to get these means. For example, in the first group there are 8 values, in the second one 10, and so on. In short:
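
One way to get the means and the group sizes in a single table is named aggregation, sketched below. It assumes pandas 0.25 or later; the sample values and the output column names (`col3_mean`, `col4_mean`, `n_rows`) are hypothetical.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": ["x", "x", "y", "y", "y"],
    "col3": [1.0, 3.0, 2.0, 4.0, 6.0],
    "col4": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Each keyword names an output column and maps it to
# (input column, aggregation). "size" counts rows per group.
stats = (
    df.groupby(["col1", "col2"])
    .agg(
        col3_mean=("col3", "mean"),
        col4_mean=("col4", "mean"),
        n_rows=("col3", "size"),
    )
    .reset_index()
)
```

Note that `"size"` counts all rows in the group, while `"count"` would skip missing values in the chosen column.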

Get the Row(s) which have the max value in groups using groupby

纵然是瞬间 submitted on 2019-11-25 22:16:02
Question: How do I find all rows in a pandas dataframe which have the max value for the count column, after grouping by the ['Sp','Mt'] columns? Example 1: the following dataFrame, which I group by ['Sp','Mt']:
        Sp   Mt  Value  count
    0  MM1   S1      a  **3**
    1  MM1   S1      n  2
    2  MM1   S3     cb  5
    3  MM2   S3     mk  **8**
    4  MM2   S4     bg  **10**
    5  MM2   S4    dgd  1
    6  MM4   S2     rd  2
    7  MM4   S2     cb  2
    8  MM4   S2    uyi  **7**
Expected output: get the result rows whose count is max between the groups, like: 0 MM1 S1 a **3** 1 3 MM2 S3 mk **8** 4 MM2 S4 bg *
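
A minimal sketch of one common approach, using the example frame from the question: compute each group's maximum with `transform` and keep the rows that equal it. The variable names `group_max` and `result` are my own.

```python
import pandas as pd

df = pd.DataFrame({
    "Sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "Mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "Value": ["a", "n", "cb", "mk", "bg", "dgd", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

# Per-row copy of the maximum count within each (Sp, Mt) group.
group_max = df.groupby(["Sp", "Mt"])["count"].transform("max")

# Keep every row that matches its group's maximum; ties are all kept.
result = df[df["count"] == group_max]
```

With this data the single-row group (MM1, S3) is also returned, since its only row is trivially the maximum.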

grouping rows in list in pandas groupby

萝らか妹 submitted on 2019-11-25 21:47:35
Question: I have a pandas data frame like:
    a  b
    A  1
    A  2
    B  5
    B  5
    B  4
    C  6
I want to group by the first column and get the second column as lists in rows:
    A  [1,2]
    B  [5,5,4]
    C  [6]
Is it possible to do something like this using pandas groupby?
Answer 1: You can do this using groupby to group on the column of interest and then apply list to every group:
    In [1]: df = pd.DataFrame({'a': ['A','A','B','B','B','C'], 'b': [1,2,5,5,4,6]})
            df
    Out[1]:
       a  b
    0  A  1
    1  A  2
    2  B  5
    3  B  5
    4  B  4
    5  C  6
    In [2]: df.groupby('a')['b'].apply
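
The answer's approach, applying `list` to every group, can be sketched as a self-contained script. The `reset_index` step and the column name `b_list` are additions of mine for getting a flat frame back.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["A", "A", "B", "B", "B", "C"],
    "b": [1, 2, 5, 5, 4, 6],
})

# Collect the values of column b into one Python list per group.
grouped = df.groupby("a")["b"].apply(list)

# Optional: turn the resulting Series back into a two-column DataFrame.
as_frame = grouped.reset_index(name="b_list")
```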

How to pivot a dataframe

落爺英雄遲暮 submitted on 2019-11-25 21:30:13
Question: What is pivot? How do I pivot? Is this a pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables. Even if they don't know that they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... But I'm going to give it a go. The problem with existing questions and answers is that often the question is focused on a nuance that the OP has trouble
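
As a small illustration of the long-to-wide reshaping the question mentions, here is a sketch on a hypothetical frame; the column names `row`, `col` and `val` are assumptions, and this covers only the simplest case, not the full range the post sets out to address.

```python
import pandas as pd

# Hypothetical long-format data: one row per (row, col) pair.
long_df = pd.DataFrame({
    "row": ["r1", "r1", "r2", "r2"],
    "col": ["c1", "c2", "c1", "c2"],
    "val": [1, 2, 3, 4],
})

# pivot reshapes long to wide; it requires each (row, col) pair to be unique.
wide = long_df.pivot(index="row", columns="col", values="val")

# With duplicate pairs, pivot raises an error; pivot_table
# aggregates the duplicates instead (here by summing).
dup_df = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
wide_sum = dup_df.pivot_table(index="row", columns="col", values="val", aggfunc="sum")
```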