pandas-groupby | 易学教程

Aggregation in pandas

Question: How to perform aggregation with pandas? No DataFrame after aggregation! What happened? How to aggregate mainly string columns (to lists, tuples, strings with a separator)? How to aggregate counts? How to create a new column filled with aggregated values? I've seen these recurring questions asking about various facets of the pandas aggregate functionality. Most of the information regarding aggregation and its various use cases today is fragmented across dozens of badly worded, unsearchable posts
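The use cases listed in the question above can be illustrated with a minimal sketch. The data frame and its column names (`group`, `label`, `value`) are hypothetical and chosen only for illustration; this is one possible approach using named aggregation, not the answer from the original post.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "label": ["x", "y", "x", "z", "z"],
    "value": [1, 2, 3, 4, 5],
})

# as_index=False keeps the grouping key as a regular column,
# so the result is a flat DataFrame rather than a Series.
summary = df.groupby("group", as_index=False).agg(
    total=("value", "sum"),       # numeric aggregation
    n_rows=("value", "count"),    # non-null count per group
    labels=("label", list),       # strings collected into a list
    joined=("label", ", ".join),  # strings joined with a separator
)

# transform returns one value per original row, so the aggregated
# value can be stored as a new column of the original frame.
df["group_total"] = df.groupby("group")["value"].transform("sum")

print(summary)
print(df)
```

Here `agg` reduces each group to one row, while `transform` broadcasts the group result back to every row of that group.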

Converting a Pandas GroupBy output from Series to DataFrame

Question: I'm starting with input data like this:

    df1 = pandas.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    })

Which when printed appears as this:

           City     Name
    0   Seattle    Alice
    1   Seattle      Bob
    2  Portland  Mallory
    3   Seattle  Mallory
    4   Seattle      Bob
    5  Portland  Mallory

Grouping is simple enough:

    g1 = df1.groupby(["Name", "City"]).count()

and printing yields a GroupBy
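One way to get a flat DataFrame out of the grouping described above is sketched here. It uses the question's own data; the output column name `count` is an assumption made for illustration, and this is not necessarily the approach taken in the original answers.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() counts the rows of each (Name, City) pair and returns
# a Series indexed by a two-level MultiIndex.
counts = df1.groupby(["Name", "City"]).size()

# reset_index moves the index levels back into columns;
# "count" is an illustrative name for the value column.
flat = counts.reset_index(name="count")

print(flat)
```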

How do I create a new column from the output of pandas groupby().sum()?

Question: I am trying to create a new column from the groupby calculation. In the code below, I get the correct calculated values for each date (see group below), but when I try to create a new column (df['Data4']) with it I get NaN. So I am trying to create a new column in the dataframe with the sum of Data3 for each date and apply that to each date row. For example, 2015-05-08 is in 2 rows (total is 50+5 = 55) and in this new column I would like to have 55 in both of the rows. import pandas as pd
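A minimal sketch of one way to get the per-date sum onto every row follows. The question's data is cut off, so the frame here is hypothetical: the `Date` column name and the 2015-05-07 row are assumptions, and only the 50 + 5 = 55 example comes from the question. The NaN the asker sees is most likely an index mismatch, since a plain `groupby().sum()` result is indexed by date rather than by the original row labels.

```python
import pandas as pd

# Hypothetical data; only the 2015-05-08 values (50 and 5) come from the question.
df = pd.DataFrame({
    "Date": ["2015-05-08", "2015-05-07", "2015-05-08"],
    "Data3": [50, 8, 5],
})

# transform("sum") returns a Series aligned with df's own index,
# so the assignment does not produce NaN.
df["Data4"] = df.groupby("Date")["Data3"].transform("sum")

print(df)
```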

Pandas GroupBy.apply method duplicates first group

Question: My first SO question: I am confused about the behavior of the apply method of groupby in pandas (0.12.0-4); it appears to apply the function TWICE to the first row of a data frame. For example:

    >>> from pandas import Series, DataFrame
    >>> import pandas as pd
    >>> df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
    >>> print(df)
      class  count
    0     A      1
    1     B      0
    2     C      2

I first check that the groupby function works ok, and it seems to be fine:

    >>> for group in df.groupby('class', group

GroupBy pandas DataFrame and select most common value

Question: I have a data frame with three string columns. I know that only one value in the 3rd column is valid for every combination of the first two. To clean the data I have to group the data frame by the first two columns and select the most common value of the third column for each combination. My code:

    import pandas as pd
    from scipy import stats

    source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                           'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
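The cleaning step described above could be sketched as follows. The third column is cut off in the excerpt, so the `Label` column and its values are hypothetical. The sketch uses `Series.mode` rather than `scipy.stats`, which is a swapped-in technique and not the asker's code.

```python
import pandas as pd

source = pd.DataFrame({
    "Country": ["USA", "USA", "Russia", "USA"],
    "City": ["New-York", "New-York", "Sankt-Petersburg", "New-York"],
    # Hypothetical third column; the original is truncated in the excerpt.
    "Label": ["NY", "New", "Spb", "NY"],
})

# mode() returns the most frequent value(s) in sorted order;
# iloc[0] picks the first one, which also breaks ties deterministically.
most_common = (
    source.groupby(["Country", "City"])["Label"]
    .agg(lambda s: s.mode().iloc[0])
    .reset_index()
)

print(most_common)
```

When two values are equally frequent, this picks the one that sorts first; a different tie-breaking rule would need explicit handling.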

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

Question: I have a data frame df and I use several columns from it to groupby:

    df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).mean()

In the above way I almost get the table (data frame) that I need. What is missing is an additional column that contains the number of rows in each group. In other words, I have the mean but I would also like to know how many values were used to get these means. For example, in the first group there are 8 values and in the second one 10, and so on. In short:
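A minimal sketch of computing the means together with the group size follows. The column names come from the question, but the data values and the output names (`col3_mean`, `col4_mean`, `n_rows`) are hypothetical.

```python
import pandas as pd

# Hypothetical values; only the column names come from the question.
df = pd.DataFrame({
    "col1": ["A", "A", "B", "B", "B"],
    "col2": ["x", "x", "y", "y", "y"],
    "col3": [1.0, 3.0, 2.0, 4.0, 6.0],
    "col4": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Named aggregation: each keyword is an output column,
# each tuple is (input column, aggregation function).
stats = df.groupby(["col1", "col2"]).agg(
    col3_mean=("col3", "mean"),
    col4_mean=("col4", "mean"),
    n_rows=("col3", "size"),  # number of rows in the group, NaN included
)

print(stats)
```

Using `"count"` instead of `"size"` would count only non-null values, which may differ per column when data is missing.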

Get the Row(s) which have the max value in groups using groupby

Question: How do I find all rows in a pandas dataframe which have the max value for the count column, after grouping by the ['Sp','Mt'] columns? Example 1: the following dataframe, which I group by ['Sp','Mt']:

        Sp  Mt  Value  count
    0  MM1  S1  a      **3**
    1  MM1  S1  n      2
    2  MM1  S3  cb     5
    3  MM2  S3  mk     **8**
    4  MM2  S4  bg     **10**
    5  MM2  S4  dgd    1
    6  MM4  S2  rd     2
    7  MM4  S2  cb     2
    8  MM4  S2  uyi    **7**

Expected output: get the result rows whose count is the max within each group, like:

    0  MM1  S1  a      **3**
    3  MM2  S3  mk     **8**
    4  MM2  S4  bg     *
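One way to select the rows that hold each group's maximum is sketched below, using the table from the question. It is an illustration under the assumption that all tied rows should be kept, not the answer from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "Sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "Mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "Value": ["a", "n", "cb", "mk", "bg", "dgd", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

# transform("max") broadcasts each group's maximum back to its rows,
# giving a Series aligned with df.
group_max = df.groupby(["Sp", "Mt"])["count"].transform("max")

# Keep every row whose count equals its group's maximum (ties included).
result = df[df["count"] == group_max]

print(result)
```

A single-row group such as (MM1, S3) is trivially its own maximum, so its row is kept as well.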

grouping rows in list in pandas groupby

Question: I have a pandas data frame like:

    a  b
    A  1
    A  2
    B  5
    B  5
    B  4
    C  6

I want to group by the first column and get the second column as lists in rows:

    A  [1,2]
    B  [5,5,4]
    C  [6]

Is it possible to do something like this using pandas groupby?

Answer 1: You can do this using groupby to group on the column of interest and then apply list to every group:

    In [1]: df = pd.DataFrame({'a': ['A','A','B','B','B','C'], 'b': [1,2,5,5,4,6]})
            df
    Out[1]:
       a  b
    0  A  1
    1  A  2
    2  B  5
    3  B  5
    4  B  4
    5  C  6

    In [2]: df.groupby('a')['b'].apply
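The approach described in the answer above (group on one column, then apply `list` to each group) can be sketched as a self-contained script. The variable names `lists` and `as_frame` are illustrative, and the final `reset_index` step is an addition that the excerpt does not show.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["A", "A", "B", "B", "B", "C"],
    "b": [1, 2, 5, 5, 4, 6],
})

# Each group's "b" values are collected into a Python list;
# the result is a Series indexed by the values of "a".
lists = df.groupby("a")["b"].apply(list)

# Optional: turn the Series into a two-column DataFrame.
as_frame = lists.reset_index(name="b")

print(as_frame)
```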

How to pivot a dataframe

Question: What is pivot? How do I pivot? Is this a pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables. Even if they don't know that they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... But I'm going to give it a go. The problem with existing questions and answers is that often the question is focused on a nuance that the OP has trouble
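As a small illustration of the "long format to wide format" case mentioned in the question, here is a sketch using `DataFrame.pivot_table`. The data and the column names (`row`, `col`, `val`) are hypothetical, and summing duplicates is just one possible aggregation choice.

```python
import pandas as pd

# Hypothetical long-format data: one observation per line.
long_df = pd.DataFrame({
    "row": ["r1", "r1", "r2", "r2", "r2"],
    "col": ["c1", "c2", "c1", "c2", "c2"],
    "val": [1, 2, 3, 4, 5],
})

# pivot_table spreads the values of "col" across the columns and
# aggregates duplicate (row, col) pairs, here by summing them.
wide = long_df.pivot_table(
    index="row", columns="col", values="val", aggfunc="sum", fill_value=0
)

print(wide)
```

`DataFrame.pivot` performs the same reshaping without aggregation, but it raises an error when a (row, col) pair occurs more than once, as (r2, c2) does here.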