pandas-groupby | 易学教程

Selecting groups formed by groupby function

Question: My dataframe, df1:

    group  ordercode  quantity
    0      A          1
           B          3
    1      C          1
           E          2
           D          1

I have formed each group with the groupby function. I need to extract the data by group number. My desired output:

    In: get group 0
    Out:
    ordercode  quantity
    A          1
    B          3

or

    group  ordercode  quantity
    0      A          1
           B          3

Any suggestion would be appreciated.

Answer 1: Use DataFrame.xs; it is also possible to pass the parameter drop_level=False:

    # if the original level should be removed
    df1 = df.xs(0)
    print (df1)
               quantity
    ordercode
    A                 1
    B                 3

    # if the original level should be kept
    df1
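The excerpt above stops before the second variant, so here is a minimal sketch of both calls. It assumes the frame is indexed by a two-level MultiIndex named `group` and `ordercode`, which is how the question's printout reads; the variable names `dropped` and `kept` are only illustrative.

```python
import pandas as pd

# Assumed layout: MultiIndex (group, ordercode) with one value column.
df = pd.DataFrame(
    {"quantity": [1, 3, 1, 2, 1]},
    index=pd.MultiIndex.from_tuples(
        [(0, "A"), (0, "B"), (1, "C"), (1, "E"), (1, "D")],
        names=["group", "ordercode"],
    ),
)

# Select group 0; the "group" level is removed from the result.
dropped = df.xs(0)

# Select group 0 but keep the "group" level in the index.
kept = df.xs(0, drop_level=False)

print(dropped)
print(kept)
```

`xs` selects on the first index level by default; a different level can be chosen with its `level` argument.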

Difference between “as_index = False”, and “reset_index()” in pandas groupby

Question: I just wanted to know the difference between what these two do. Data:

    import pandas as pd
    df = pd.DataFrame({"ID":["A","B","A","C","A","A","C","B"], "value":[1,2,4,3,6,7,3,4]})

reset_index():

    df_group1 = df.groupby("ID").sum().reset_index()

as_index=False:

    df_group2 = df.groupby("ID", as_index=False).sum()

Both of them give the exact same output.

      ID  value
    0  A     18
    1  B      6
    2  C      6

Can anyone tell me what the difference is, with an example illustrating it?

Answer 1: When you use
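As a small illustration of the question's observation (not a reconstruction of the truncated answer), the sketch below runs both spellings on the question's data and also shows the intermediate result that `reset_index()` operates on. The name `intermediate` is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

# Default as_index=True: the group key becomes the index of the result.
intermediate = df.groupby("ID").sum()

# Variant 1: move the key back into a regular column afterwards.
df_group1 = intermediate.reset_index()

# Variant 2: ask groupby to keep the key as a column from the start.
df_group2 = df.groupby("ID", as_index=False).sum()

print(intermediate.index.name)       # the key lives in the index here
print(df_group1.equals(df_group2))   # compares the two final frames
```

For this simple sum the two final frames match; the visible difference is only in the intermediate object, where the key is an index rather than a column.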

Keep columns after a groupby in an empty dataframe

Question: The dataframe is empty after a query. Calling groupby on it raises a runtime warning and then returns another empty dataframe with no columns. How can I keep the columns?

    df = pd.DataFrame(columns=["PlatformCategory","Platform","ResClassName","Amount"])
    print df

Result:

    Empty DataFrame
    Columns: [PlatformCategory, Platform, ResClassName, Amount]
    Index: []

Then groupby:

    df = df.groupby(["PlatformCategory","Platform","ResClassName"]).sum()
    df = df.reset_index(drop=False,inplace=True)
    print df

Result: sometimes is
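One possible way to keep the columns, sketched under assumptions: `Amount` is given a float dtype so the sum is well defined on an empty frame, and the result is reindexed to the expected column list so the layout holds regardless of what the aggregation returns. This is not taken from the original thread. Note also that `reset_index(..., inplace=True)` returns `None`, so assigning its result back to `df`, as in the question, discards the frame.

```python
import pandas as pd

cols = ["PlatformCategory", "Platform", "ResClassName", "Amount"]
keys = cols[:3]

# Empty frame with the expected columns; Amount is made numeric (assumption).
df = pd.DataFrame(columns=cols).astype({"Amount": "float64"})

grouped = df.groupby(keys).sum()

# reset_index() returns a new frame; reindex guarantees the column layout.
result = grouped.reset_index().reindex(columns=cols)

print(result)
```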

Bar graph from dataframe groupby

Question:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    df = pd.read_csv("arrests.csv")
    df = df.replace(np.nan,0)
    df = df.groupby(['home_team'])['arrests'].mean()

I'm trying to create a bar graph for the dataframe. Under home_team are a bunch of team names. Under arrests is the number of arrests on each date. I've basically grouped the data by team, taking the average arrests for each team. I'm trying to create a bar graph for this but am not sure how to proceed since one column doesn
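A minimal sketch of one way to plot such a result. Since `arrests.csv` is not available here, the frame below uses made-up team names and counts; only the column names `home_team` and `arrests` come from the question. The grouped mean is a Series whose index holds the team names, and `Series.plot.bar()` uses that index for the x axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for arrests.csv.
df = pd.DataFrame({
    "home_team": ["Lions", "Tigers", "Lions", "Sharks", "Tigers"],
    "arrests": [2, 1, 4, 5, 0],
})

# Series indexed by team name, holding the mean arrests per team.
mean_arrests = df.groupby("home_team")["arrests"].mean()

# The index supplies the bar labels, the values supply the heights.
ax = mean_arrests.plot.bar()
ax.set_xlabel("home_team")
ax.set_ylabel("mean arrests")
plt.tight_layout()
```

In an interactive session, `plt.show()` would display the figure.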

New column in pandas - adding series to dataframe by applying a list groupby

Question: Given the following df:

      Id other  concat
    0  A     z       1
    1  A     y       2
    2  B     x       3
    3  B     w       4
    4  B     v       5
    5  B     u       6

I want the result to have a new column with the grouped values as a list:

      Id other  concat           new
    0  A     z       1        [1, 2]
    1  A     y       2        [1, 2]
    2  B     x       3  [3, 4, 5, 6]
    3  B     w       4  [3, 4, 5, 6]
    4  B     v       5  [3, 4, 5, 6]
    5  B     u       6  [3, 4, 5, 6]

This is similar to these questions: "grouping rows in list in pandas groupby" and "Replicating GROUP_CONCAT for pandas.DataFrame". However, the goal here is to apply the grouping you get from df.groupby('Id')['concat'].apply(list) , which
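One way to broadcast the per-group lists back onto the rows is to map the `Id` column through the grouped Series. This is a sketch of that idea and may differ from the answer given in the original thread; the name `lists_per_id` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["A", "A", "B", "B", "B", "B"],
    "other": ["z", "y", "x", "w", "v", "u"],
    "concat": [1, 2, 3, 4, 5, 6],
})

# One list per Id, indexed by Id (shorter than the original frame).
lists_per_id = df.groupby("Id")["concat"].apply(list)

# Look up each row's Id in that Series to get a full-length column.
df["new"] = df["Id"].map(lists_per_id)

print(df)
```

Rows of the same group share the same list object after the lookup, which matters only if the lists are later mutated in place.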

pandas groupby rolling uneven time

Question: I am having some trouble with pandas rolling. Here is a simplified version of my dataset:

    df2 = pd.DataFrame({
        'A' : pd.Categorical(["test","train","test","train",'train','hello']),
        'B' : (pd.Timestamp('2013-01-02 00:00:05'),
               pd.Timestamp('2013-01-02 00:00:10'),
               pd.Timestamp('2013-01-02 00:00:09'),
               pd.Timestamp('2013-01-02 00:01:05'),
               pd.Timestamp('2013-01-02 00:01:25'),
               pd.Timestamp('2013-01-02 00:02:05')),
        'C' : 1.}).sort_values('A').reset_index(drop=True)

    >>> df2
    A B C
    0 hello 2013-01-02 00:02
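The excerpt ends before stating the exact window the asker needs, so the following is only a sketch of a time-based rolling sum per group. It assumes a 10-second window, uses plain strings instead of a Categorical for column `A`, and sorts by the timestamp before making it the index so that each group is monotonic in time.

```python
import pandas as pd

df2 = pd.DataFrame({
    "A": ["test", "train", "test", "train", "train", "hello"],
    "B": pd.to_datetime([
        "2013-01-02 00:00:05", "2013-01-02 00:00:10", "2013-01-02 00:00:09",
        "2013-01-02 00:01:05", "2013-01-02 00:01:25", "2013-01-02 00:02:05",
    ]),
    "C": 1.0,
})

# Time-based windows need a sorted datetime index.
by_time = df2.sort_values("B").set_index("B")

# Sum of C over the trailing 10 seconds (assumed window), within each group.
rolled = by_time.groupby("A")["C"].rolling("10s").sum()

print(rolled)
```

The result is indexed by group and timestamp, so `rolled.loc["test"]` gives the series for a single group.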

Speeding up rolling sum calculation in pandas groupby

Question: I want to compute rolling sums group-wise for a large number of groups and I'm having trouble doing it acceptably quickly. Pandas has built-in methods for rolling and expanding calculations. Here's an example:

    import pandas as pd
    import numpy as np

    obs_per_g = 20
    g = 10000
    obs = g * obs_per_g
    k = 20
    df = pd.DataFrame(
        data=np.random.normal(size=obs * k).reshape(obs, k),
        index=pd.MultiIndex.from_product(iterables=[range(g), range(obs_per_g)]),
    )

To get rolling and expanding sums I can use df
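For reference, a small sketch of the built-in group-wise calls on a scaled-down version of the question's frame. The window size of 3 and the reduced dimensions are assumptions for illustration, and this shows the baseline approach rather than a speed-up. The expanding sum within a group is the same as a group-wise cumulative sum.

```python
import numpy as np
import pandas as pd

# Scaled-down version of the question's setup (sizes are assumptions).
g, obs_per_g, k = 3, 5, 2
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(g * obs_per_g, k)),
    index=pd.MultiIndex.from_product([range(g), range(obs_per_g)]),
)

window = 3  # assumed window length

# Rolling sum within each first-level group; groupby prepends the group key,
# so the extra level is dropped to restore the original index shape.
rolled = df.groupby(level=0).rolling(window).sum().droplevel(0)

# Expanding sum within each group, expressed as a cumulative sum.
expanded = df.groupby(level=0).cumsum()

print(rolled.head(obs_per_g))
```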

how to find total of only one column in python pandas pivot table?

Question: The data I get from Excel looks like this:

    Invoice Cost centre Invoice Category Price DataFeed Reporting Fequency
    RIM Retail QLD 22.25 WEB DWM
    R5M Retail SYD 22.25 BWH M
    .....

My pivot table looks like this:

    df = pd.read_excel(file_path, sheet_name='Invoice Details', usecols="E:F,I,L:M")
    df['Price'] = df['Price'].astype(float)
    df1 = df.groupby(["Invoice Cost Centre", "Invoice Category"]).agg({'Price': 'sum'}).reset_index()
    df = pd.pivot_table(df, index=["Invoice Cost Centre", "Invoice Category"], columns=['Price
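Because the question's `pivot_table` call is cut off, the sketch below is only one plausible reading of the title: sum the `Price` column per cost centre and category, and append a grand total for that column alone via `margins=True`. The rows are invented stand-ins for the Excel sheet, and the label `Total` is an arbitrary choice.

```python
import pandas as pd

# Hypothetical rows standing in for the 'Invoice Details' sheet.
df = pd.DataFrame({
    "Invoice Cost Centre": ["RIM", "R5M", "RIM"],
    "Invoice Category": ["Retail QLD", "Retail SYD", "Retail QLD"],
    "Price": [22.25, 22.25, 10.0],
})

# Only Price is aggregated; margins=True appends a grand-total row.
pivot = pd.pivot_table(
    df,
    index=["Invoice Cost Centre", "Invoice Category"],
    values="Price",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)

print(pivot)
```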

Pandas groupby selecting only one value based on 2 groups and converting rest to 0

Question: I have a pandas data frame which has a datetime index which looks like this:

    df =
                Fruit  Quantity
    01/02/10    Apple  4
    01/02/10    Apple  6
    01/02/10    Pear   7
    01/02/10    Grape  8
    01/02/10    Grape  5
    02/02/10    Apple  2
    02/02/10    Fruit  6
    02/02/10    Pear   8
    02/02/10    Pear   5

Now for each date and for each fruit I only want one value (preferably the top one) and the rest of the fruit for the date to remain zero. So desired output is as follows:

                Fruit  Quantity
    01/02/10    Apple  4
    01/02/10    Apple  0
    01/02/10    Pear   7
    01/02/10
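A minimal sketch of one way to get this result, assuming the dates are in day/month/year form and that "the top one" means the first row of each (date, fruit) pair in the existing order. `cumcount()` numbers the rows inside each group from zero, so only the rows numbered 0 keep their quantity. The column name `date` and the helper `is_first` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["01/02/10"] * 5 + ["02/02/10"] * 4, format="%d/%m/%y"  # assumed format
    ),
    "Fruit": ["Apple", "Apple", "Pear", "Grape", "Grape",
              "Apple", "Fruit", "Pear", "Pear"],
    "Quantity": [4, 6, 7, 8, 5, 2, 6, 8, 5],
})

# True for the first row of every (date, Fruit) pair, False for the rest.
is_first = df.groupby(["date", "Fruit"]).cumcount() == 0

# Keep the first quantity, replace the others with 0.
df["Quantity"] = df["Quantity"].where(is_first, 0)

# Restore the datetime index described in the question.
df = df.set_index("date")

print(df)
```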