pandas-groupby

AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

Submitted by 扶醉桌前 on 2019-12-06 11:21:22
I am very new to pandas and am trying to use groupby. I have a DataFrame with multiple columns. I want to group by a particular column and then sort each group by a different column, but I get the following error:

AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

Any help would be much appreciated! Thanks!

col1 | col2 | col3 | col4 | col5
=================================
A    | A1   | A2   | A3   | DATE1
A    | B1   | B2   | B3   | DATE2

I want to group by col1, then sort each group by col5, and then call reset_index to get all rows of the …
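A minimal sketch of one way to get sorted groups without calling reset_index on the groupby object itself (the column names come from the question; the sample values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B'],
    'col2': ['A1', 'B1', 'C1'],
    'col5': ['2019-01-02', '2019-01-01', '2019-01-03'],
})

# Sorting by the group key first and the sort column second leaves every
# group internally sorted; no groupby object is needed at all.
out = df.sort_values(['col1', 'col5']).reset_index(drop=True)
```

The error in the question comes from calling reset_index on the DataFrameGroupBy object rather than on a DataFrame; sorting by both columns sidesteps the groupby entirely.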

Percentage calculation in pivot table pandas with columns

Submitted by 余生颓废 on 2019-12-06 09:50:54
Question: I have a dataset containing sales records from different vendors, locations, dates, and products. The data set looks like this:

local  categoria fabricante tipo    consistencia peso         pacote   ordem vendas_kg
AREA I SABAO     ASATP      DILUIDO LIQUIDO      1501 A 2000g PLASTICO 1     10
AREA I SABAO     TEPOS      DILUIDO LIQUIDO      1501 A 2000g PLASTICO 1     20
AREA I SABAO     ASATP      CAPSULA LIQUIDO      1501 A 2000g PLASTICO 1     20
AREA I SABAO     TEPOS      CAPSULA LIQUIDO      1501 A 2000g PLASTICO 1     30
AREA I SABAO     ASATP      DILUIDO LIQUIDO      1501 A …
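The question body is truncated, but a common pattern for "percentage in a pivot table" is to build the pivot and then divide by the row totals. A sketch using a subset of the columns above (the choice of index/columns here is an assumption, since the desired output was cut off):

```python
import pandas as pd

df = pd.DataFrame({
    'local':      ['AREA I', 'AREA I', 'AREA I', 'AREA I'],
    'fabricante': ['ASATP', 'TEPOS', 'ASATP', 'TEPOS'],
    'vendas_kg':  [10, 20, 20, 30],
})

pt = df.pivot_table(index='local', columns='fabricante',
                    values='vendas_kg', aggfunc='sum')

# Divide each cell by its row total to get a percentage per row.
pct = pt.div(pt.sum(axis=1), axis=0) * 100
```

With these sample numbers, ASATP accounts for 30 of the 80 kg sold in AREA I, i.e. 37.5%.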

Pandas groupby and value_counts

Submitted by 六月ゝ 毕业季﹏ on 2019-12-06 06:35:46
Question: I want to count distinct values per column (with pd.value_counts, I guess), grouping the data by some level of a MultiIndex. The MultiIndex is taken care of with the groupby(level=...) parameter, but apply raises a ValueError. Original dataframe:

>>> df = pd.DataFrame(np.random.choice(list('ABC'), size=(10, 5)),
...                   columns=['c1', 'c2', 'c3', 'c4', 'c5'],
...                   index=pd.MultiIndex.from_product([['foo', 'bar'],
...                                                     ['w', 'y', 'x', 'y', 'z']]))

       c1 c2 c3 c4 c5
foo w  C  C  B  A  A
    y  A  A  C  B  A
    x  A  B  C  C  C
    y  A  B  C  C  C
    z  A  C  B  C  B
bar w  B  C  C …
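One way this is commonly done is to apply pd.Series.value_counts column-by-column inside each level-0 group; a sketch with a small fixed frame (two columns instead of five, values chosen by hand so the counts are deterministic):

```python
import pandas as pd

df = pd.DataFrame(
    {'c1': list('AABBAB'), 'c2': list('CCABBA')},
    index=pd.MultiIndex.from_product([['foo', 'bar'], ['w', 'x', 'y']]),
)

# For each level-0 group, count occurrences of each value per column.
# The result is indexed by (group, value), one column per original column;
# values absent from a group show up as NaN.
counts = df.groupby(level=0).apply(lambda g: g.apply(pd.Series.value_counts))
```

Within group 'foo', column c1 holds A, A, B, so the count of A is 2 and of B is 1.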

How to use pandas Grouper on multiple keys?

Submitted by 时光怂恿深爱的人放手 on 2019-12-06 06:13:20
I need to groupby-transform a dataframe by a datetime column AND another str (object) column, to apply a function per group and assign the result to each row of the group. I understand the groupby workflow but cannot make a pandas.Grouper cover both conditions at the same time. Thus: how do I use pandas.Grouper on multiple columns?

Use DataFrame.groupby with a list of pandas.Grouper objects as the by argument, like this:

df['result'] = df.groupby([
    pd.Grouper('dt', freq='D'),
    pd.Grouper('other_column'),
]).transform(foo)

Source: https://stackoverflow.com/questions/52187943/how-to-use-pandas
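A runnable sketch of the answer above, with made-up data and 'sum' standing in for the unspecified function foo (a plain column name also works alongside a Grouper in the by list):

```python
import pandas as pd

df = pd.DataFrame({
    'dt': pd.to_datetime(['2019-01-01 01:00', '2019-01-01 02:00',
                          '2019-01-02 01:00', '2019-01-02 02:00']),
    'other_column': ['a', 'a', 'a', 'b'],
    'value': [1, 2, 3, 4],
})

# One grouping key per condition: the datetime column bucketed daily,
# plus the plain string column.
df['result'] = df.groupby(
    [pd.Grouper(key='dt', freq='D'), 'other_column']
)['value'].transform('sum')
```

The first two rows fall into the same (day, 'a') group, so both get the group sum 3; the remaining rows are each alone in their group.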

pandas create boolean column using groupby transform

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-06 06:10:10
I am trying to create a boolean column using GroupBy.transform on a df like this:

id type
1  1.00000
1  1.00000
2  2.00000
2  3.00000
3  2.00000

The code looks like:

df['has_two'] = df.groupby('id')['type'].transform(lambda x: x == 2)

but instead of boolean values, has_two holds float values, e.g. 0.0. I am wondering why that is.

UPDATE: I created a test case:

df = pd.DataFrame({'id': ['1', '1', '2', '2', '3'],
                   'type': [1.0, 1.0, 2.0, 1.0, 2.0]})
df['has_2'] = df.groupby('id')['type'].transform(lambda x: x == 2)

This gave me:

  id type has_2
0  1  1.0   0.0
1  1  1.0   0.0
2  2  2.0   1.0
3  2  1.0   0.0
4  3  2.0   1.0

if I …
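In some pandas versions, SeriesGroupBy.transform casts the result back to the source column's dtype (float here), which is one plausible explanation for the 0.0/1.0 output. Two sketches that keep a bool dtype, depending on what "has_two" is meant to express:

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '1', '2', '2', '3'],
                   'type': [1.0, 1.0, 2.0, 1.0, 2.0]})

# The lambda in the question is elementwise, so no groupby is needed
# and the comparison keeps its bool dtype:
df['is_2'] = df['type'].eq(2)

# If the intent was "does this row's group contain a 2 anywhere",
# aggregate the boolean per group instead:
df['group_has_2'] = df['is_2'].groupby(df['id']).transform('any')
```

With the test-case data, rows 2 and 4 are themselves equal to 2, and every row of groups '2' and '3' sits in a group that contains a 2.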

Rolling grouped cumulative sum

Submitted by 。_饼干妹妹 on 2019-12-06 03:59:52
Question: I'm looking to create a rolling grouped cumulative sum. I can get the result via iteration, but wanted to see if there is a more intelligent way. Here's what the source data looks like:

Per C V
1 c 3
1 a 4
1 c 1
2 a 6
2 b 5
3 j 7
4 x 6
4 x 5
4 a 9
5 a 2
6 c 3
6 k 6

Here is the desired result:

Per C V
1 c 4
1 a 4
2 c 4
2 a 10
2 b 5
3 c 4
3 a 10
3 b 5
3 j 7
4 c 4
4 a 19
4 b 5
4 j 7
4 x 11
5 c 4
5 a 21
5 b 5
5 j 7
5 x 11
6 c 7
6 a 21
6 b 5
6 j 7
6 x 11
6 k 6

Answer 1: This is a very interesting …
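The answer is cut off, but the desired output (every category seen so far carries its running total into each later period) can be reproduced without iteration by pivoting, cumulative-summing down the periods, and masking categories that have not appeared yet. A sketch on the first six source rows:

```python
import pandas as pd

df = pd.DataFrame({'Per': [1, 1, 1, 2, 2, 3],
                   'C':   ['c', 'a', 'c', 'a', 'b', 'j'],
                   'V':   [3, 4, 1, 6, 5, 7]})

# Sum V per (Per, C), then cumulative-sum down the periods so each
# category keeps its running total in later periods.
wide = df.pivot_table(index='Per', columns='C', values='V', aggfunc='sum')

# Mask (Per, C) cells before a category's first appearance, then stack
# back to long form; stack drops the masked NaNs.
run = wide.fillna(0).cumsum().where(wide.notna().cummax())
out = run.stack().reset_index(name='V')
```

For these rows, 'c' totals 4 in period 1 and keeps that value through periods 2 and 3, while 'j' only appears from period 3 onward, matching the shape of the desired result.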

Aggregating string columns using pandas GroupBy

Submitted by 六眼飞鱼酱① on 2019-12-06 03:48:07
I have a DataFrame such as the following:

df =
  vid pos value sente
  1   a   A     21
  2   b   B     21
  3   b   A     21
  3   a   A     21
  1   d   B     22
  1   a   C     22
  1   a   D     22
  2   b   A     22
  3   a   A     22

Now I want to combine all rows with the same value for sente and vid into one row, with the values for value joined by a " ":

df2 =
  vid pos    value  sente
  1   a      A      21
  2   b      B      21
  3   b a    A A    21
  1   d a a  B C D  22
  2   b      A      22
  3   a      A      22

I suppose a modification of this should do the trick:

df2 = df.groupby('sente').agg(lambda x: " ".join(x))

But I can't seem to figure out how to add the second column to the statement. Groupers can be passed as lists. Furthermore, you can …
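Passing both keys as a list, as the truncated answer starts to say, is enough; a runnable sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({'vid':   [1, 2, 3, 3, 1, 1, 1, 2, 3],
                   'pos':   list('abbadaaba'),
                   'value': list('ABAABCDAA'),
                   'sente': [21, 21, 21, 21, 22, 22, 22, 22, 22]})

# Group on both keys and join each string column with a space.
df2 = (df.groupby(['sente', 'vid'], as_index=False)
         .agg({'pos': ' '.join, 'value': ' '.join}))
```

The (sente=22, vid=1) group collapses its three rows into pos 'd a a' and value 'B C D', matching the desired df2.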

Pandas groupby each column and add new column for each group

Submitted by 拟墨画扇 on 2019-12-06 03:38:37
I have a data frame like this:

lvl1 = ['l1A', 'l1A', 'l1B', 'l1C', 'l1D']
lvl2 = ['l2A', 'l2A', 'l2A', 'l26', 'l27']
wgt = [.2, .3, .15, .05, .3]
lvls = [lvl1, lvl2]
df = pd.DataFrame(wgt, lvls).reset_index()
df.columns = ['lvl' + str(i) for i in range(1, 3)] + ['wgt']

df
  lvl1 lvl2   wgt
0  l1A  l2A  0.20
1  l1A  l2A  0.30
2  l1B  l2A  0.15
3  l1C  l26  0.05
4  l1D  l27  0.30

I want to get the average weight at each level and add them as separate columns to this data frame.

pd.concat([df,
           df.groupby('lvl1').transform('mean').add_suffix('_l1avg'),
           df.groupby('lvl2').transform('mean').add_suffix('_l2avg')],
          axis=1)

  lvl1 lvl2  wgt  wgt …
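A simpler variant of the same idea that avoids the concat and the nuisance-column issue (recent pandas versions object to transform('mean') over a frame that still contains string columns), selecting the numeric column explicitly:

```python
import pandas as pd

df = pd.DataFrame({'lvl1': ['l1A', 'l1A', 'l1B', 'l1C', 'l1D'],
                   'lvl2': ['l2A', 'l2A', 'l2A', 'l26', 'l27'],
                   'wgt':  [0.20, 0.30, 0.15, 0.05, 0.30]})

# transform('mean') broadcasts each group's mean back onto its rows,
# so the result aligns with df and can be assigned directly.
df['wgt_l1avg'] = df.groupby('lvl1')['wgt'].transform('mean')
df['wgt_l2avg'] = df.groupby('lvl2')['wgt'].transform('mean')
```

Group l1A averages (0.20 + 0.30) / 2 = 0.25, and group l2A averages (0.20 + 0.30 + 0.15) / 3 ≈ 0.2167.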

Groupby two columns ignoring order of pairs

Submitted by 徘徊边缘 on 2019-12-05 23:02:07
Suppose we have a dataframe that looks like this:

  start stop duration
0     A    B        1
1     B    A        2
2     C    D        2
3     D    C        0

What's the best way to construct a list of: i) start/stop pairs; ii) the count of each start/stop pair; iii) the average duration of each start/stop pair? In this case, order should not matter: (A,B) = (B,A). Desired output: [[start, stop, count, avg duration]]. In this example: [[A, B, 2, 1.5], [C, D, 2, 1]].

Sort the first two columns (you can do this in place, or create a copy and do the same thing; I've done the former), then groupby and agg:

df[['start', 'stop']] = np.sort(df[['start', 'stop']], axis=1)
(df.groupby([ …
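The answer is cut off mid-expression; a complete sketch of the same approach (the agg choice of count and mean follows the desired output):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'start': ['A', 'B', 'C', 'D'],
                   'stop':  ['B', 'A', 'D', 'C'],
                   'duration': [1, 2, 2, 0]})

# Sort each row's pair so (B, A) becomes (A, B), making order irrelevant.
df[['start', 'stop']] = np.sort(df[['start', 'stop']], axis=1)

# Then one groupby gives the count and average duration per pair.
out = (df.groupby(['start', 'stop'])['duration']
         .agg(['count', 'mean'])
         .reset_index())
```

The (A, B) pair appears twice with durations 1 and 2, hence count 2 and mean 1.5, matching the desired output.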

How to use pd.concat with an un initiated dataframe?

Submitted by 你。 on 2019-12-05 20:24:47
I want to be able to concat dataframe results to memory as they go through a function, and end up with a whole new dataframe containing just the results. How do I do this without having a dataframe already created before the function? For example:

import pandas as pd
import numpy as np

rand_df = pd.DataFrame({'A': ['x', 'x', 'y', 'y', 'z', 'z', 'z'],
                        'B': np.random.randn(7)})

def myFuncOnDF(df, row):
    df = df.groupby(['A']).get_group(row).describe()

myFuncOnDF(rand_df, 'x')
myFuncOnDF(rand_df, 'y')
myFuncOnDF(rand_df, 'z')

How would I concat the results of myFuncOnDF() to a new dataframe that doesn't …
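Note that myFuncOnDF as written discards its result (it assigns to a local name and returns nothing). The usual pattern is to return the result, accumulate the pieces in a plain list, and call pd.concat once at the end, so no empty DataFrame ever needs to exist up front. A sketch (describe_group is a renamed, fixed version of the question's function):

```python
import numpy as np
import pandas as pd

rand_df = pd.DataFrame({'A': ['x', 'x', 'y', 'y', 'z', 'z', 'z'],
                        'B': np.random.randn(7)})

def describe_group(df, key):
    # Return the result instead of binding it to a throwaway local name.
    return df.groupby('A').get_group(key).describe()

# Collect the pieces in a list, then build the frame once at the end.
pieces = [describe_group(rand_df, k) for k in ['x', 'y', 'z']]
result = pd.concat(pieces, keys=['x', 'y', 'z'])
```

The keys argument labels each piece, so result is indexed by (group, statistic); group 'x' has two rows in rand_df, hence a count of 2.0 in its describe block.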