pandas-groupby

AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

Submitted by 扶醉桌前 on 2019-12-06 11:21:22
I am very new to pandas and am trying to use groupby. I have a DataFrame with multiple columns. I want to group by a particular column and then sort each group by a different column, but I get the following error:

AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

Any help would be much appreciated! Thanks!

col1 | col2 | col3 | col4 | col5
=================================
A    | A1   | A2   | A3   | DATE1
A    | B1   | B2   | B3   | DATE2

I want to group by col1, then sort each group by col5, and then call reset_index to get all rows of the …
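A minimal sketch of one way to get sorted groups without calling reset_index on the groupby object itself (the column names come from the question; the sample values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B'],
    'col2': ['A1', 'B1', 'C1'],
    'col5': ['2019-01-02', '2019-01-01', '2019-01-03'],
})

# Sorting by the group key first and the sort column second leaves every
# group internally sorted; no groupby object is needed at all.
out = df.sort_values(['col1', 'col5']).reset_index(drop=True)
```

The error in the question comes from calling reset_index on the DataFrameGroupBy object rather than on a DataFrame; sorting by both columns sidesteps the groupby entirely.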

Percentage calculation in pivot table pandas with columns

Submitted by 余生颓废 on 2019-12-06 09:50:54
Question: I have a dataset containing sales records from different vendors, locations, dates, and products. The data set looks like this:

local  categoria fabricante tipo    consistencia peso         pacote   ordem vendas_kg
AREA I SABAO     ASATP      DILUIDO LIQUIDO      1501 A 2000g PLASTICO 1     10
AREA I SABAO     TEPOS      DILUIDO LIQUIDO      1501 A 2000g PLASTICO 1     20
AREA I SABAO     ASATP      CAPSULA LIQUIDO      1501 A 2000g PLASTICO 1     20
AREA I SABAO     TEPOS      CAPSULA LIQUIDO      1501 A 2000g PLASTICO 1     30
AREA I SABAO     ASATP      DILUIDO LIQUIDO      1501 A …
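The question body is truncated, but a common pattern for "percentage in a pivot table" is to build the pivot and then divide by the row totals. A sketch using a subset of the columns above (the choice of index/columns here is an assumption, since the desired output was cut off):

```python
import pandas as pd

df = pd.DataFrame({
    'local':      ['AREA I', 'AREA I', 'AREA I', 'AREA I'],
    'fabricante': ['ASATP', 'TEPOS', 'ASATP', 'TEPOS'],
    'vendas_kg':  [10, 20, 20, 30],
})

pt = df.pivot_table(index='local', columns='fabricante',
                    values='vendas_kg', aggfunc='sum')

# Divide each cell by its row total to get a percentage per row.
pct = pt.div(pt.sum(axis=1), axis=0) * 100
```

With these sample numbers, ASATP accounts for 30 of the 80 kg sold in AREA I, i.e. 37.5%.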

Pandas groupby and value_counts

Submitted by 六月ゝ 毕业季﹏ on 2019-12-06 06:35:46
Question: I want to count distinct values per column (with pd.value_counts, I guess), grouping the data by some level of a MultiIndex. The MultiIndex is taken care of with the groupby(level=...) parameter, but apply raises a ValueError. Original dataframe:

>>> df = pd.DataFrame(np.random.choice(list('ABC'), size=(10, 5)),
...                   columns=['c1', 'c2', 'c3', 'c4', 'c5'],
...                   index=pd.MultiIndex.from_product([['foo', 'bar'],
...                                                     ['w', 'y', 'x', 'y', 'z']]))

       c1 c2 c3 c4 c5
foo w  C  C  B  A  A
    y  A  A  C  B  A
    x  A  B  C  C  C
    y  A  B  C  C  C
    z  A  C  B  C  B
bar w  B  C  C …
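One way this is commonly done is to apply pd.Series.value_counts column-by-column inside each level-0 group; a sketch with a small fixed frame (two columns instead of five, values chosen by hand so the counts are deterministic):

```python
import pandas as pd

df = pd.DataFrame(
    {'c1': list('AABBAB'), 'c2': list('CCABBA')},
    index=pd.MultiIndex.from_product([['foo', 'bar'], ['w', 'x', 'y']]),
)

# For each level-0 group, count occurrences of each value per column.
# The result is indexed by (group, value), one column per original column;
# values absent from a group show up as NaN.
counts = df.groupby(level=0).apply(lambda g: g.apply(pd.Series.value_counts))
```

Within group 'foo', column c1 holds A, A, B, so the count of A is 2 and of B is 1.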

How to use pandas Grouper on multiple keys?

Submitted by 时光怂恿深爱的人放手 on 2019-12-06 06:13:20
I need to groupby-transform a dataframe by a datetime column AND another str (object) column, to apply a function per group and assign the result to each row of the group. I understand the groupby workflow but cannot make a pandas.Grouper cover both conditions at the same time. Thus: how do I use pandas.Grouper on multiple columns?

Use DataFrame.groupby with a list of pandas.Grouper objects as the by argument, like this:

df['result'] = df.groupby([
    pd.Grouper('dt', freq='D'),
    pd.Grouper('other_column'),
]).transform(foo)

Source: https://stackoverflow.com/questions/52187943/how-to-use-pandas
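A runnable sketch of the answer above, with made-up data and 'sum' standing in for the unspecified function foo (a plain column name also works alongside a Grouper in the by list):

```python
import pandas as pd

df = pd.DataFrame({
    'dt': pd.to_datetime(['2019-01-01 01:00', '2019-01-01 02:00',
                          '2019-01-02 01:00', '2019-01-02 02:00']),
    'other_column': ['a', 'a', 'a', 'b'],
    'value': [1, 2, 3, 4],
})

# One grouping key per condition: the datetime column bucketed daily,
# plus the plain string column.
df['result'] = df.groupby(
    [pd.Grouper(key='dt', freq='D'), 'other_column']
)['value'].transform('sum')
```

The first two rows fall into the same (day, 'a') group, so both get the group sum 3; the remaining rows are each alone in their group.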

pandas create boolean column using groupby transform

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-06 06:10:10
I am trying to create a boolean column using GroupBy.transform on a df like this:

id type
1  1.00000
1  1.00000
2  2.00000
2  3.00000
3  2.00000

The code looks like:

df['has_two'] = df.groupby('id')['type'].transform(lambda x: x == 2)

but instead of boolean values, has_two holds float values, e.g. 0.0. I am wondering why that is.

UPDATE: I created a test case:

df = pd.DataFrame({'id': ['1', '1', '2', '2', '3'],
                   'type': [1.0, 1.0, 2.0, 1.0, 2.0]})
df['has_2'] = df.groupby('id')['type'].transform(lambda x: x == 2)

This gave me:

  id type has_2
0  1  1.0   0.0
1  1  1.0   0.0
2  2  2.0   1.0
3  2  1.0   0.0
4  3  2.0   1.0

if I …
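In some pandas versions, SeriesGroupBy.transform casts the result back to the source column's dtype (float here), which is one plausible explanation for the 0.0/1.0 output. Two sketches that keep a bool dtype, depending on what "has_two" is meant to express:

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '1', '2', '2', '3'],
                   'type': [1.0, 1.0, 2.0, 1.0, 2.0]})

# The lambda in the question is elementwise, so no groupby is needed
# and the comparison keeps its bool dtype:
df['is_2'] = df['type'].eq(2)

# If the intent was "does this row's group contain a 2 anywhere",
# aggregate the boolean per group instead:
df['group_has_2'] = df['is_2'].groupby(df['id']).transform('any')
```

With the test-case data, rows 2 and 4 are themselves equal to 2, and every row of groups '2' and '3' sits in a group that contains a 2.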

Rolling grouped cumulative sum

Submitted by 。_饼干妹妹 on 2019-12-06 03:59:52
Question: I'm looking to create a rolling grouped cumulative sum. I can get the result via iteration, but wanted to see if there is a more intelligent way. Here's what the source data looks like:

Per C V
1 c 3
1 a 4
1 c 1
2 a 6
2 b 5
3 j 7
4 x 6
4 x 5
4 a 9
5 a 2
6 c 3
6 k 6

Here is the desired result:

Per C V
1 c 4
1 a 4
2 c 4
2 a 10
2 b 5
3 c 4
3 a 10
3 b 5
3 j 7
4 c 4
4 a 19
4 b 5
4 j 7
4 x 11
5 c 4
5 a 21
5 b 5
5 j 7
5 x 11
6 c 7
6 a 21
6 b 5
6 j 7
6 x 11
6 k 6

Answer 1: This is a very interesting …
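The answer is cut off, but the desired output (every category seen so far carries its running total into each later period) can be reproduced without iteration by pivoting, cumulative-summing down the periods, and masking categories that have not appeared yet. A sketch on the first six source rows:

```python
import pandas as pd

df = pd.DataFrame({'Per': [1, 1, 1, 2, 2, 3],
                   'C':   ['c', 'a', 'c', 'a', 'b', 'j'],
                   'V':   [3, 4, 1, 6, 5, 7]})

# Sum V per (Per, C), then cumulative-sum down the periods so each
# category keeps its running total in later periods.
wide = df.pivot_table(index='Per', columns='C', values='V', aggfunc='sum')

# Mask (Per, C) cells before a category's first appearance, then stack
# back to long form; stack drops the masked NaNs.
run = wide.fillna(0).cumsum().where(wide.notna().cummax())
out = run.stack().reset_index(name='V')
```

For these rows, 'c' totals 4 in period 1 and keeps that value through periods 2 and 3, while 'j' only appears from period 3 onward, matching the shape of the desired result.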

Aggregating string columns using pandas GroupBy

Submitted by 六眼飞鱼酱① on 2019-12-06 03:48:07
I have a DataFrame such as the following:

df =
  vid pos value sente
  1   a   A     21
  2   b   B     21
  3   b   A     21
  3   a   A     21
  1   d   B     22
  1   a   C     22
  1   a   D     22
  2   b   A     22
  3   a   A     22

Now I want to combine all rows with the same value for sente and vid into one row, with the values for value joined by a " ":

df2 =
  vid pos    value  sente
  1   a      A      21
  2   b      B      21
  3   b a    A A    21
  1   d a a  B C D  22
  2   b      A      22
  3   a      A      22

I suppose a modification of this should do the trick:

df2 = df.groupby('sente').agg(lambda x: " ".join(x))

But I can't seem to figure out how to add the second column to the statement. Groupers can be passed as lists. Furthermore, you can …
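Passing both keys as a list, as the truncated answer starts to say, is enough; a runnable sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({'vid':   [1, 2, 3, 3, 1, 1, 1, 2, 3],
                   'pos':   list('abbadaaba'),
                   'value': list('ABAABCDAA'),
                   'sente': [21, 21, 21, 21, 22, 22, 22, 22, 22]})

# Group on both keys and join each string column with a space.
df2 = (df.groupby(['sente', 'vid'], as_index=False)
         .agg({'pos': ' '.join, 'value': ' '.join}))
```

The (sente=22, vid=1) group collapses its three rows into pos 'd a a' and value 'B C D', matching the desired df2.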

Pandas groupby each column and add new column for each group

Submitted by 拟墨画扇 on 2019-12-06 03:38:37
I have a data frame like this:

lvl1 = ['l1A', 'l1A', 'l1B', 'l1C', 'l1D']
lvl2 = ['l2A', 'l2A', 'l2A', 'l26', 'l27']
wgt = [.2, .3, .15, .05, .3]
lvls = [lvl1, lvl2]
df = pd.DataFrame(wgt, lvls).reset_index()
df.columns = ['lvl' + str(i) for i in range(1, 3)] + ['wgt']

df
  lvl1 lvl2   wgt
0  l1A  l2A  0.20
1  l1A  l2A  0.30
2  l1B  l2A  0.15
3  l1C  l26  0.05
4  l1D  l27  0.30

I want to get the average weight at each level and add them as separate columns to this data frame.

pd.concat([df,
           df.groupby('lvl1').transform('mean').add_suffix('_l1avg'),
           df.groupby('lvl2').transform('mean').add_suffix('_l2avg')],
          axis=1)

  lvl1 lvl2  wgt  wgt …
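A simpler variant of the same idea that avoids the concat and the nuisance-column issue (recent pandas versions object to transform('mean') over a frame that still contains string columns), selecting the numeric column explicitly:

```python
import pandas as pd

df = pd.DataFrame({'lvl1': ['l1A', 'l1A', 'l1B', 'l1C', 'l1D'],
                   'lvl2': ['l2A', 'l2A', 'l2A', 'l26', 'l27'],
                   'wgt':  [0.20, 0.30, 0.15, 0.05, 0.30]})

# transform('mean') broadcasts each group's mean back onto its rows,
# so the result aligns with df and can be assigned directly.
df['wgt_l1avg'] = df.groupby('lvl1')['wgt'].transform('mean')
df['wgt_l2avg'] = df.groupby('lvl2')['wgt'].transform('mean')
```

Group l1A averages (0.20 + 0.30) / 2 = 0.25, and group l2A averages (0.20 + 0.30 + 0.15) / 3 ≈ 0.2167.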

Groupby two columns ignoring order of pairs

Submitted by 徘徊边缘 on 2019-12-05 23:02:07
Suppose we have a dataframe that looks like this:

  start stop duration
0     A    B        1
1     B    A        2
2     C    D        2
3     D    C        0

What's the best way to construct a list of: i) start/stop pairs; ii) the count of each start/stop pair; iii) the average duration of each start/stop pair? In this case, order should not matter: (A,B) = (B,A). Desired output: [[start, stop, count, avg duration]]. In this example: [[A, B, 2, 1.5], [C, D, 2, 1]].

Sort the first two columns (you can do this in place, or create a copy and do the same thing; I've done the former), then groupby and agg:

df[['start', 'stop']] = np.sort(df[['start', 'stop']], axis=1)
(df.groupby([ …
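The answer is cut off mid-expression; a complete sketch of the same approach (the agg choice of count and mean follows the desired output):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'start': ['A', 'B', 'C', 'D'],
                   'stop':  ['B', 'A', 'D', 'C'],
                   'duration': [1, 2, 2, 0]})

# Sort each row's pair so (B, A) becomes (A, B), making order irrelevant.
df[['start', 'stop']] = np.sort(df[['start', 'stop']], axis=1)

# Then one groupby gives the count and average duration per pair.
out = (df.groupby(['start', 'stop'])['duration']
         .agg(['count', 'mean'])
         .reset_index())
```

The (A, B) pair appears twice with durations 1 and 2, hence count 2 and mean 1.5, matching the desired output.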

How to use pd.concat with an un initiated dataframe?

Submitted by 你。 on 2019-12-05 20:24:47
I want to be able to concat dataframe results to memory as they go through a function, and end up with a whole new dataframe containing just the results. How do I do this without having a dataframe already created before the function? For example:

import pandas as pd
import numpy as np

rand_df = pd.DataFrame({'A': ['x', 'x', 'y', 'y', 'z', 'z', 'z'],
                        'B': np.random.randn(7)})

def myFuncOnDF(df, row):
    df = df.groupby(['A']).get_group(row).describe()

myFuncOnDF(rand_df, 'x')
myFuncOnDF(rand_df, 'y')
myFuncOnDF(rand_df, 'z')

How would I concat the results of myFuncOnDF() to a new dataframe that doesn't …
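Note that myFuncOnDF as written discards its result (it assigns to a local name and returns nothing). The usual pattern is to return the result, accumulate the pieces in a plain list, and call pd.concat once at the end, so no empty DataFrame ever needs to exist up front. A sketch (describe_group is a renamed, fixed version of the question's function):

```python
import numpy as np
import pandas as pd

rand_df = pd.DataFrame({'A': ['x', 'x', 'y', 'y', 'z', 'z', 'z'],
                        'B': np.random.randn(7)})

def describe_group(df, key):
    # Return the result instead of binding it to a throwaway local name.
    return df.groupby('A').get_group(key).describe()

# Collect the pieces in a list, then build the frame once at the end.
pieces = [describe_group(rand_df, k) for k in ['x', 'y', 'z']]
result = pd.concat(pieces, keys=['x', 'y', 'z'])
```

The keys argument labels each piece, so result is indexed by (group, statistic); group 'x' has two rows in rand_df, hence a count of 2.0 in its describe block.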