pandas-groupby

Groupby sum and count on multiple columns in Python

ぃ、小莉子 submitted on 2019-12-04 15:59:18

I have a pandas dataframe that looks like this:

    ID   country  month   revenue  profit  ebit
    234  USA      201409  10       5       3
    344  USA      201409  9        7       2
    532  UK       201410  20       10      5
    129  Canada   201411  15       10      5

I want to group by country and month, count the IDs per country and month, and sum the revenue, profit and ebit. The output for the above data would be:

    country  month   revenue  profit  ebit  count
    USA      201409  19       12      5     2
    UK       201410  20       10      5     1
    Canada   201411  15       10      5     1

I have tried different variations of the groupby, sum and count functions of pandas, but I am unable to figure out how to apply groupby, sum and count all together to give this result.
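
A minimal sketch of one way to get that shape, assuming pandas 0.25+ for named aggregation (the frame name df is taken from the question):

    import pandas as pd

    df = pd.DataFrame({
        'ID':      [234, 344, 532, 129],
        'country': ['USA', 'USA', 'UK', 'Canada'],
        'month':   [201409, 201409, 201410, 201411],
        'revenue': [10, 9, 20, 15],
        'profit':  [5, 7, 10, 10],
        'ebit':    [3, 2, 5, 5],
    })

    # One groupby with mixed aggregations: sum the money columns, count the IDs.
    out = (df.groupby(['country', 'month'])
             .agg(revenue=('revenue', 'sum'),
                  profit=('profit', 'sum'),
                  ebit=('ebit', 'sum'),
                  count=('ID', 'count'))
             .reset_index())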

Pandas groupby and value_counts

我只是一个虾纸丫 submitted on 2019-12-04 15:36:08

I want to count distinct values per column (with pd.value_counts, I guess), grouping the data by some level of a MultiIndex. The MultiIndex is taken care of with the groupby(level=...) parameter, but apply raises a ValueError. Original dataframe:

    >>> df = pd.DataFrame(np.random.choice(list('ABC'), size=(10,5)),
    ...                   columns=['c1','c2','c3','c4','c5'],
    ...                   index=pd.MultiIndex.from_product([['foo', 'bar'],
    ...                                                     ['w','y','x','y','z']]))
           c1 c2 c3 c4 c5
    foo w   C  C  B  A  A
        y   A  A  C  B  A
        x   A  B  C  C  C
        y   A  B  C  C  C
        z   A  C  B  C  B
    bar w   B  C  C  A  C
        y   A  A  C  A  A
        x   A  B  B  B  A
        y   A  A  C  A  B
        z   A  B  B  C  B

What I want:

           c1 c2 c3 c4 c5
    foo A   4  2  0  3  2
        B   1  2 …
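
A sketch that sidesteps the ValueError by applying value_counts column-wise inside each group (assuming df as above, with numpy imported as np):

    # Apply value_counts to every column within each first-level group,
    # then fill the holes where a value never occurs in that group.
    counts = (df.groupby(level=0)
                .apply(lambda g: g.apply(pd.Series.value_counts))
                .fillna(0)
                .astype(int))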

Percentage calculation in pivot table pandas with columns

被刻印的时光 ゝ submitted on 2019-12-04 15:34:22

I have a dataset containing sales records from different vendors, locations, dates and products. The dataset is like this:

    local   categoria  fabricante  tipo     consistencia  peso          pacote    ordem  vendas_kg
    AREA I  SABAO      ASATP       DILUIDO  LIQUIDO       1501 A 2000g  PLASTICO  1      10
    AREA I  SABAO      TEPOS       DILUIDO  LIQUIDO       1501 A 2000g  PLASTICO  1      20
    AREA I  SABAO      ASATP       CAPSULA  LIQUIDO       1501 A 2000g  PLASTICO  1      20
    AREA I  SABAO      TEPOS       CAPSULA  LIQUIDO       1501 A 2000g  PLASTICO  1      30
    AREA I  SABAO      ASATP       DILUIDO  LIQUIDO       1501 A 2000g  PLASTICO  2      20
    AREA I  SABAO      TEPOS       DILUIDO  LIQUIDO       1501 A 2000g  PLASTICO  2      30
    AREA I  SABAO      ASATP       …
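
The question is cut off before it shows the desired output, but a common pattern for percentages inside a pivot table is to divide each cell by a group total computed with transform. A sketch under that assumption (frame and column names taken from the question; the choice of grouping keys is a guess):

    # Total sales per (local, fabricante), then each manufacturer's
    # share of its area's total, as a percentage.
    pivot = df.pivot_table(index=['local', 'fabricante'],
                           values='vendas_kg', aggfunc='sum')
    pivot['pct'] = (pivot['vendas_kg']
                    / pivot.groupby(level='local')['vendas_kg'].transform('sum')
                    * 100)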

Rolling grouped cumulative sum

随声附和 submitted on 2019-12-04 09:14:58

I'm looking to create a rolling grouped cumulative sum. I can get the result via iteration, but wanted to see if there is a more intelligent way. Here's what the source data looks like:

    Per  C  V
    1    c  3
    1    a  4
    1    c  1
    2    a  6
    2    b  5
    3    j  7
    4    x  6
    4    x  5
    4    a  9
    5    a  2
    6    c  3
    6    k  6

Here is the desired result:

    Per  C  V
    1    c  4
    1    a  4
    2    c  4
    2    a  10
    2    b  5
    3    c  4
    3    a  10
    3    b  5
    3    j  7
    4    c  4
    4    a  19
    4    b  5
    4    j  7
    4    x  11
    5    c  4
    5    a  21
    5    b  5
    5    j  7
    5    x  11
    6    c  7
    6    a  21
    6    b  5
    6    j  7
    6    x  11
    6    k  6

This is a very interesting problem. Try the snippet below to see if it works for you:

    (pd.concat([df.loc[df.Per<=i][['C','V']].assign(Per=i) for i in …
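
The answer snippet is truncated; a plausible completion of the same idea (the loop bound over df.Per.unique() is an assumption) replays the frame once per period, then group-sums:

    # For every period i, take all rows up to i, restamp them with Per=i,
    # then sum V per (Per, C): a cumulative sum that rolls the groups forward.
    result = (pd.concat([df.loc[df.Per <= i, ['C', 'V']].assign(Per=i)
                         for i in df.Per.unique()])
                .groupby(['Per', 'C'], as_index=False)['V'].sum())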

I applied sum() on a groupby and I want to sort the values of the last column

こ雲淡風輕ζ submitted on 2019-12-04 04:02:34

Question: Given the following DataFrame:

    user_ID  product_id  amount
    1        456         1
    1        87          1
    1        788         3
    1        456         5
    1        87          2
    ...      ...         ...

The first column is the ID of the customer, the second is the ID of the product he bought, and 'amount' expresses the quantity of the product purchased on that given day (the date is also taken into consideration). A customer can buy as many products as he wants each day. I want to calculate the total number of times each product is bought by each customer, so I applied a groupby df …
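
The body is cut off, but given the title (sum on a groupby, then sort the resulting column) a sketch might look like this; the sort direction is an assumption:

    # Total amount per (customer, product), sorted by the summed column.
    totals = (df.groupby(['user_ID', 'product_id'], as_index=False)['amount']
                .sum()
                .sort_values('amount', ascending=False))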

pandas groupby and rolling_apply ignoring NaNs

♀尐吖头ヾ submitted on 2019-12-04 03:41:05

I have a pandas dataframe and I want to calculate the rolling mean of a column (after a groupby clause). However, I want to exclude NaNs. For instance, if the window contains [2, NaN, 1], the result should be 1.5, while currently it returns NaN. I've tried the following, but it doesn't seem to work:

    df.groupby(by=['var1'])['value'].apply(pd.rolling_apply, 3,
        lambda x: np.mean([i for i in x if i is not np.nan and i != 'NaN']))

Even if I try this:

    df.groupby(by=['var1'])['value'].apply(pd.rolling_apply, 3, lambda x: 1)

I'm getting NaN in the output, so it must be something to do with how pandas works …
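
A sketch of a modern fix: pd.rolling_apply belongs to old pandas and was later removed, while Series.rolling(...).mean() already skips NaNs inside the window whenever min_periods is satisfied by the valid observations:

    # Rolling mean over a window of 3 that ignores NaNs inside the window.
    # Note: min_periods=1 also yields partial-window means at the start of
    # each group; raise it if that is not wanted.
    rolled = (df.groupby('var1')['value']
                .rolling(3, min_periods=1).mean()
                .reset_index(level=0, drop=True))  # drop the group-key level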

df.groupby(…).agg(set) produces different result compared to df.groupby(…).agg(lambda x: set(x))

扶醉桌前 submitted on 2019-12-03 15:30:17

Question: While answering this question, it turned out that df.groupby(...).agg(set) and df.groupby(...).agg(lambda x: set(x)) produce different results. Data:

    df = pd.DataFrame({
        'user_id': [1, 2, 3, 4, 1, 2, 3],
        'class_type': ['Krav Maga', 'Yoga', 'Ju-jitsu', 'Krav Maga',
                       'Ju-jitsu', 'Krav Maga', 'Karate'],
        'instructor': ['Bob', 'Alice', 'Bob', 'Alice', 'Alice', 'Alice', 'Bob']})

Demo:

    In [36]: df.groupby('user_id').agg(lambda x: set(x))
    Out[36]:
                        class_type instructor
    user_id
    1        {Krav Maga, Ju-jitsu} …
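
A minimal repro sketch. The safe takeaway is that the lambda form is applied unambiguously once per group column, while the bare set builtin may be dispatched through a different internal agg path depending on the pandas version, so compare the two outputs on your version:

    import pandas as pd

    df = pd.DataFrame({
        'user_id': [1, 2, 3, 4, 1, 2, 3],
        'class_type': ['Krav Maga', 'Yoga', 'Ju-jitsu', 'Krav Maga',
                       'Ju-jitsu', 'Krav Maga', 'Karate'],
        'instructor': ['Bob', 'Alice', 'Bob', 'Alice', 'Alice', 'Alice', 'Bob']})

    # Unambiguous: the callable is applied to each group column as a Series.
    print(df.groupby('user_id').agg(lambda x: set(x)))
    # May take a different dispatch path in some pandas versions.
    print(df.groupby('user_id').agg(set))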

Faster alternative to perform pandas groupby operation

余生颓废 submitted on 2019-12-03 13:34:33

Question: I have a dataset with name (person_name), day and color (shirt_color) as columns. Each person wears a shirt of a certain color on a particular day; the number of days can be arbitrary. E.g. input:

    name   day  color
    -----------------
    John   1    White
    John   2    White
    John   3    Blue
    John   4    Blue
    John   5    White
    Tom    2    White
    Tom    3    Blue
    Tom    4    Blue
    Tom    5    Black
    Jerry  1    Black
    Jerry  2    Black
    Jerry  4    Black
    Jerry  5    White

I need to find the color most frequently worn by each person. E.g. result:

    name   color
    -------------
    …
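
Since the question asks for something faster, a vectorized sketch that avoids running a Python-level mode per group (how idxmax breaks ties is an assumption):

    # Count (name, color) pairs once, spread the colors into columns,
    # then pick the column with the highest count for each person.
    result = (df.groupby(['name', 'color']).size()
                .unstack(fill_value=0)
                .idxmax(axis=1))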

Groupby, transpose and append in Pandas?

我只是一个虾纸丫 submitted on 2019-12-03 12:14:12

I have a dataframe in which each user has 10 records. Now, I want to create a dataframe that looks like this:

    userid  name1  name2  ...  name10

which means I need to transpose every 10 records of the name column and append them to a new dataframe. So, how do I do it? Is there any way I can do it in pandas?

groupby('userid'), then reset_index within each group to enumerate consistently across groups, then unstack to get columns:

    df.groupby('userid')['name'].apply(lambda df: df.reset_index(drop=True)).unstack()

Demonstration:

    df = pd.DataFrame([
        [123, 'abc'],
        [123, 'abc'],
        [456, 'def'],
        [123, …
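
A runnable version of that demonstration, with hypothetical filler rows where the original is cut off (two users with two names each), plus a rename to the name1..nameN shape asked for:

    import pandas as pd

    df = pd.DataFrame([
        [123, 'abc'], [123, 'def'],
        [456, 'ghi'], [456, 'jkl'],
    ], columns=['userid', 'name'])

    # Re-number each user's rows 0..n-1, then pivot those positions to columns.
    wide = (df.groupby('userid')['name']
              .apply(lambda s: s.reset_index(drop=True))
              .unstack())
    wide.columns = ['name%d' % (i + 1) for i in wide.columns]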

Pandas dataframe to dict of dict

泪湿孤枕 submitted on 2019-12-03 11:03:57

Given the following pandas data frame:

      ColA ColB  ColC
    0   a1    t     1
    1   a2    t     2
    2   a3    d     3
    3   a4    d     4

I want to get a dictionary of dictionaries, but I have only managed to create the following:

    d = {'t': [1, 2], 'd': [3, 4]}

by:

    d = {k: list(v) for k, v in duplicated.groupby("ColB")["ColC"]}

How could I obtain the dict of dicts:

    dd = {'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}

You can do this with a groupby + apply step beforehand:

    dd = df.set_index('ColA').groupby('ColB').apply(
        lambda x: x.ColC.to_dict()
    ).to_dict()

Or, with a dict comprehension:

    dd = {k: g.ColC.to_dict() for k, g in df.set_index('ColA').groupby('ColB')}
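
A self-contained sketch of the first approach, with the frame rebuilt from the table above:

    import pandas as pd

    df = pd.DataFrame({'ColA': ['a1', 'a2', 'a3', 'a4'],
                       'ColB': ['t', 't', 'd', 'd'],
                       'ColC': [1, 2, 3, 4]})

    # Index by ColA so each group's ColC Series maps ColA -> ColC directly.
    dd = (df.set_index('ColA')
            .groupby('ColB')
            .apply(lambda g: g['ColC'].to_dict())
            .to_dict())
    # {'d': {'a3': 3, 'a4': 4}, 't': {'a1': 1, 'a2': 2}}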