pandas-groupby

Sort Values in DataFrame using Categorical Key without groupby Split Apply Combine

Submitted by 流过昼夜 on 2020-01-06 06:03:31
Question: So... I have a DataFrame that looks like this, but much larger:

```
        DATE ITEM STORE  STOCK
0 2018-06-06    A  L001      4
1 2018-06-06    A  L002      0
2 2018-06-06    A  L003      4
3 2018-06-06    B  L001      1
4 2018-06-06    B  L002      2
```

You can reproduce the same DataFrame with the following code:

```python
import pandas as pd
import numpy as np
import itertools as it

lojas = ['L001', 'L002', 'L003']
itens = list("ABC")
dr = pd.date_range(start='2018-06-06', end='2018-06-12')
df = pd.DataFrame(data=list(it.product(dr, itens, lojas)), columns=
```
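Since the reproduction code above is cut off, here is a hedged, minimal sketch of one common way to get a custom sort order without a groupby split-apply-combine pass: encode the key column as an ordered categorical and let sort_values honour that order. The store ordering used below is purely hypothetical.

```python
import pandas as pd

# Minimal sketch: sort by a custom key order, no groupby involved.
df = pd.DataFrame({'STORE': ['L003', 'L001', 'L002', 'L001'],
                   'STOCK': [4, 4, 0, 1]})
store_order = ['L002', 'L003', 'L001']  # hypothetical custom order, not from the question
df['STORE'] = pd.Categorical(df['STORE'], categories=store_order, ordered=True)
print(df.sort_values('STORE'))
```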

Pandas groupby value_count filter by frequency

Submitted by 断了今生、忘了曾经 on 2020-01-04 05:49:14
Question: I would like to filter out the frequencies that are less than n; in my case n is 2.

```python
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})
df.groupby('A')['B'].value_counts()
```

```
A    B
bar  no     4
     yes    1
foo  yes    3
     no     2
Name: B, dtype: int64
```

Ideally I would like the results in a dataframe as shown below (the pair with a frequency of 1 is excluded):

```
   A    B  freq
 bar   no     4
 foo  yes     3
 foo   no     2
```

I have tried df
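The asker's attempt is truncated above; the following is a hedged sketch of one way to get the desired frame: filter the value_counts Series by the threshold, then reset the index into a freq column.

```python
import pandas as pd

# Sketch: keep only counts >= n and turn the result back into a DataFrame.
df = pd.DataFrame({'A': ['foo', 'bar'] * 5,
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})
n = 2
counts = df.groupby('A')['B'].value_counts()
result = counts[counts >= n].rename('freq').reset_index()
print(result)
#      A    B  freq
# 0  bar   no     4
# 1  foo  yes     3
# 2  foo   no     2
```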

Pandas DataFrame - Aggregate on column whose dtype=='category' leads to slow performance

Submitted by 风格不统一 on 2020-01-03 13:09:37
Question: I work with big dataframes with high memory usage, and I read that if I change the dtype of columns with repeated values I can save a large amount of memory. I tried it and it indeed dropped the memory usage by 25%, but then I ran into a performance slowdown which I could not understand. I do group-by aggregation on the 'category' dtype columns; before I changed the dtype it took about 1 second, and after the change it took about 1 minute. This code demonstrates the performance degradation by
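The benchmark code is truncated above, so as a hedged suggestion only: a frequently cited cause of this kind of slowdown is that groupby on categorical keys builds the full category cross-product by default, while observed=True limits the result to combinations that actually occur. A sketch with made-up data:

```python
import pandas as pd
import numpy as np

# Illustrative sketch (not the asker's benchmark): with two categorical keys,
# the default groupby materialises the full 1000 x 1000 category cross-product,
# whereas observed=True keeps only the combinations present in the data.
n = 100_000
cats = [f'k{i}' for i in range(1000)]
df = pd.DataFrame({
    'key1': pd.Series(np.random.choice(cats, size=n), dtype='category'),
    'key2': pd.Series(np.random.choice(cats, size=n), dtype='category'),
    'val': np.random.rand(n),
})
slow = df.groupby(['key1', 'key2']).agg({'val': 'sum'})                  # 1,000,000 result rows
fast = df.groupby(['key1', 'key2'], observed=True).agg({'val': 'sum'})   # only observed pairs
```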

Custom sort order function for groupby pandas python

Submitted by 大憨熊 on 2020-01-03 05:22:29
Question: Let's say I have a grouped dataframe like the one below (obtained through an initial df.groupby(df["A"]).apply(some_func), where some_func returns a dataframe itself). The second column is the second level of the multiindex which was created by the groupby.

```
     B  C
A
1 0  1  8
  1  3  3
  2  0  1
2 1  2  2
3 0  1  3
  1  2  4
```

And I would like to order the groups on the result of a custom function that I apply to them. Let's assume for this example that the function is

```python
def my_func(group):
    return sum(group["B"]
```
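Since my_func is cut off above, the sketch below assumes it returns the sum of column B per group. It is one possible approach, not necessarily the accepted answer: compute the per-group score, sort it, and reindex the frame by the resulting group order.

```python
import pandas as pd

# Sketch: order groups by a custom per-group score (here: the sum of B), largest first.
df = pd.DataFrame({'A': [1, 1, 1, 2, 3, 3],
                   'B': [1, 3, 0, 2, 1, 2],
                   'C': [8, 3, 1, 2, 3, 4]})

def my_func(group):
    return group['B'].sum()

order = df.groupby('A').apply(my_func).sort_values(ascending=False).index
result = df.set_index('A').loc[order].reset_index()
print(result)
```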

Pandas Groupby TimeGrouper and apply

Submitted by 我的未来我决定 on 2020-01-03 00:52:27
Question: As per this question. This groupby works when applied to my df for a pd.rolling_mean column as follows:

```python
data['maFast'] = data['Last'].groupby(pd.TimeGrouper('d')).apply(pd.rolling_mean, center=False, window=10)
```

How do I apply the same groupby logic to another element of my df which combines pd.rolling_std and pd.rolling_mean:

```python
data['maSlow_std'] = pd.rolling_mean(data['Last'], window=60) + 2 * pd.rolling_std(data['Last'], 20, min_periods=20)
```

Answer 1: I think you need a lambda function: data['maSlow
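pd.TimeGrouper and the pd.rolling_* helpers have since been removed from pandas, so the hedged sketch below restates the idea from Answer 1 (a lambda applied per daily group) using the current pd.Grouper / .rolling API and synthetic data.

```python
import pandas as pd
import numpy as np

# Sketch: per daily group, combine a rolling mean and a rolling std via a lambda.
idx = pd.date_range('2018-01-01', periods=3000, freq='min')
data = pd.DataFrame({'Last': np.random.rand(3000)}, index=idx)

data['maSlow_std'] = (
    data['Last']
    .groupby(pd.Grouper(freq='D'))
    .apply(lambda s: s.rolling(60).mean() + 2 * s.rolling(20, min_periods=20).std())
    .reset_index(level=0, drop=True)   # drop the group level so it aligns with the original index
)
print(data.tail())
```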

Groupby two columns ignoring order of pairs

Submitted by ☆樱花仙子☆ on 2020-01-02 08:22:48
Question: Suppose we have a dataframe that looks like this:

```
  start stop  duration
0     A    B         1
1     B    A         2
2     C    D         2
3     D    C         0
```

What's the best way to construct a list of: i) start/stop pairs; ii) count of start/stop pairs; iii) average duration of start/stop pairs? In this case, order should not matter: (A,B) = (B,A). Desired output: [[start, stop, count, avg duration]]; in this example: [[A, B, 2, 1.5], [C, D, 2, 1]]

Answer 1: sort the first two columns (you can do this in-place, or create a copy and do the same thing; I've done the
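A hedged sketch of the sort-the-pair idea from Answer 1: normalise each (start, stop) pair so that order does not matter, then aggregate the count and mean duration per pair.

```python
import pandas as pd
import numpy as np

# Sketch: sort each row's (start, stop) values, then groupby the normalised pair.
df = pd.DataFrame({'start': ['A', 'B', 'C', 'D'],
                   'stop':  ['B', 'A', 'D', 'C'],
                   'duration': [1, 2, 2, 0]})

df[['start', 'stop']] = np.sort(df[['start', 'stop']].values, axis=1)
out = (df.groupby(['start', 'stop'])['duration']
         .agg(['count', 'mean'])
         .reset_index())
print(out.values.tolist())
# [['A', 'B', 2, 1.5], ['C', 'D', 2, 1.0]]
```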

Insert rows as a result of a groupby operation into the original dataframe

Submitted by 孤者浪人 on 2020-01-02 07:51:53
Question: For example, I have a pandas dataframe as follows:

```
col_1 col_2  col_3  col_4
    a     X      5      1
    a     Y      3      2
    a     Z      6      4
    b     X      7      8
    b     Y      4      3
    b     Z      6      5
```

I want to, for each value in col_1, add the values in col_3 and col_4 (and many more columns) that correspond to X and Z in col_2, and create a new row with these values. So the output would be as below:

```
col_1 col_2  col_3  col_4
    a     X      5      1
    a     Y      3      2
    a     Z      6      4
    a   NEW     11      5
    b     X      7      8
    b     Y      4      3
    b     Z      6      5
    b   NEW     13     13
```

Also, there could be more values in col_1 that will need the same
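One possible approach (a sketch, not necessarily the accepted answer): sum the X and Z rows per col_1 group, label the result NEW, then concatenate it back and re-sort so each new row follows its group.

```python
import pandas as pd

# Sketch: build the NEW rows from the X and Z rows of each group, then append them.
df = pd.DataFrame({'col_1': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'col_2': ['X', 'Y', 'Z', 'X', 'Y', 'Z'],
                   'col_3': [5, 3, 6, 7, 4, 6],
                   'col_4': [1, 2, 4, 8, 3, 5]})

new_rows = (df[df['col_2'].isin(['X', 'Z'])]
            .groupby('col_1', as_index=False)[['col_3', 'col_4']]
            .sum()
            .assign(col_2='NEW'))

out = (pd.concat([df, new_rows], ignore_index=True)
         .sort_values('col_1', kind='stable')   # stable sort keeps NEW after its group
         .reset_index(drop=True))
print(out)
```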

Pandas enumerate groups in descending order

Submitted by 瘦欲@ on 2020-01-02 07:25:23
Question: I have the following column:

```
   column
0      10
1      10
2       8
3       8
4       6
5       6
```

My goal is to find the total number of unique values (3 in this case) and create a new column that would look like the following:

```
   new_column
0           3
1           3
2           2
3           2
4           1
5           1
```

The numbering starts from the number of unique values (3), and the same number is repeated if the current row is the same as the previous row, based on the original column. The number decreases as the row value changes. All unique values in the original column have the same number of rows (2 rows for each unique value in
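A hedged sketch of one way to produce new_column: number the groups in order of first appearance with ngroup, reversing the numbering so the first group gets the highest number.

```python
import pandas as pd

# Sketch: descending group numbers via ngroup(ascending=False).
df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})
df['new_column'] = df.groupby('column', sort=False).ngroup(ascending=False) + 1
print(df)
#    column  new_column
# 0      10           3
# 1      10           3
# 2       8           2
# 3       8           2
# 4       6           1
# 5       6           1
```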

Reshape pandas dataframe from rows to columns

Submitted by 亡梦爱人 on 2020-01-02 02:22:07
Question: I'm trying to reshape my data. At first glance, it sounds like a transpose, but it's not. I tried melts, stack/unstack, joins, etc.

Use case: I want to have only one row per unique individual, and put the entire job history in the columns. For clients, it can be easier to read information across rows rather than down columns.

Here's the data:

```python
import pandas as pd
import numpy as np

data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
         'Job': ["Analyst", "Manager", "Director", "Analyst",
```
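The sample data is truncated above, so the sketch below rebuilds a similar frame (Jane's second job is invented for illustration) and shows one common reshape: number each person's jobs with cumcount, then pivot so the history runs across the columns.

```python
import pandas as pd

# Sketch: one row per Name, job history spread across Job1, Job2, ... columns.
df = pd.DataFrame({'Name': ['Joe', 'Joe', 'Joe', 'Jane', 'Jane'],
                   'Job': ['Analyst', 'Manager', 'Director', 'Analyst', 'Manager']})

df['Job#'] = df.groupby('Name').cumcount() + 1           # running job number per person
wide = df.pivot(index='Name', columns='Job#', values='Job').add_prefix('Job')
print(wide.reset_index())
#    Name     Job1     Job2      Job3
# 0  Jane  Analyst  Manager       NaN
# 1   Joe  Analyst  Manager  Director
```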