pandas-groupby

Group index by minute and compute average

流过昼夜 submitted on 2019-12-11 05:56:41
Question: I have a pandas DataFrame called 'df' and I want to remove the seconds so that the index is in YYYY-MM-DD HH:MM format, with the rows for each minute then grouped and their average displayed. So I want to turn this DataFrame:

                     value
2015-05-03 00:00:00  61.0
2015-05-03 00:00:10  60.0
2015-05-03 00:00:25  60.0
2015-05-03 00:00:30  61.0
2015-05-03 00:00:45  61.0
2015-05-03 00:01:00  61.0
2015-05-03 00:01:10  60.0
2015-05-03 00:01:25  60.0
2015-05-03 00:01:30  61.0
2015-05-03 00:01:45  61
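The archived question is cut off, but the minute-level average it describes can be sketched by flooring each timestamp to the minute and grouping on the result; `df` below is a reconstruction of the sample data shown above:

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [61.0, 60.0, 60.0, 61.0, 61.0, 61.0, 60.0, 60.0, 61.0, 61.0]},
    index=pd.to_datetime([
        "2015-05-03 00:00:00", "2015-05-03 00:00:10", "2015-05-03 00:00:25",
        "2015-05-03 00:00:30", "2015-05-03 00:00:45", "2015-05-03 00:01:00",
        "2015-05-03 00:01:10", "2015-05-03 00:01:25", "2015-05-03 00:01:30",
        "2015-05-03 00:01:45",
    ]),
)

# Drop the seconds by flooring every timestamp to its minute, then average
# all rows that fall into the same minute.
per_minute = df.groupby(df.index.floor("min")).mean()
```

With a DatetimeIndex, `df.resample("min").mean()` or `df.groupby(pd.Grouper(freq="min")).mean()` are equivalent alternatives.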

Df groupby set comparison

吃可爱长大的小学妹 submitted on 2019-12-11 05:35:00
Question: I have a list of words that I want to test for anagrams. I want to use pandas so I don't have to use computationally wasteful for loops. Given a .txt list of words, say:

"acb" "bca" "foo" "oof" "spaniel"

I want to put them in a df and then group them by lists of their anagrams; I can remove duplicate rows later. So far I have the code:

import pandas as pd
wordlist = pd.read_csv('data/example.txt', sep='\r', header=None, index_col=None, names=['word'])
wordlist = wordlist.drop_duplicates(keep=
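One way to finish this (a sketch, assuming the goal is to collect anagrams into the same group) is to derive a canonical key by sorting each word's letters, then group on that key. The inline word list below stands in for the truncated read_csv call:

```python
import pandas as pd

words = pd.DataFrame({"word": ["acb", "bca", "foo", "oof", "spaniel"]})

# Anagrams contain the same letters, so sorting each word's letters yields
# an identical key for every member of an anagram family.
words["key"] = words["word"].apply(lambda w: "".join(sorted(w)))
groups = words.groupby("key")["word"].apply(list)
```

This replaces the per-pair comparison a for loop would do with a single vectorised pass plus one groupby.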

How do I conditionally aggregate values in projection part of pandas query?

这一生的挚爱 submitted on 2019-12-11 05:23:42
Question: I currently have a csv file with this content:

ID  PRODUCT_ID  NAME        STOCK  SELL_COUNT  DELIVERED_BY
1   P1          PRODUCT_P1  12     15          UPS
2   P2          PRODUCT_P2  4      3           DHL
3   P3          PRODUCT_P3  120    22          DHL
4   P1          PRODUCT_P1  423    18          UPS
5   P2          PRODUCT_P2  0      5           GLS
6   P3          PRODUCT_P3  53     10          DHL
7   P4          PRODUCT_P4  22     0           UPS
8   P1          PRODUCT_P1  94     56          GLS
9   P1          PRODUCT_P1  9      24          GLS

When I execute this SQL query:

SELECT PRODUCT_ID,
       MIN(CASE WHEN DELIVERED_BY = 'UPS' THEN STOCK END) as STOCK,
       SUM(CASE WHEN ID > 6 THEN SELL_COUNT END) as TOTAL_SELL
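A pandas translation of the visible part of that query (a sketch; the frame below is reconstructed from the table above) masks each column with Series.where before aggregating, which mirrors how CASE WHEN ... END yields NULL for non-matching rows:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": range(1, 10),
    "PRODUCT_ID": ["P1", "P2", "P3", "P1", "P2", "P3", "P4", "P1", "P1"],
    "STOCK": [12, 4, 120, 423, 0, 53, 22, 94, 9],
    "SELL_COUNT": [15, 3, 22, 18, 5, 10, 0, 56, 24],
    "DELIVERED_BY": ["UPS", "DHL", "DHL", "UPS", "GLS", "DHL", "UPS", "GLS", "GLS"],
})

out = (
    df.assign(
        # CASE WHEN DELIVERED_BY = 'UPS' THEN STOCK END -> NaN elsewhere
        ups_stock=df["STOCK"].where(df["DELIVERED_BY"] == "UPS"),
        # CASE WHEN ID > 6 THEN SELL_COUNT END -> NaN elsewhere
        late_sales=df["SELL_COUNT"].where(df["ID"] > 6),
    )
    .groupby("PRODUCT_ID")
    .agg(STOCK=("ups_stock", "min"), TOTAL_SELL=("late_sales", "sum"))
)
```

One caveat: pandas' sum over an all-NaN group returns 0 where SQL's SUM would return NULL; use a lambda with `s.sum(min_count=1)` if NULL semantics matter.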

Pandas 0.25.0: groupby on categoricals

て烟熏妆下的殇ゞ submitted on 2019-12-11 05:22:06
Question: I am having some difficulty with Pandas 0.25.0, which was released last month. Consider this data frame:

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})

Say we want to group by the first two columns. The resulting data frame should contain 3 rows, since the combination b m doesn't exist.

df.groupby(['A', 'B']).agg({'C': 'sum'})

In Pandas 0.24.1 and earlier, this works fine:

     C
A B
a m  1
  o  4
b o
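The usual fix for this 0.25 behaviour change is `observed=True`, which restricts the result to category combinations that actually occur instead of emitting a row for every level of the categorical (a sketch):

```python
import pandas as pd

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})

# observed=True keeps only observed category combinations, so the
# unobserved pair (b, m) produces no row.
result = df.groupby(['A', 'B'], observed=True).agg({'C': 'sum'})
```

This gives the same 3-row result the question expects from 0.24.1 and earlier.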

Why is group_by -> filter -> summarise faster in R than pandas?

五迷三道 submitted on 2019-12-11 05:13:21
Question: I am converting some of our older code from R to Python. In the process, I have found pandas to be a bit slower than R. I am interested in knowing whether there is anything I am doing wrong.

R code (taking around 2 ms on my system):

df = data.frame(col_a = sample(letters[1:3],20,T),
                col_b = sample(1:2,20,T),
                col_c = sample(letters[1:2],20,T),
                col_d = sample(c(4,2),20,T))
microbenchmark::microbenchmark(
  a = df %>% group_by(col_a, col_b) %>% summarise(
    a = sum(col_c == 'a'),
    b = sum(col_c == 'b'),
    c = a/b
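For reference, a pandas equivalent of the visible part of that summarise can be sketched as follows (the random data is regenerated in Python rather than imported from R, and only the `c = a/b` ratio shown above is reproduced, since the R expression is truncated):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "col_a": rng.choice(list("abc"), 20),
    "col_b": rng.choice([1, 2], 20),
    "col_c": rng.choice(list("ab"), 20),
    "col_d": rng.choice([4, 2], 20),
})

# Count the 'a's and 'b's per (col_a, col_b) group, mirroring the dplyr
# summarise, then take their ratio.
out = (
    df.assign(is_a=df["col_c"].eq("a"), is_b=df["col_c"].eq("b"))
      .groupby(["col_a", "col_b"], as_index=False)
      .agg(a=("is_a", "sum"), b=("is_b", "sum"))
)
out["c"] = out["a"] / out["b"]
```

On a 20-row frame, fixed per-call groupby overhead tends to dominate the timing in either engine, so micro-benchmarks this small say little about performance on realistic data sizes.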

Grouper and axis must be same length in Python

社会主义新天地 submitted on 2019-12-11 05:09:16
Question: I am a beginner in Python, and I am studying a textbook to learn the Pandas module. I have a DataFrame called Berri_bike, created by the following code:

bike_df = pd.read_csv(os.path.join(path, 'comptagevelo2012.csv'), parse_dates=['Date'],
                      encoding='latin1', dayfirst=True, index_col='Date')
Berri_bike = bike_df['Berri1'].copy()  # get only the column 'Berri1'
Berri_bike['Weekday'] = Berri_bike.index.weekday
weekday_counts = Berri_bike.groupby('Weekday').aggregate(sum)
weekday_counts

I have 3 columns in
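The error typically arises because `bike_df['Berri1']` returns a Series, so the `'Weekday'` assignment appends labelled values to the Series instead of adding a column, and the later groupby then sees a grouper of the wrong length. Selecting with double brackets yields a one-column DataFrame. A sketch, with synthetic data standing in for the CSV:

```python
import pandas as pd

# Stand-in for comptagevelo2012.csv: one week of counts indexed by date.
bike_df = pd.DataFrame(
    {"Berri1": [35, 83, 135, 144, 197, 146, 98]},
    index=pd.date_range("2012-01-01", periods=7, freq="D", name="Date"),
)

# Double brackets select a one-column DataFrame, not a Series, so the
# 'Weekday' assignment below adds a real column.
Berri_bike = bike_df[["Berri1"]].copy()
Berri_bike["Weekday"] = Berri_bike.index.weekday
weekday_counts = Berri_bike.groupby("Weekday").sum()
```

`bike_df['Berri1'].to_frame()` would work equally well as the conversion step.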

Pandas Transform Position/Rank in Group

拜拜、爱过 submitted on 2019-12-11 04:27:23
Question: I have the following DataFrame with two groups of animals and how much food they eat each day:

df = pd.DataFrame({'animals': ['cat', 'cat', 'dog', 'dog', 'rat', 'cat', 'rat', 'rat', 'dog', 'cat'],
                   'food': [1, 2, 2, 5, 3, 1, 4, 0, 6, 5]},
                  index=pd.MultiIndex.from_product([['group1'] + ['group2'], list(range(5))])
                  ).rename_axis(['groups', 'day'])
df

            animals  food
groups day
group1 0    cat      1
       1    cat      2
       2    dog      2
       3    dog      5
       4    rat      3
group2 0    cat      1
       1    rat      4
       2    rat      0
       3    dog      6
       4    cat      5

I can "map"/transform this
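The question is cut off, but one likely reading, computing each row's rank within its own group, can be sketched with groupby(...).rank(); the `method='min'` tie rule here is an assumption:

```python
import pandas as pd

df = pd.DataFrame(
    {"animals": ["cat", "cat", "dog", "dog", "rat",
                 "cat", "rat", "rat", "dog", "cat"],
     "food": [1, 2, 2, 5, 3, 1, 4, 0, 6, 5]},
    index=pd.MultiIndex.from_product([["group1", "group2"], range(5)],
                                     names=["groups", "day"]),
)

# Rank each day's food within its own group; tied values share the
# lowest rank of the tie (method='min').
df["rank"] = df.groupby(level="groups")["food"].rank(method="min")
```

groupby-rank is a transform, so the result aligns row-for-row with the original frame rather than collapsing it.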

assign unique ID to each unique value in group after pandas groupby

别说谁变了你拦得住时间么 submitted on 2019-12-11 04:27:01
Question: I have seen the solution in R but not in Python. If the question is a duplicate, please point me to the previously asked question/solution. I have a dataframe as follows:

df = pd.DataFrame({'col1': ['a','b','c','c','d','e','a','h','i','a'],
                   'col2': ['3:00','3:00','4:00','4:00','3:00','5:00','5:00','3:00','3:00','2:00']})
df
Out[83]:
  col1 col2
0    a 3:00
1    b 3:00
2    c 4:00
3    c 4:00
4    d 3:00
5    e 5:00
6    a 5:00
7    h 3:00
8    i 3:00
9    a 2:00

What I'd like to do is groupby 'col1' and assign a unique ID to
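The question is truncated, but if the goal is one integer ID per distinct col1 value, groupby(...).ngroup() is the usual tool (a sketch; the 'ID' column name is my choice):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'c', 'd', 'e', 'a', 'h', 'i', 'a'],
    'col2': ['3:00', '3:00', '4:00', '4:00', '3:00',
             '5:00', '5:00', '3:00', '3:00', '2:00'],
})

# ngroup() numbers the groups 0, 1, 2, ... in sorted key order, so every
# row sharing the same col1 value receives the same ID.
df['ID'] = df.groupby('col1').ngroup()
```

`pd.factorize(df['col1'])[0]` achieves the same thing, numbering by order of first appearance instead of sorted key order.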

Pandas `agg` to list, “AttributeError / ValueError: Function does not reduce”

拟墨画扇 submitted on 2019-12-11 04:05:14
Question: Often when we perform groupby operations using pandas, we may wish to apply several functions across multiple series. groupby.agg seems the natural way to perform these groupings and calculations. However, there seems to be a discrepancy between how groupby.agg and groupby.apply are implemented, because I cannot group to a list using agg. Tuple and set work fine, which suggests to me that you can only aggregate to immutable types via agg. Via groupby.apply, I can aggregate one series to a list
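A minimal reproduction of the working side of that asymmetry (a sketch; the exact agg behaviour varies by pandas version, and newer releases accept agg(list) directly):

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 2, 3]})

# apply reliably collects each group's values into a list, including on
# the older pandas versions where agg(list) raised
# "Function does not reduce".
via_apply = df.groupby("key")["val"].apply(list)
```

The discrepancy the question describes stems from agg validating that each function reduces a group to a scalar-like value; a returned list was treated as a failed reduction, while apply places whatever each call returns into the result.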

Pandas: filling missing values iterating through a groupby object

为君一笑 submitted on 2019-12-11 02:42:02
Question: I have the following dataset:

d = {'player': ['1', '1', '1', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3'],
     'session': ['a', 'a', 'b', np.nan, 'b', 'c', 'c', 'c', 'c', 'd', 'd', 'e', 'e', np.nan, 'e', 'f', 'f', 'g', np.nan, 'g'],
     'date': ['2018-01-01 00:19:05', '2018-01-01 00:21:07', '2018-01-01 00:22:07', '2018-01-01 00:22:15', '2018-01-01 00:25:09', '2018-01-01 00:25:11', '2018-01-01 00:27:28', '2018-01-01 00:29:29', '2018-01-01 00:30:35', '2018-01-01 00
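The dataset above is truncated, and the intended fill rule is not fully visible, but the usual approach to per-player session gaps is a group-wise forward fill, which never leaks a session label from one player into the next (a sketch on a trimmed version of the data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "player": ["1", "1", "1", "2", "2", "2"],
    "session": ["a", np.nan, "b", "d", np.nan, "e"],
})

# Forward-fill within each player only: a NaN at the start of a player's
# rows stays NaN instead of inheriting the previous player's session.
df["session"] = df.groupby("player")["session"].ffill()
```

This avoids iterating over the groupby object by hand; the grouped ffill is a single vectorised transform.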