pandas-groupby

Group index by minute and compute average

流过昼夜 submitted on 2019-12-11 05:56:41
Question: I have a pandas DataFrame called 'df' and I want to remove the seconds so that the index is in YYYY-MM-DD HH:MM format, with the rows for each minute then grouped and their average displayed. So I want to turn this DataFrame:

                     value
2015-05-03 00:00:00  61.0
2015-05-03 00:00:10  60.0
2015-05-03 00:00:25  60.0
2015-05-03 00:00:30  61.0
2015-05-03 00:00:45  61.0
2015-05-03 00:01:00  61.0
2015-05-03 00:01:10  60.0
2015-05-03 00:01:25  60.0
2015-05-03 00:01:30  61.0
2015-05-03 00:01:45  61
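The archived question is cut off, but the minute-level average it describes can be sketched by flooring each timestamp to the minute and grouping on the result; `df` below is a reconstruction of the sample data shown above:

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [61.0, 60.0, 60.0, 61.0, 61.0, 61.0, 60.0, 60.0, 61.0, 61.0]},
    index=pd.to_datetime([
        "2015-05-03 00:00:00", "2015-05-03 00:00:10", "2015-05-03 00:00:25",
        "2015-05-03 00:00:30", "2015-05-03 00:00:45", "2015-05-03 00:01:00",
        "2015-05-03 00:01:10", "2015-05-03 00:01:25", "2015-05-03 00:01:30",
        "2015-05-03 00:01:45",
    ]),
)

# Drop the seconds by flooring every timestamp to its minute, then average
# all rows that fall into the same minute.
per_minute = df.groupby(df.index.floor("min")).mean()
```

With a DatetimeIndex, `df.resample("min").mean()` or `df.groupby(pd.Grouper(freq="min")).mean()` are equivalent alternatives.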

Df groupby set comparison

吃可爱长大的小学妹 submitted on 2019-12-11 05:35:00
Question: I have a list of words that I want to test for anagrams. I want to use pandas so I don't have to use computationally wasteful for loops. Given a .txt list of words, say:

"acb" "bca" "foo" "oof" "spaniel"

I want to put them in a df and then group them by lists of their anagrams; I can remove duplicate rows later. So far I have the code:

import pandas as pd
wordlist = pd.read_csv('data/example.txt', sep='\r', header=None, index_col=None, names=['word'])
wordlist = wordlist.drop_duplicates(keep=
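One way to finish this (a sketch, assuming the goal is to collect anagrams into the same group) is to derive a canonical key by sorting each word's letters, then group on that key. The inline word list below stands in for the truncated read_csv call:

```python
import pandas as pd

words = pd.DataFrame({"word": ["acb", "bca", "foo", "oof", "spaniel"]})

# Anagrams contain the same letters, so sorting each word's letters yields
# an identical key for every member of an anagram family.
words["key"] = words["word"].apply(lambda w: "".join(sorted(w)))
groups = words.groupby("key")["word"].apply(list)
```

This replaces the per-pair comparison a for loop would do with a single vectorised pass plus one groupby.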

How do I conditionally aggregate values in projection part of pandas query?

这一生的挚爱 submitted on 2019-12-11 05:23:42
Question: I currently have a csv file with this content:

ID  PRODUCT_ID  NAME        STOCK  SELL_COUNT  DELIVERED_BY
1   P1          PRODUCT_P1  12     15          UPS
2   P2          PRODUCT_P2  4      3           DHL
3   P3          PRODUCT_P3  120    22          DHL
4   P1          PRODUCT_P1  423    18          UPS
5   P2          PRODUCT_P2  0      5           GLS
6   P3          PRODUCT_P3  53     10          DHL
7   P4          PRODUCT_P4  22     0           UPS
8   P1          PRODUCT_P1  94     56          GLS
9   P1          PRODUCT_P1  9      24          GLS

When I execute this SQL query:

SELECT PRODUCT_ID,
       MIN(CASE WHEN DELIVERED_BY = 'UPS' THEN STOCK END) as STOCK,
       SUM(CASE WHEN ID > 6 THEN SELL_COUNT END) as TOTAL_SELL
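A pandas translation of the visible part of that query (a sketch; the frame below is reconstructed from the table above) masks each column with Series.where before aggregating, which mirrors how CASE WHEN ... END yields NULL for non-matching rows:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": range(1, 10),
    "PRODUCT_ID": ["P1", "P2", "P3", "P1", "P2", "P3", "P4", "P1", "P1"],
    "STOCK": [12, 4, 120, 423, 0, 53, 22, 94, 9],
    "SELL_COUNT": [15, 3, 22, 18, 5, 10, 0, 56, 24],
    "DELIVERED_BY": ["UPS", "DHL", "DHL", "UPS", "GLS", "DHL", "UPS", "GLS", "GLS"],
})

out = (
    df.assign(
        # CASE WHEN DELIVERED_BY = 'UPS' THEN STOCK END -> NaN elsewhere
        ups_stock=df["STOCK"].where(df["DELIVERED_BY"] == "UPS"),
        # CASE WHEN ID > 6 THEN SELL_COUNT END -> NaN elsewhere
        late_sales=df["SELL_COUNT"].where(df["ID"] > 6),
    )
    .groupby("PRODUCT_ID")
    .agg(STOCK=("ups_stock", "min"), TOTAL_SELL=("late_sales", "sum"))
)
```

One caveat: pandas' sum over an all-NaN group returns 0 where SQL's SUM would return NULL; use a lambda with `s.sum(min_count=1)` if NULL semantics matter.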

Pandas 0.25.0: groupby on categoricals

て烟熏妆下的殇ゞ submitted on 2019-12-11 05:22:06
Question: I am having some difficulty with Pandas 0.25.0, which was released last month. Consider this data frame:

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})

Say we want to group by the first two columns. The resulting data frame should contain 3 rows, since the combination b m doesn't exist.

df.groupby(['A', 'B']).agg({'C': 'sum'})

In Pandas 0.24.1 and earlier, this works fine:

     C
A B
a m  1
  o  4
b o
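The usual fix for this 0.25 behaviour change is `observed=True`, which restricts the result to category combinations that actually occur instead of emitting a row for every level of the categorical (a sketch):

```python
import pandas as pd

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})

# observed=True keeps only observed category combinations, so the
# unobserved pair (b, m) produces no row.
result = df.groupby(['A', 'B'], observed=True).agg({'C': 'sum'})
```

This gives the same 3-row result the question expects from 0.24.1 and earlier.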

Why is group_by -> filter -> summarise faster in R than pandas?

五迷三道 submitted on 2019-12-11 05:13:21
Question: I am converting some of our older code from R to Python. In the process, I have found pandas to be a bit slower than R. I am interested in knowing whether there is anything I am doing wrong.

R code (taking around 2 ms on my system):

df = data.frame(col_a = sample(letters[1:3],20,T),
                col_b = sample(1:2,20,T),
                col_c = sample(letters[1:2],20,T),
                col_d = sample(c(4,2),20,T))
microbenchmark::microbenchmark(
  a = df %>% group_by(col_a, col_b) %>% summarise(
    a = sum(col_c == 'a'),
    b = sum(col_c == 'b'),
    c = a/b
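For reference, a pandas equivalent of the visible part of that summarise can be sketched as follows (the random data is regenerated in Python rather than imported from R, and only the `c = a/b` ratio shown above is reproduced, since the R expression is truncated):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "col_a": rng.choice(list("abc"), 20),
    "col_b": rng.choice([1, 2], 20),
    "col_c": rng.choice(list("ab"), 20),
    "col_d": rng.choice([4, 2], 20),
})

# Count the 'a's and 'b's per (col_a, col_b) group, mirroring the dplyr
# summarise, then take their ratio.
out = (
    df.assign(is_a=df["col_c"].eq("a"), is_b=df["col_c"].eq("b"))
      .groupby(["col_a", "col_b"], as_index=False)
      .agg(a=("is_a", "sum"), b=("is_b", "sum"))
)
out["c"] = out["a"] / out["b"]
```

On a 20-row frame, fixed per-call groupby overhead tends to dominate the timing in either engine, so micro-benchmarks this small say little about performance on realistic data sizes.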

Grouper and axis must be same length in Python

社会主义新天地 submitted on 2019-12-11 05:09:16
Question: I am a beginner in Python, and I am studying a textbook to learn the Pandas module. I have a DataFrame called Berri_bike, created by the following code:

bike_df = pd.read_csv(os.path.join(path, 'comptagevelo2012.csv'), parse_dates=['Date'],
                      encoding='latin1', dayfirst=True, index_col='Date')
Berri_bike = bike_df['Berri1'].copy()  # get only the column 'Berri1'
Berri_bike['Weekday'] = Berri_bike.index.weekday
weekday_counts = Berri_bike.groupby('Weekday').aggregate(sum)
weekday_counts

I have 3 columns in
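The error typically arises because `bike_df['Berri1']` returns a Series, so the `'Weekday'` assignment appends labelled values to the Series instead of adding a column, and the later groupby then sees a grouper of the wrong length. Selecting with double brackets yields a one-column DataFrame. A sketch, with synthetic data standing in for the CSV:

```python
import pandas as pd

# Stand-in for comptagevelo2012.csv: one week of counts indexed by date.
bike_df = pd.DataFrame(
    {"Berri1": [35, 83, 135, 144, 197, 146, 98]},
    index=pd.date_range("2012-01-01", periods=7, freq="D", name="Date"),
)

# Double brackets select a one-column DataFrame, not a Series, so the
# 'Weekday' assignment below adds a real column.
Berri_bike = bike_df[["Berri1"]].copy()
Berri_bike["Weekday"] = Berri_bike.index.weekday
weekday_counts = Berri_bike.groupby("Weekday").sum()
```

`bike_df['Berri1'].to_frame()` would work equally well as the conversion step.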

Pandas Transform Position/Rank in Group

拜拜、爱过 submitted on 2019-12-11 04:27:23
Question: I have the following DataFrame with two groups of animals and how much food they eat each day:

df = pd.DataFrame({'animals': ['cat', 'cat', 'dog', 'dog', 'rat', 'cat', 'rat', 'rat', 'dog', 'cat'],
                   'food': [1, 2, 2, 5, 3, 1, 4, 0, 6, 5]},
                  index=pd.MultiIndex.from_product([['group1'] + ['group2'], list(range(5))])
                  ).rename_axis(['groups', 'day'])
df

            animals  food
groups day
group1 0    cat      1
       1    cat      2
       2    dog      2
       3    dog      5
       4    rat      3
group2 0    cat      1
       1    rat      4
       2    rat      0
       3    dog      6
       4    cat      5

I can "map"/transform this
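The question is cut off, but one likely reading, computing each row's rank within its own group, can be sketched with groupby(...).rank(); the `method='min'` tie rule here is an assumption:

```python
import pandas as pd

df = pd.DataFrame(
    {"animals": ["cat", "cat", "dog", "dog", "rat",
                 "cat", "rat", "rat", "dog", "cat"],
     "food": [1, 2, 2, 5, 3, 1, 4, 0, 6, 5]},
    index=pd.MultiIndex.from_product([["group1", "group2"], range(5)],
                                     names=["groups", "day"]),
)

# Rank each day's food within its own group; tied values share the
# lowest rank of the tie (method='min').
df["rank"] = df.groupby(level="groups")["food"].rank(method="min")
```

groupby-rank is a transform, so the result aligns row-for-row with the original frame rather than collapsing it.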

assign unique ID to each unique value in group after pandas groupby

别说谁变了你拦得住时间么 submitted on 2019-12-11 04:27:01
Question: I have seen the solution in R but not in Python. If the question is a duplicate, please point me to the previously asked question/solution. I have a dataframe as follows:

df = pd.DataFrame({'col1': ['a','b','c','c','d','e','a','h','i','a'],
                   'col2': ['3:00','3:00','4:00','4:00','3:00','5:00','5:00','3:00','3:00','2:00']})
df
Out[83]:
  col1 col2
0    a 3:00
1    b 3:00
2    c 4:00
3    c 4:00
4    d 3:00
5    e 5:00
6    a 5:00
7    h 3:00
8    i 3:00
9    a 2:00

What I'd like to do is groupby 'col1' and assign a unique ID to
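The question is truncated, but if the goal is one integer ID per distinct col1 value, groupby(...).ngroup() is the usual tool (a sketch; the 'ID' column name is my choice):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'c', 'd', 'e', 'a', 'h', 'i', 'a'],
    'col2': ['3:00', '3:00', '4:00', '4:00', '3:00',
             '5:00', '5:00', '3:00', '3:00', '2:00'],
})

# ngroup() numbers the groups 0, 1, 2, ... in sorted key order, so every
# row sharing the same col1 value receives the same ID.
df['ID'] = df.groupby('col1').ngroup()
```

`pd.factorize(df['col1'])[0]` achieves the same thing, numbering by order of first appearance instead of sorted key order.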

Pandas `agg` to list, “AttributeError / ValueError: Function does not reduce”

拟墨画扇 submitted on 2019-12-11 04:05:14
Question: Often when we perform groupby operations using pandas, we may wish to apply several functions across multiple series. groupby.agg seems the natural way to perform these groupings and calculations. However, there seems to be a discrepancy between how groupby.agg and groupby.apply are implemented, because I cannot group to a list using agg. Tuple and set work fine, which suggests to me that you can only aggregate to immutable types via agg. Via groupby.apply, I can aggregate one series to a list
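A minimal reproduction of the working side of that asymmetry (a sketch; the exact agg behaviour varies by pandas version, and newer releases accept agg(list) directly):

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 2, 3]})

# apply reliably collects each group's values into a list, including on
# the older pandas versions where agg(list) raised
# "Function does not reduce".
via_apply = df.groupby("key")["val"].apply(list)
```

The discrepancy the question describes stems from agg validating that each function reduces a group to a scalar-like value; a returned list was treated as a failed reduction, while apply places whatever each call returns into the result.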

Pandas: filling missing values iterating through a groupby object

为君一笑 submitted on 2019-12-11 02:42:02
Question: I have the following dataset:

d = {'player': ['1', '1', '1', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3'],
     'session': ['a', 'a', 'b', np.nan, 'b', 'c', 'c', 'c', 'c', 'd', 'd', 'e', 'e', np.nan, 'e', 'f', 'f', 'g', np.nan, 'g'],
     'date': ['2018-01-01 00:19:05', '2018-01-01 00:21:07', '2018-01-01 00:22:07', '2018-01-01 00:22:15', '2018-01-01 00:25:09', '2018-01-01 00:25:11', '2018-01-01 00:27:28', '2018-01-01 00:29:29', '2018-01-01 00:30:35', '2018-01-01 00
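The dataset above is truncated, and the intended fill rule is not fully visible, but the usual approach to per-player session gaps is a group-wise forward fill, which never leaks a session label from one player into the next (a sketch on a trimmed version of the data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "player": ["1", "1", "1", "2", "2", "2"],
    "session": ["a", np.nan, "b", "d", np.nan, "e"],
})

# Forward-fill within each player only: a NaN at the start of a player's
# rows stays NaN instead of inheriting the previous player's session.
df["session"] = df.groupby("player")["session"].ffill()
```

This avoids iterating over the groupby object by hand; the grouped ffill is a single vectorised transform.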