pandas-groupby

Pandas: Get top 10 values AFTER grouping

流过昼夜 posted on 2019-12-23 02:36:53
Question: I have a pandas DataFrame with a column 'id' and a column 'value'. It is already sorted by id (ascending) and then by value (descending). What I need is the top 10 values per id. I assumed that something like the following would work, but it doesn't:

    df.groupby("id", as_index=False).aggregate(lambda (index, rows): rows.iloc[:10])

What I get is just a list of ids; the value column (and other columns that I omitted from the question) are no longer there. Any ideas how it might be done?
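The most direct tool here is GroupBy.head, which keeps the first n rows of each group and preserves every column; since the frame is already sorted by value within each id, those first rows are the top values. A minimal sketch with made-up data:

```python
import pandas as pd

# Sketch: the frame is assumed pre-sorted by id (asc) and value (desc),
# so GroupBy.head keeps the top rows of each group without re-sorting.
df = pd.DataFrame({
    "id": [1] * 12 + [2] * 3,
    "value": list(range(12, 0, -1)) + [30, 20, 10],
})
top10 = df.groupby("id").head(10)  # top 10 rows per id, all columns kept
```

Unlike aggregate, head returns ordinary rows of the original frame, so no columns are lost.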

Why is pandas.groupby.mean so much faster than a parallel implementation?

亡梦爱人 posted on 2019-12-22 18:45:13
Question: I was using the pandas groupby mean function like the following on a very large dataset:

    import pandas as pd
    df = pd.read_csv("large_dataset.csv")
    df.groupby(['variable']).mean()

It looks like the function is not using multi-processing, and therefore I implemented a parallel version:

    import pandas as pd
    from multiprocessing import Pool, cpu_count

    def meanFunc(tmp_name, df_input):
        df_res = df_input.mean().to_frame().transpose()
        return df_res

    def applyParallel(dfGrouped, func):
        num_process = int
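A short sketch (on synthetic data) of why the built-in path usually wins: groupby().mean() dispatches to a single optimized Cython routine, so a multiprocessing rewrite mostly pays pickling and process-startup overhead on top of the same arithmetic:

```python
import numpy as np
import pandas as pd

# Sketch with invented data: the built-in grouped mean runs in compiled
# code over the whole column at once, with no per-group Python calls and
# no inter-process serialization.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "variable": rng.integers(0, 5, size=1_000),
    "value": rng.random(1_000),
})
means = df.groupby("variable").mean()
```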

Creating a pivot table in pandas while grouping the dates per week

北慕城南 posted on 2019-12-21 20:12:27
Question: I want to create a pd.pivot_table in Python where one column is a datetime object, but I also want to group my results on a weekly basis. Here's a simple example: I have the following DataFrame:

    import pandas as pd
    names = ['a', 'b', 'c', 'd'] * 7
    dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05',
             '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12',
             '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04',
             '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06',
             '2017-01-14',
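One plausible approach is pd.Grouper, which lets pivot_table bucket a datetime column into calendar weeks. The question's example is cut off above, so the 'value' column and the small frame below are invented for illustration:

```python
import pandas as pd

# Sketch with invented data: pd.Grouper(key=..., freq="W") bins the
# datetime column by week (weeks ending Sunday) inside pivot_table.
df = pd.DataFrame({
    "name": ["a", "b", "a", "b"],
    "date": pd.to_datetime(["2017-01-02", "2017-01-03",
                            "2017-01-10", "2017-01-11"]),
    "value": [1.0, 2.0, 3.0, 4.0],
})
weekly = pd.pivot_table(
    df,
    index=pd.Grouper(key="date", freq="W"),  # weekly date bins
    columns="name",
    values="value",
    aggfunc="sum",
)
```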

reset_index() to original column indices after pandas groupby()?

∥☆過路亽.° posted on 2019-12-21 19:58:18
Question: I generate a grouped DataFrame with

    df = df.groupby(['X','Y']).max()

which I then want to write to CSV, without indexes. So I need to convert 'X' and 'Y' back to regular columns; I tried using reset_index(), but the order of columns was wrong. How do I restore columns 'X' and 'Y' to their exact original column positions? Is the solution

    df.reset_index(level=0, inplace=True)

and then finding a way to change the order of the columns? (I also found this approach, for a MultiIndex.)

Answer 1: This solution
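One way to do this (a sketch, not necessarily the answer the thread settled on) is to record the original column order before grouping and reindex the columns after reset_index():

```python
import pandas as pd

# Sketch with invented data: capture the original column order up front,
# then restore it after reset_index() moves the group keys to the front.
df = pd.DataFrame({"A": [1, 2, 3, 4], "X": [0, 0, 1, 1],
                   "B": [5, 6, 7, 8], "Y": [0, 1, 0, 1]})
original_order = list(df.columns)                  # ['A', 'X', 'B', 'Y']
out = df.groupby(["X", "Y"]).max().reset_index()   # 'X', 'Y' come first here
out = out[original_order]                          # back to the original order
```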

Pandas: Use groupby on each element of a list

你。 posted on 2019-12-21 17:35:15
Question: Maybe I'm missing the obvious. I have a pandas DataFrame that looks like this:

    id  product       categories
    0   Silmarillion  ['Book', 'Fantasy']
    1   Headphones    ['Electronic', 'Material']
    2   Dune          ['Book', 'Sci-Fi']

I'd like to use the groupby function to count the number of appearances of each element in the categories column, so here the result would be:

    Book        2
    Fantasy     1
    Electronic  1
    Material    1
    Sci-Fi      1

However, when I try using a groupby function, pandas counts the occurrences of the entire list instead
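A concise modern route (pandas ≥ 0.25, so not necessarily what the original answers used) is Series.explode, which gives each list element its own row, followed by a plain value_counts:

```python
import pandas as pd

# Sketch on the question's data: explode() flattens the lists, one
# element per row, and value_counts() then counts each category.
df = pd.DataFrame({
    "product": ["Silmarillion", "Headphones", "Dune"],
    "categories": [["Book", "Fantasy"],
                   ["Electronic", "Material"],
                   ["Book", "Sci-Fi"]],
})
counts = df["categories"].explode().value_counts()
```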

pandas groupby and rolling_apply ignoring NaNs

生来就可爱ヽ(ⅴ&lt;●) posted on 2019-12-21 09:31:55
Question: I have a pandas DataFrame and I want to calculate the rolling mean of a column (after a groupby clause). However, I want to exclude NaNs. For instance, if the group returns [2, NaN, 1], the result should be 1.5, while currently it returns NaN. I've tried the following, but it doesn't seem to work:

    df.groupby(by=['var1'])['value'].apply(pd.rolling_apply, 3,
        lambda x: np.mean([i for i in x if i is not np.nan and i != 'NaN']))

If I even try this:

    df.groupby(by=['var1'])['value'].apply(pd.rolling
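In current pandas the long-deprecated pd.rolling_apply is gone; the equivalent is GroupBy.rolling, and its aggregations already skip NaNs inside a window as long as min_periods is satisfied, which gives the 1.5 the asker expects:

```python
import numpy as np
import pandas as pd

# Sketch on the question's example: rolling(...).mean() ignores NaNs in
# the window; min_periods=1 keeps partial windows from producing NaN.
df = pd.DataFrame({
    "var1": ["a", "a", "a"],
    "value": [2.0, np.nan, 1.0],
})
roll = (
    df.groupby("var1")["value"]
      .rolling(window=3, min_periods=1)
      .mean()
      .reset_index(level=0, drop=True)  # drop group level to align with df
)
```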

Pandas dataframe to dict of dict

懵懂的女人 posted on 2019-12-21 03:45:45
Question: Given the following pandas DataFrame:

      ColA ColB  ColC
    0   a1    t     1
    1   a2    t     2
    2   a3    d     3
    3   a4    d     4

I want to get a dictionary of dictionaries, but I managed to create only the following:

    d = {'t': [1, 2], 'd': [3, 4]}

by:

    d = {k: list(v) for k, v in duplicated.groupby("ColB")["ColC"]}

How could I obtain the dict of dicts:

    dd = {'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}

Answer 1: You can do this with a groupby + apply step beforehand:

    dd = df.set_index('ColA').groupby('ColB').apply(
        lambda x: x.ColC.to_dict()
    ).to_dict()
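A runnable version of the answer's approach on the question's data: setting ColA as the index makes Series.to_dict inside each group produce the inner dictionaries, and the outer to_dict assembles them under the ColB keys:

```python
import pandas as pd

# Sketch on the question's data: ColA becomes the index, so each group's
# ColC series maps a1 -> 1, a2 -> 2, ... when converted to a dict.
df = pd.DataFrame({
    "ColA": ["a1", "a2", "a3", "a4"],
    "ColB": ["t", "t", "d", "d"],
    "ColC": [1, 2, 3, 4],
})
dd = (
    df.set_index("ColA")
      .groupby("ColB")
      .apply(lambda x: x["ColC"].to_dict())
      .to_dict()
)
```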

Python Pandas: Assign Last Value of DataFrame Group to All Entries of That Group

我与影子孤独终老i posted on 2019-12-21 03:36:25
Question: In Python pandas, I have a DataFrame. I group this DataFrame by a column and want to assign the last value of a column to all rows of another column. I know that I am able to select the last row of each group with this command:

    import pandas as pd
    df = pd.DataFrame({'a': (1,1,2,3,3), 'b': (20,21,30,40,41)})
    print(df)
    print("-")
    result = df.groupby('a').nth(-1)
    print(result)

Result:

       a   b
    0  1  20
    1  1  21
    2  2  30
    3  3  40
    4  3  41
    -
        b
    a
    1  21
    2  30
    3  41

How would it be possible to assign the result of this
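The usual tool for broadcasting a per-group aggregate back onto every row of the group is GroupBy.transform; a sketch on the question's DataFrame (the new column name is invented):

```python
import pandas as pd

# Sketch: transform('last') computes the last 'b' per group of 'a' and
# repeats it for every row of that group, aligned with df's index.
df = pd.DataFrame({"a": (1, 1, 2, 3, 3), "b": (20, 21, 30, 40, 41)})
df["b_last"] = df.groupby("a")["b"].transform("last")
```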

pandas: GroupBy .pipe() vs .apply()

删除回忆录丶 posted on 2019-12-20 09:23:30
Question: In the example from the pandas documentation about the new .pipe() method for GroupBy objects, an .apply() method accepting the same lambda would return the same results.

    In [195]: import numpy as np
    In [196]: n = 1000
    In [197]: df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
       .....:                    'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
       .....:                    'Revenue': (np.random.random(n)*50+10).round(2),
       .....:                    'Quantity': np.random.randint(1, 10, size=n)})
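A small sketch of the distinction: .pipe(f) hands f the GroupBy object itself, while .apply(f) calls f once per sub-frame; for a lambda like the one in the docs' example, the two happen to give the same numbers:

```python
import pandas as pd

# Sketch with invented data: f computes revenue per unit. Under pipe, the
# sums are per-group Series and divide element-wise; under apply, f runs
# on each group's DataFrame and the sums are scalars. Same result here.
df = pd.DataFrame({
    "Store": ["Store_1", "Store_1", "Store_2", "Store_2"],
    "Revenue": [10.0, 20.0, 30.0, 40.0],
    "Quantity": [1, 2, 3, 4],
})
g = df.groupby("Store")
f = lambda grp: grp.Revenue.sum() / grp.Quantity.sum()
via_pipe = g.pipe(f)    # f receives the whole GroupBy object
via_apply = g.apply(f)  # f receives each group's DataFrame in turn
```

The practical difference is that pipe runs f once (keeping operations vectorized across groups), whereas apply loops over groups in Python.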

Compare the preceding two rows with the subsequent two rows of each group, until the last record

别来无恙 posted on 2019-12-20 07:17:21
Question: I had a question earlier which was deleted, and it is now modified into a less verbose form for you to read easily. I have a DataFrame as given below:

    df = pd.DataFrame({
        'subject_id': [1]*20 + [2]*20,
        'day': list(range(1, 21)) * 2,
        'PEEP': [7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,
                 5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15],
    })
    df[
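The question is truncated above, so the following is only a sketch of one plausible reading, on a shortened version of the sample data: within each subject, compare the mean of each row's preceding two PEEP values with the mean of the following two, using a rolling window and a group-wise shift:

```python
import pandas as pd

# Sketch under assumptions (the full requirement is cut off in the source):
# prev2 = mean of this row and the one before it; next2 = the same quantity
# two rows ahead, i.e. the mean of the next two rows; then compare them.
df = pd.DataFrame({
    "subject_id": [1] * 6,
    "day": [1, 2, 3, 4, 5, 6],
    "PEEP": [7, 5, 10, 10, 11, 11],
})
g = df.groupby("subject_id")["PEEP"]
df["prev2"] = g.rolling(2).mean().reset_index(level=0, drop=True)
df["next2"] = df.groupby("subject_id")["prev2"].shift(-2)  # group-wise, no leak across subjects
df["increased"] = df["next2"] > df["prev2"]                # NaN comparisons yield False
```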