pandas-groupby

Pandas: Get top 10 values AFTER grouping

流过昼夜 posted on 2019-12-23 02:36:53
Question: I have a pandas DataFrame with a column 'id' and a column 'value'. It is already sorted by id (ascending) and then by value (descending). What I need is the top 10 values per id. I assumed that something like the following would work, but it doesn't:

    df.groupby("id", as_index=False).aggregate(lambda (index, rows): rows.iloc[:10])

What I get is just a list of ids; the value column (and other columns that I omitted from the question) are no longer there. Any ideas how it might be done?
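The most direct tool here is GroupBy.head, which keeps the first n rows of each group and preserves every column; since the frame is already sorted by value within each id, those first rows are the top values. A minimal sketch with made-up data:

```python
import pandas as pd

# Sketch: the frame is assumed pre-sorted by id (asc) and value (desc),
# so GroupBy.head keeps the top rows of each group without re-sorting.
df = pd.DataFrame({
    "id": [1] * 12 + [2] * 3,
    "value": list(range(12, 0, -1)) + [30, 20, 10],
})
top10 = df.groupby("id").head(10)  # top 10 rows per id, all columns kept
```

Unlike aggregate, head returns ordinary rows of the original frame, so no columns are lost.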

Why is pandas.groupby.mean so much faster than a parallel implementation?

亡梦爱人 posted on 2019-12-22 18:45:13
Question: I was using the pandas groupby mean function like the following on a very large dataset:

    import pandas as pd
    df = pd.read_csv("large_dataset.csv")
    df.groupby(['variable']).mean()

It looks like the function is not using multi-processing, and therefore I implemented a parallel version:

    import pandas as pd
    from multiprocessing import Pool, cpu_count

    def meanFunc(tmp_name, df_input):
        df_res = df_input.mean().to_frame().transpose()
        return df_res

    def applyParallel(dfGrouped, func):
        num_process = int
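A short sketch (on synthetic data) of why the built-in path usually wins: groupby().mean() dispatches to a single optimized Cython routine, so a multiprocessing rewrite mostly pays pickling and process-startup overhead on top of the same arithmetic:

```python
import numpy as np
import pandas as pd

# Sketch with invented data: the built-in grouped mean runs in compiled
# code over the whole column at once, with no per-group Python calls and
# no inter-process serialization.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "variable": rng.integers(0, 5, size=1_000),
    "value": rng.random(1_000),
})
means = df.groupby("variable").mean()
```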

Creating a pivot table in pandas while grouping the dates per week

北慕城南 posted on 2019-12-21 20:12:27
Question: I want to create a pd.pivot_table in Python where one column is a datetime object, but I also want to group my results on a weekly basis. Here's a simple example: I have the following DataFrame:

    import pandas as pd
    names = ['a', 'b', 'c', 'd'] * 7
    dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05',
             '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12',
             '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04',
             '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06',
             '2017-01-14',
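One plausible approach is pd.Grouper, which lets pivot_table bucket a datetime column into calendar weeks. The question's example is cut off above, so the 'value' column and the small frame below are invented for illustration:

```python
import pandas as pd

# Sketch with invented data: pd.Grouper(key=..., freq="W") bins the
# datetime column by week (weeks ending Sunday) inside pivot_table.
df = pd.DataFrame({
    "name": ["a", "b", "a", "b"],
    "date": pd.to_datetime(["2017-01-02", "2017-01-03",
                            "2017-01-10", "2017-01-11"]),
    "value": [1.0, 2.0, 3.0, 4.0],
})
weekly = pd.pivot_table(
    df,
    index=pd.Grouper(key="date", freq="W"),  # weekly date bins
    columns="name",
    values="value",
    aggfunc="sum",
)
```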

reset_index() to original column indices after pandas groupby()?

∥☆過路亽.° posted on 2019-12-21 19:58:18
Question: I generate a grouped DataFrame with

    df = df.groupby(['X','Y']).max()

which I then want to write to CSV, without indexes. So I need to convert 'X' and 'Y' back to regular columns; I tried using reset_index(), but the order of columns was wrong. How do I restore columns 'X' and 'Y' to their exact original column positions? Is the solution

    df.reset_index(level=0, inplace=True)

and then finding a way to change the order of the columns? (I also found this approach, for a MultiIndex.)

Answer 1: This solution
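One way to do this (a sketch, not necessarily the answer the thread settled on) is to record the original column order before grouping and reindex the columns after reset_index():

```python
import pandas as pd

# Sketch with invented data: capture the original column order up front,
# then restore it after reset_index() moves the group keys to the front.
df = pd.DataFrame({"A": [1, 2, 3, 4], "X": [0, 0, 1, 1],
                   "B": [5, 6, 7, 8], "Y": [0, 1, 0, 1]})
original_order = list(df.columns)                  # ['A', 'X', 'B', 'Y']
out = df.groupby(["X", "Y"]).max().reset_index()   # 'X', 'Y' come first here
out = out[original_order]                          # back to the original order
```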

Pandas: Use groupby on each element of a list

你。 posted on 2019-12-21 17:35:15
Question: Maybe I'm missing the obvious. I have a pandas DataFrame that looks like this:

    id  product       categories
    0   Silmarillion  ['Book', 'Fantasy']
    1   Headphones    ['Electronic', 'Material']
    2   Dune          ['Book', 'Sci-Fi']

I'd like to use the groupby function to count the number of appearances of each element in the categories column, so here the result would be:

    Book        2
    Fantasy     1
    Electronic  1
    Material    1
    Sci-Fi      1

However, when I try using a groupby function, pandas counts the occurrences of the entire list instead
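A concise modern route (pandas ≥ 0.25, so not necessarily what the original answers used) is Series.explode, which gives each list element its own row, followed by a plain value_counts:

```python
import pandas as pd

# Sketch on the question's data: explode() flattens the lists, one
# element per row, and value_counts() then counts each category.
df = pd.DataFrame({
    "product": ["Silmarillion", "Headphones", "Dune"],
    "categories": [["Book", "Fantasy"],
                   ["Electronic", "Material"],
                   ["Book", "Sci-Fi"]],
})
counts = df["categories"].explode().value_counts()
```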

pandas groupby and rolling_apply ignoring NaNs

生来就可爱ヽ(ⅴ&lt;●) posted on 2019-12-21 09:31:55
Question: I have a pandas DataFrame and I want to calculate the rolling mean of a column (after a groupby clause). However, I want to exclude NaNs. For instance, if the group returns [2, NaN, 1], the result should be 1.5, while currently it returns NaN. I've tried the following, but it doesn't seem to work:

    df.groupby(by=['var1'])['value'].apply(pd.rolling_apply, 3,
        lambda x: np.mean([i for i in x if i is not np.nan and i != 'NaN']))

If I even try this:

    df.groupby(by=['var1'])['value'].apply(pd.rolling
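In current pandas the long-deprecated pd.rolling_apply is gone; the equivalent is GroupBy.rolling, and its aggregations already skip NaNs inside a window as long as min_periods is satisfied, which gives the 1.5 the asker expects:

```python
import numpy as np
import pandas as pd

# Sketch on the question's example: rolling(...).mean() ignores NaNs in
# the window; min_periods=1 keeps partial windows from producing NaN.
df = pd.DataFrame({
    "var1": ["a", "a", "a"],
    "value": [2.0, np.nan, 1.0],
})
roll = (
    df.groupby("var1")["value"]
      .rolling(window=3, min_periods=1)
      .mean()
      .reset_index(level=0, drop=True)  # drop group level to align with df
)
```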

Pandas dataframe to dict of dict

懵懂的女人 posted on 2019-12-21 03:45:45
Question: Given the following pandas DataFrame:

      ColA ColB  ColC
    0   a1    t     1
    1   a2    t     2
    2   a3    d     3
    3   a4    d     4

I want to get a dictionary of dictionaries, but I managed to create only the following:

    d = {'t': [1, 2], 'd': [3, 4]}

by:

    d = {k: list(v) for k, v in duplicated.groupby("ColB")["ColC"]}

How could I obtain the dict of dicts:

    dd = {'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}

Answer 1: You can do this with a groupby + apply step beforehand:

    dd = df.set_index('ColA').groupby('ColB').apply(
        lambda x: x.ColC.to_dict()
    ).to_dict()
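A runnable version of the answer's approach on the question's data: setting ColA as the index makes Series.to_dict inside each group produce the inner dictionaries, and the outer to_dict assembles them under the ColB keys:

```python
import pandas as pd

# Sketch on the question's data: ColA becomes the index, so each group's
# ColC series maps a1 -> 1, a2 -> 2, ... when converted to a dict.
df = pd.DataFrame({
    "ColA": ["a1", "a2", "a3", "a4"],
    "ColB": ["t", "t", "d", "d"],
    "ColC": [1, 2, 3, 4],
})
dd = (
    df.set_index("ColA")
      .groupby("ColB")
      .apply(lambda x: x["ColC"].to_dict())
      .to_dict()
)
```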

Python Pandas: Assign Last Value of DataFrame Group to All Entries of That Group

我与影子孤独终老i posted on 2019-12-21 03:36:25
Question: In Python pandas, I have a DataFrame. I group this DataFrame by a column and want to assign the last value of a column to all rows of another column. I know that I am able to select the last row of each group with this command:

    import pandas as pd
    df = pd.DataFrame({'a': (1,1,2,3,3), 'b': (20,21,30,40,41)})
    print(df)
    print("-")
    result = df.groupby('a').nth(-1)
    print(result)

Result:

       a   b
    0  1  20
    1  1  21
    2  2  30
    3  3  40
    4  3  41
    -
        b
    a
    1  21
    2  30
    3  41

How would it be possible to assign the result of this
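The usual tool for broadcasting a per-group aggregate back onto every row of the group is GroupBy.transform; a sketch on the question's DataFrame (the new column name is invented):

```python
import pandas as pd

# Sketch: transform('last') computes the last 'b' per group of 'a' and
# repeats it for every row of that group, aligned with df's index.
df = pd.DataFrame({"a": (1, 1, 2, 3, 3), "b": (20, 21, 30, 40, 41)})
df["b_last"] = df.groupby("a")["b"].transform("last")
```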

pandas: GroupBy .pipe() vs .apply()

删除回忆录丶 posted on 2019-12-20 09:23:30
Question: In the example from the pandas documentation about the new .pipe() method for GroupBy objects, an .apply() method accepting the same lambda would return the same results.

    In [195]: import numpy as np
    In [196]: n = 1000
    In [197]: df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
       .....:                    'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
       .....:                    'Revenue': (np.random.random(n)*50+10).round(2),
       .....:                    'Quantity': np.random.randint(1, 10, size=n)})
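A small sketch of the distinction: .pipe(f) hands f the GroupBy object itself, while .apply(f) calls f once per sub-frame; for a lambda like the one in the docs' example, the two happen to give the same numbers:

```python
import pandas as pd

# Sketch with invented data: f computes revenue per unit. Under pipe, the
# sums are per-group Series and divide element-wise; under apply, f runs
# on each group's DataFrame and the sums are scalars. Same result here.
df = pd.DataFrame({
    "Store": ["Store_1", "Store_1", "Store_2", "Store_2"],
    "Revenue": [10.0, 20.0, 30.0, 40.0],
    "Quantity": [1, 2, 3, 4],
})
g = df.groupby("Store")
f = lambda grp: grp.Revenue.sum() / grp.Quantity.sum()
via_pipe = g.pipe(f)    # f receives the whole GroupBy object
via_apply = g.apply(f)  # f receives each group's DataFrame in turn
```

The practical difference is that pipe runs f once (keeping operations vectorized across groups), whereas apply loops over groups in Python.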

Compare the preceding two rows with the subsequent two rows of each group, until the last record

别来无恙 posted on 2019-12-20 07:17:21
Question: I had a question earlier which was deleted, and it is now modified into a less verbose form for you to read easily. I have a DataFrame as given below:

    df = pd.DataFrame({
        'subject_id': [1]*20 + [2]*20,
        'day': list(range(1, 21)) * 2,
        'PEEP': [7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,
                 5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15],
    })
    df[
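The question is truncated above, so the following is only a sketch of one plausible reading, on a shortened version of the sample data: within each subject, compare the mean of each row's preceding two PEEP values with the mean of the following two, using a rolling window and a group-wise shift:

```python
import pandas as pd

# Sketch under assumptions (the full requirement is cut off in the source):
# prev2 = mean of this row and the one before it; next2 = the same quantity
# two rows ahead, i.e. the mean of the next two rows; then compare them.
df = pd.DataFrame({
    "subject_id": [1] * 6,
    "day": [1, 2, 3, 4, 5, 6],
    "PEEP": [7, 5, 10, 10, 11, 11],
})
g = df.groupby("subject_id")["PEEP"]
df["prev2"] = g.rolling(2).mean().reset_index(level=0, drop=True)
df["next2"] = df.groupby("subject_id")["prev2"].shift(-2)  # group-wise, no leak across subjects
df["increased"] = df["next2"] > df["prev2"]                # NaN comparisons yield False
```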