pandas-groupby

Apply rolling function on pandas dataframe with multiple arguments

Submitted by 假如想象 on 2021-02-19 16:35:53

Question: I am trying to apply a rolling function, with a 3-year window, on a pandas DataFrame.

import pandas as pd

# Dummy data
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

# The function to be applied
def get_ln_rate(ib, ob, delta):
    n_years = len(ib)
    return sum(delta)*np.log(ob[-1]/ib[0]) / (n_years * (ob[-1]
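The excerpt cuts off before get_ln_rate is complete, so the denominator below is an assumption; the technique it sketches is general, though: rolling.apply only hands the function a single column, so one workaround is to slice each multi-column window out of the group manually. The helper name rolling_apply_multi and the (ob - ib) denominator are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

def get_ln_rate(window):
    # window is a slice of consecutive rows; the original formula is
    # truncated in the excerpt, so the (ob - ib) denominator is a guess
    n_years = len(window)
    ib0 = window['IB'].iloc[0]
    ob_last = window['OB'].iloc[-1]
    return window['Delta'].sum() * np.log(ob_last / ib0) / (n_years * (ob_last - ib0))

def rolling_apply_multi(group, window=3):
    # build the rolling result by slicing each window of rows explicitly,
    # sidestepping rolling.apply's one-column limitation
    out = [np.nan] * len(group)
    for i in range(window - 1, len(group)):
        out[i] = get_ln_rate(group.iloc[i - window + 1:i + 1])
    return pd.Series(out, index=group.index)

df['ln_rate'] = pd.concat(rolling_apply_multi(g) for _, g in df.groupby('Product'))
```

The explicit loop is slower than a vectorized rolling call, but it keeps all columns of the window available to the function.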

Forward fill column with an index-based limit

Submitted by 倾然丶 夕夏残阳落幕 on 2021-02-19 02:55:07

Question: I want to forward fill a column and I want to specify a limit, but I want the limit to be based on the index, not a simple number of rows as limit allows. For example, say I have the DataFrame given by:

df = pd.DataFrame({
    'data': [0.0, 1.0, np.nan, 3.0, np.nan, 5.0, np.nan, np.nan, np.nan, np.nan],
    'group': [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
})

which looks like

In [27]: df
Out[27]:
   data  group
0   0.0      0
1   1.0      0
2   NaN      0
3   3.0      1
4   NaN      1
5   5.0      0
6   NaN      0
7   NaN      0
8   NaN      1
9   NaN      1

If I group by the
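The excerpt cuts off before the filling rule is fully stated, so the sketch below assumes one plausible reading: a forward fill should not cross a boundary between consecutive runs of the 'group' column. Labeling each run with a cumulative sum and grouping by that label achieves it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'data': [0.0, 1.0, np.nan, 3.0, np.nan, 5.0, np.nan, np.nan, np.nan, np.nan],
    'group': [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
})

# Label each consecutive run of identical 'group' values (0,0,0 -> run 1,
# 1,1 -> run 2, ...), then forward fill within each run so a fill never
# crosses a run boundary.
runs = df['group'].ne(df['group'].shift()).cumsum()
df['filled'] = df.groupby(runs)['data'].ffill()
```

Rows 8-9 stay NaN because their run (the final pair of group 1) contains no earlier non-NaN value to propagate.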

Pandas: how to get a particular group after groupby? [duplicate]

Submitted by 与世无争的帅哥 on 2021-02-18 12:14:45

Question: This question already has answers here: How to access pandas groupby dataframe by key (5 answers). Closed 6 years ago. I want to group a dataframe by a column, called 'A', and inspect a particular group. grouped = df.groupby('A', sort=False) However, I don't know how to access a group. For example, I expected that grouped.first() would give me the first group, or that grouped['foo'] would give me the group where A=='foo'. However, pandas doesn't work like that. I couldn't find a similar example
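The accessor the question is looking for is get_group, which returns the sub-DataFrame for a single key; the data below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'], 'B': [1, 2, 3, 4]})
grouped = df.groupby('A', sort=False)

# get_group returns the sub-DataFrame for one key
foo = grouped.get_group('foo')

# the available keys (in order of appearance, since sort=False)
keys = list(grouped.groups)
```

Note that grouped.first() does something different: it returns the first row of every group, not the first group.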

How to keep original index of a DataFrame after groupby 2 columns?

Submitted by 只谈情不闲聊 on 2021-02-18 04:54:44

Question: Is there any way I can retain the original index of my large DataFrame after I perform a groupby? The reason I need to do this is that I need to do an inner merge back to my original df (after my groupby) to regain those lost columns, and the index value is the only 'unique' column to perform the merge back on. Does anyone know how I can achieve this? My DataFrame is quite large. My groupby looks like this: df.groupby(['col1', 'col2']).agg({'col3': 'count'}).reset_index() This drops my
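One common way to avoid the merge entirely is transform, which broadcasts the aggregate back onto the original rows; a minimal sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'b'],
                   'col2': ['x', 'x', 'y'],
                   'col3': [7, 8, 9]},
                  index=[10, 11, 12])

# transform broadcasts the per-group aggregate back to the original
# rows, so the original index and all other columns survive intact
df['col3_count'] = df.groupby(['col1', 'col2'])['col3'].transform('count')
```

Unlike agg, which produces one row per group, transform returns a result aligned with the input, so no merge back is needed.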

How to filter dataframe by splitting categories of a columns into sets?

Submitted by 夙愿已清 on 2021-02-17 02:06:26

Question: I have a dataframe:

Prop_ID  Unit_ID  Prop_Usage                Unit_Usage
1        1        RESIDENTIAL               RESIDENTIAL
1        2        RESIDENTIAL               COMMERCIAL
1        3        RESIDENTIAL               INDUSTRIAL
1        4        RESIDENTIAL               RESIDENTIAL
2        1        COMMERCIAL                RESIDENTIAL
2        2        COMMERCIAL                COMMERCIAL
2        3        COMMERCIAL                COMMERCIAL
3        1        INDUSTRIAL                INDUSTRIAL
3        2        INDUSTRIAL                COMMERCIAL
4        1        RESIDENTIAL - COMMERCIAL  RESIDENTIAL
4        2        RESIDENTIAL - COMMERCIAL  COMMERCIAL
4        3        RESIDENTIAL - COMMERCIAL  INDUSTRIAL
5        1        COMMERCIAL / RESIDENTIAL  RESIDENTIAL
5        2        COMMERCIAL / RESIDENTIAL
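The excerpt cuts off before the desired output is shown, so the sketch below assumes the goal is to keep rows where Unit_Usage is one of the categories obtained by splitting Prop_Usage on '-' or '/'. The sample rows are a subset of the table above.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Prop_ID': [1, 4, 5],
    'Unit_ID': [2, 1, 1],
    'Prop_Usage': ['RESIDENTIAL', 'RESIDENTIAL - COMMERCIAL', 'COMMERCIAL / RESIDENTIAL'],
    'Unit_Usage': ['COMMERCIAL', 'RESIDENTIAL', 'RESIDENTIAL'],
})

# Split each Prop_Usage on '-' or '/' into a set of categories,
# then keep the rows whose Unit_Usage is a member of that set.
prop_sets = df['Prop_Usage'].apply(lambda s: {p.strip() for p in re.split(r'[-/]', s)})
mask = [unit in cats for unit, cats in zip(df['Unit_Usage'], prop_sets)]
matched = df[mask]
```

The first sample row is dropped because COMMERCIAL is not in the set {'RESIDENTIAL'}; the other two match.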

Pandas groupby: how to select adjacent column data after selecting a row based on data in another column in pandas groupby groups?

Submitted by 三世轮回 on 2021-02-16 20:22:17

Question: I have a database as partially shown below. For each date, there are entries for duration (1-20 per date), with items (100s) listed for each duration. Each item has several associated data points in adjacent columns, including an identifier. For each date, I want to select the largest duration. Then, I want to find the item with a value closest to a given input value. I would then like to obtain the ID of that item so I can follow its value through its time in the database.
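The database itself is not shown in the excerpt, so the column names (date, duration, value, item_id) and data below are hypothetical; the sketch follows the steps as described: per date, restrict to the maximum duration, then pick the ID of the item whose value is nearest the target.

```python
import pandas as pd

# Hypothetical data; the real column names are not shown in the excerpt
df = pd.DataFrame({
    'date': ['2021-01-04'] * 4 + ['2021-01-05'] * 4,
    'duration': [1, 1, 2, 2, 3, 3, 5, 5],
    'value': [10.0, 20.0, 11.0, 19.0, 9.0, 21.0, 12.0, 18.0],
    'item_id': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
})

target = 14.0

def closest_item_id(group, target):
    # keep only the rows at this date's largest duration, then return
    # the ID of the row whose value is nearest the target
    top = group[group['duration'] == group['duration'].max()]
    return top.loc[(top['value'] - target).abs().idxmin(), 'item_id']

ids = df.groupby('date').apply(closest_item_id, target=target)
```

Once the per-date IDs are known, the item's history is just df[df['item_id'].isin(ids)].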

How to rank rows by id in Pandas Python

Submitted by 隐身守侯 on 2021-02-16 20:13:25

Question: I have a DataFrame like this:

id  points1  points2
1   44       53
1   76       34
1   63       66
2   23       34
2   44       56

I want output like this:

id  points1  points2  points1_rank  points2_rank
1   44       53       3             2
1   76       34       1             3
1   63       66       2             1
2   23       79       2             1
2   44       56       1             2

Basically, I want to groupby('id') and find the rank of each column within the same id. I tried this:

features = ["points1","points2"]
df = pd.merge(df, df.groupby('id')[features].rank().reset_index(), suffixes=["", "_rank"], how='left', on=['id'])

But I get KeyError: 'id'.

Answer 1: You
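The KeyError arises because rank() returns only the ranked feature columns, so after reset_index there is no 'id' column to merge on. Since rank() preserves the original row index, one fix is to join by index instead (ascending=False matches the desired output, where the largest score gets rank 1):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'points1': [44, 76, 63, 23, 44],
                   'points2': [53, 34, 66, 34, 56]})

features = ['points1', 'points2']
# rank() keeps the original row index, so the ranks can be joined
# back by index; no 'id' merge key is needed
ranks = df.groupby('id')[features].rank(ascending=False).astype(int)
df = df.join(ranks.add_suffix('_rank'))
```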

Transform pandas groupby result with subtotals to relative values

Submitted by 懵懂的女人 on 2021-02-16 13:54:11

Question: I have come across a nice solution for inserting subtotals into a pandas groupby DataFrame. However, now I would like to modify the result to show relative values with respect to the subtotals, instead of the absolute values. This is the code to show the groupby:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "Category": np.random.choice(["Group A", "Group B"], 50),
        "Product": np.random.choice(["Product 1", "Product 2"], 50),
        "Units_Sold": np.random.randint(1, 100, size=(50)),
        "Date
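The excerpt ends mid-setup, so the sketch below assumes the subtotals are per Category; converting to relative values then means dividing each (Category, Product) total by its Category subtotal via a level-wise transform. A seed is added so the random data is reproducible.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    {
        "Category": np.random.choice(["Group A", "Group B"], 50),
        "Product": np.random.choice(["Product 1", "Product 2"], 50),
        "Units_Sold": np.random.randint(1, 100, size=(50)),
    }
)

totals = df.groupby(['Category', 'Product'])['Units_Sold'].sum()
# divide each (Category, Product) total by its Category subtotal,
# turning absolute unit counts into shares of the subtotal
relative = totals / totals.groupby(level='Category').transform('sum')
```

Each Category's shares now sum to 1.0, which is easy to verify and makes the subtotal rows themselves become 1.0 if re-inserted.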