pandas-groupby

Weird behaviour with groupby on ordered categorical columns

倖福魔咒の submitted on 2020-01-14 07:04:16
Question (MCVE):

df = pd.DataFrame({
    'Cat': ['SF', 'W', 'F', 'R64', 'SF', 'F'],
    'ID': [1, 1, 1, 2, 2, 2]
})
df.Cat = pd.Categorical(
    df.Cat, categories=['R64', 'SF', 'F', 'W'], ordered=True)

As you can see, I've defined an ordered categorical column on Cat. To verify, check:

0     SF
1      W
2      F
3    R64
4     SF
5      F
Name: Cat, dtype: category
Categories (4, object): [R64 < SF < F < W]

I want to find the largest category PER ID. Doing groupby + max works:

df.groupby('ID').Cat.max()
ID
1    W
2    F
Name: Cat, dtype: object
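The MCVE above can be run as-is; the "weird" part the title refers to is visible in the output dtype, which comes back as plain object rather than categorical in the pandas version the question targets. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Cat': ['SF', 'W', 'F', 'R64', 'SF', 'F'],
    'ID': [1, 1, 1, 2, 2, 2],
})
# Ordered categories: R64 < SF < F < W
df['Cat'] = pd.Categorical(
    df['Cat'], categories=['R64', 'SF', 'F', 'W'], ordered=True)

# max() respects the category order, so per-ID maxima are the
# "largest" categories, even though the result may lose the
# categorical dtype depending on pandas version.
out = df.groupby('ID')['Cat'].max()
print(out.to_dict())  # {1: 'W', 2: 'F'}
```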

After merging I lose pivot table columns in pandas

守給你的承諾、 submitted on 2020-01-14 07:00:32
Question: I am trying to create an output like this, but I am having difficulty getting my Total column. To get it I use merge, where Total = count * price. Here is my code:

def generate_invoice_summary_info():
    file_path = 'output.xlsx'
    df = pd.read_excel(file_path, sheet_name='Invoice Details', usecols="E:F,I,L:M")
    df['Price'] = df['Price'].astype(float)
    df1 = df.groupby(["Invoice Cost Centre", "Invoice Category"]).agg({'Price': 'sum'}).reset_index()
    df = pd.pivot_table(df, index=["Invoice Cost Centre",
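One possible approach (not necessarily the asker's intended one): compute Total = count * price per row before pivoting, so the column survives the pivot and no merge is needed. The data and the Count column name here are hypothetical stand-ins for the spreadsheet columns in the question:

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet in the question.
df = pd.DataFrame({
    'Invoice Cost Centre': ['C1', 'C1', 'C2'],
    'Invoice Category': ['A', 'B', 'A'],
    'Count': [2, 3, 4],
    'Price': [10.0, 5.0, 2.5],
})

# Total computed per row, then aggregated by the pivot itself,
# so there is no post-pivot merge to lose columns in.
df['Total'] = df['Count'] * df['Price']
pivot = pd.pivot_table(df,
                       index=['Invoice Cost Centre', 'Invoice Category'],
                       values=['Price', 'Total'], aggfunc='sum')
print(pivot)
```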

Get row value of maximum count after applying group by in pandas

99封情书 submitted on 2020-01-14 04:52:27
Question: I have the following df:

>In [260]: df
>Out[260]:
    size market vegetable  confirm availability
0  Large    ABC    Tomato                   NaN
1  Large    XYZ    Tomato                   NaN
2  Small    ABC    Tomato                   NaN
3  Large    ABC     Onion                   NaN
4  Small    ABC     Onion                   NaN
5  Small    XYZ     Onion                   NaN
6  Small    XYZ     Onion                   NaN
7  Small    XYZ   Cabbage                   NaN
8  Large    XYZ   Cabbage                   NaN
9  Small    ABC   Cabbage                   NaN

1) How do I get the size of a vegetable whose size count is maximum? I used groupby on vegetable and size to get the following df. But I need to get the rows which contain the
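A sketch of one common answer to question 1): count the (vegetable, size) pairs, then use idxmax within each vegetable to keep the most frequent size. This uses the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'size': ['Large', 'Large', 'Small', 'Large', 'Small',
             'Small', 'Small', 'Small', 'Large', 'Small'],
    'market': ['ABC', 'XYZ', 'ABC', 'ABC', 'ABC',
               'XYZ', 'XYZ', 'XYZ', 'XYZ', 'ABC'],
    'vegetable': ['Tomato', 'Tomato', 'Tomato', 'Onion', 'Onion',
                  'Onion', 'Onion', 'Cabbage', 'Cabbage', 'Cabbage'],
})

# Count each (vegetable, size) pair, then keep the size with the
# highest count per vegetable.
counts = df.groupby(['vegetable', 'size']).size().reset_index(name='count')
top = counts.loc[counts.groupby('vegetable')['count'].idxmax()]
print(top)
```

Note idxmax keeps the first row on ties; a tie-breaking rule would need to be chosen explicitly if that matters.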

Check if all elements in a group are equal using pandas GroupBy

六月ゝ 毕业季﹏ submitted on 2020-01-13 02:13:30
Question: Is there a pythonic way to group by a field and check if all elements of each resulting group have the same value? Sample data:

               datetime rating  signal
0   2018-12-27 11:33:00     IG       0
1   2018-12-27 11:33:00     HY      -1
2   2018-12-27 11:49:00     IG       0
3   2018-12-27 11:49:00     HY      -1
4   2018-12-27 12:00:00     IG       0
5   2018-12-27 12:00:00     HY      -1
6   2018-12-27 12:49:00     IG       0
7   2018-12-27 12:49:00     HY      -1
8   2018-12-27 14:56:00     IG       0
9   2018-12-27 14:56:00     HY      -1
10  2018-12-27 15:12:00     IG       0
11  2018-12-27 15:12:00     HY      -1
12  2018-12-20
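One idiomatic way to answer this: a group has all-equal elements exactly when its number of unique values is 1, so `groupby(...).nunique() == 1` gives a per-group boolean. A minimal sketch on a slice of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'datetime': ['2018-12-27 11:33:00', '2018-12-27 11:33:00',
                 '2018-12-27 11:49:00', '2018-12-27 11:49:00'],
    'rating': ['IG', 'HY', 'IG', 'HY'],
    'signal': [0, -1, 0, -1],
})

# All elements in a group are equal <=> the group has one unique value.
all_equal = df.groupby('rating')['signal'].nunique() == 1
print(all_equal.to_dict())  # {'HY': True, 'IG': True}
```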

Time difference within group by objects in Python Pandas

半城伤御伤魂 submitted on 2020-01-12 12:14:38
Question: I have a dataframe that looks like this:

from  to  datetime             other
11    1   2016-11-06 22:00:00  -
11    1   2016-11-06 20:00:00  -
11    1   2016-11-06 15:45:00  -
11    12  2016-11-06 15:00:00  -
11    1   2016-11-06 12:00:00  -
11    18  2016-11-05 10:00:00  -
11    12  2016-11-05 10:00:00  -
12    1   2016-10-05 10:00:59  -
12    3   2016-09-06 10:00:34  -

I want to group by the "from" and then "to" columns, then sort "datetime" in descending order, and finally calculate the
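The excerpt cuts off, but the setup (group, sort descending, then a time difference) suggests a sort followed by a per-group `diff()`. A sketch on a subset of the data, under that assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'from': [11, 11, 11, 12, 12],
    'to':   [1, 1, 1, 1, 3],
    'datetime': pd.to_datetime([
        '2016-11-06 22:00:00', '2016-11-06 20:00:00',
        '2016-11-06 15:45:00', '2016-10-05 10:00:59',
        '2016-09-06 10:00:34']),
})

# Sort descending inside each (from, to) group; diff() then gives the
# (negative) gap from each row to the previous, more recent timestamp.
df = df.sort_values(['from', 'to', 'datetime'],
                    ascending=[True, True, False])
df['gap'] = df.groupby(['from', 'to'])['datetime'].diff()
print(df[['from', 'to', 'datetime', 'gap']])
```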

Python Pandas: splitting a date column into separate columns

亡梦爱人 submitted on 2020-01-11 14:42:11
Question: I have dates in Python (pandas) written as "1/31/2010". To apply linear regression I want three separate variables: day number, month number, and year number. What is the way to split a date column in pandas into 3 columns? A related question is to do the same but group the days into 3 groups: 1-10, 11-20, 21-31.

Answer 1:

df['date'] = pd.to_datetime(df['date'])
# Create 3 additional columns
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt
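The answer above covers the day/month/year split; for the second part of the question (binning days into 1-10, 11-20, 21-31), `pd.cut` is a natural fit. A runnable sketch combining both:

```python
import pandas as pd

df = pd.DataFrame({'date': ['1/31/2010', '1/5/2010', '1/15/2010']})
df['date'] = pd.to_datetime(df['date'])

# Split into three numeric columns.
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year

# Bin days into 1-10, 11-20, 21-31; bins are (0,10], (10,20], (20,31].
df['day_group'] = pd.cut(df['day'], bins=[0, 10, 20, 31],
                         labels=['1-10', '11-20', '21-31'])
print(df['day_group'].tolist())  # ['21-31', '1-10', '11-20']
```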

Pandas groupby: sum the values of each group and join another column as comma-separated

点点圈 submitted on 2020-01-11 12:02:27
Question: I want to group by one column (tag) and sum the corresponding quantities (qty). The related reference-no. column should be joined with commas.

import pandas as pd
tag = ['PO_001045M100960', 'PO_001045M100960', 'PO_001045MSP2526',
       'PO_001045M870191', 'PO_001045M870191', 'PO_001045M870191']
reference = ['PA_000003', 'PA_000005', 'PA_000001', 'PA_000002', 'PA_000004', 'PA_000009']
qty = [4, 2, 2, 1, 1, 1]
df = pd.DataFrame({'tag': tag, 'reference': reference, 'qty': qty})

tag  reference  qty
PO
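The usual answer to this kind of question is a single `agg` with a different function per column: `'sum'` for qty and `', '.join` for the references. A sketch using the question's own data:

```python
import pandas as pd

tag = ['PO_001045M100960', 'PO_001045M100960', 'PO_001045MSP2526',
       'PO_001045M870191', 'PO_001045M870191', 'PO_001045M870191']
reference = ['PA_000003', 'PA_000005', 'PA_000001',
             'PA_000002', 'PA_000004', 'PA_000009']
qty = [4, 2, 2, 1, 1, 1]
df = pd.DataFrame({'tag': tag, 'reference': reference, 'qty': qty})

# Sum qty per tag; join the reference numbers with commas.
# sort=False keeps the tags in first-seen order.
out = (df.groupby('tag', sort=False)
         .agg({'qty': 'sum', 'reference': ', '.join})
         .reset_index())
print(out)
```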

Reorder data frame from calendar year to water year using Python

本秂侑毒 submitted on 2020-01-06 07:23:26
Question: This question has been solved with R, but I haven't seen useful examples with Python. I would like to learn how to convert calendar-year (1/1/1990 to 12/31/2010) discharge data to water-year data (i.e. 10/01/1990 to 9/30/2010). Thank you for the assistance.

Answer 1: You could use apply and write your own function to create a new column WY. If you have this df:

                  Date  Discharge
0  2011-10-01 00:00:00        0.0
1  2011-10-01 01:00:00        0.0
2  2011-10-01 02:00:00        0.0
3  2011-10-01 03:00:00        0.0
4  2011-10-01 04
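The answer is cut off, but the WY column it describes can be computed without apply at all: a (USGS-convention) water year starts on 1 October, so October-December dates belong to the next calendar year's water year. A vectorized sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime([
    '2011-09-30 23:00:00',   # last hour of WY 2011
    '2011-10-01 00:00:00',   # first hour of WY 2012
    '2012-03-15 12:00:00'])})

# Water year = calendar year, bumped by 1 for Oct/Nov/Dec dates.
df['WY'] = df['Date'].dt.year + (df['Date'].dt.month >= 10).astype(int)
print(df['WY'].tolist())  # [2011, 2012, 2012]
```

Sorting or grouping by WY then gives the water-year ordering the question asks for.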

Find occurrences of a string in a subgroup column and mark the main group based on its occurrence

☆樱花仙子☆ submitted on 2020-01-06 07:10:45
Question: I have data which looks like this:

Group  string
A      Hello
A      SearchListing
A      GoSearch
A      pen
A      Hello
B      Real-Estate
B      Access
B      Denied
B      Group
B      Group
C      Glance
C      NoSearch
C      Home

and so on. I want to find all the groups that have the phrase "search" in their strings and mark them as 0/1. At the same time, I want to aggregate results per group: unique strings, total strings, and how many times "search" was encountered by that group. The end result I want is something
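A sketch of one way to get all four aggregates in a single pass (the output column names here are hypothetical, since the question's expected output is cut off): flag rows with a case-insensitive `str.contains`, then use named aggregation per Group:

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B',
              'C', 'C', 'C'],
    'string': ['Hello', 'SearchListing', 'GoSearch', 'pen', 'Hello',
               'Real-Estate', 'Access', 'Denied', 'Group', 'Group',
               'Glance', 'NoSearch', 'Home'],
})

# Case-insensitive per-row flag, then per-group aggregates:
# 0/1 marker, unique strings, total strings, and search hit count.
df['has_search'] = df['string'].str.contains('search', case=False)
out = df.groupby('Group').agg(
    search_flag=('has_search', lambda s: int(s.any())),
    unique_strings=('string', 'nunique'),
    total_strings=('string', 'size'),
    search_hits=('has_search', 'sum'),
).reset_index()
print(out)
```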