pandas-groupby

Weird behaviour with groupby on ordered categorical columns

倖福魔咒の submitted on 2020-01-14 07:04:16
Question (MCVE):

df = pd.DataFrame({
    'Cat': ['SF', 'W', 'F', 'R64', 'SF', 'F'],
    'ID': [1, 1, 1, 2, 2, 2]
})
df.Cat = pd.Categorical(
    df.Cat, categories=['R64', 'SF', 'F', 'W'], ordered=True)

As you can see, I've defined an ordered categorical column on Cat. To verify, check:

0     SF
1      W
2      F
3    R64
4     SF
5      F
Name: Cat, dtype: category
Categories (4, object): [R64 < SF < F < W]

I want to find the largest category PER ID. Doing groupby + max works:

df.groupby('ID').Cat.max()
ID
1    W
2    F
Name: Cat, dtype: object
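The MCVE above can be run as-is; the "weird" part the title refers to is visible in the output dtype, which comes back as plain object rather than categorical in the pandas version the question targets. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Cat': ['SF', 'W', 'F', 'R64', 'SF', 'F'],
    'ID': [1, 1, 1, 2, 2, 2],
})
# Ordered categories: R64 < SF < F < W
df['Cat'] = pd.Categorical(
    df['Cat'], categories=['R64', 'SF', 'F', 'W'], ordered=True)

# max() respects the category order, so per-ID maxima are the
# "largest" categories, even though the result may lose the
# categorical dtype depending on pandas version.
out = df.groupby('ID')['Cat'].max()
print(out.to_dict())  # {1: 'W', 2: 'F'}
```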

After merging I lose pivot table columns in pandas

守給你的承諾、 submitted on 2020-01-14 07:00:32
Question: I am trying to create an output like this, but I am having difficulty getting my Total column. To get it I use merge, where Total = count * price. Here is my code:

def generate_invoice_summary_info():
    file_path = 'output.xlsx'
    df = pd.read_excel(file_path, sheet_name='Invoice Details', usecols="E:F,I,L:M")
    df['Price'] = df['Price'].astype(float)
    df1 = df.groupby(["Invoice Cost Centre", "Invoice Category"]).agg({'Price': 'sum'}).reset_index()
    df = pd.pivot_table(df, index=["Invoice Cost Centre",
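One possible approach (not necessarily the asker's intended one): compute Total = count * price per row before pivoting, so the column survives the pivot and no merge is needed. The data and the Count column name here are hypothetical stand-ins for the spreadsheet columns in the question:

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet in the question.
df = pd.DataFrame({
    'Invoice Cost Centre': ['C1', 'C1', 'C2'],
    'Invoice Category': ['A', 'B', 'A'],
    'Count': [2, 3, 4],
    'Price': [10.0, 5.0, 2.5],
})

# Total computed per row, then aggregated by the pivot itself,
# so there is no post-pivot merge to lose columns in.
df['Total'] = df['Count'] * df['Price']
pivot = pd.pivot_table(df,
                       index=['Invoice Cost Centre', 'Invoice Category'],
                       values=['Price', 'Total'], aggfunc='sum')
print(pivot)
```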

Get row value of maximum count after applying group by in pandas

99封情书 submitted on 2020-01-14 04:52:27
Question: I have the following df:

>In [260]: df
>Out[260]:
    size market vegetable  confirm availability
0  Large    ABC    Tomato                   NaN
1  Large    XYZ    Tomato                   NaN
2  Small    ABC    Tomato                   NaN
3  Large    ABC     Onion                   NaN
4  Small    ABC     Onion                   NaN
5  Small    XYZ     Onion                   NaN
6  Small    XYZ     Onion                   NaN
7  Small    XYZ   Cabbage                   NaN
8  Large    XYZ   Cabbage                   NaN
9  Small    ABC   Cabbage                   NaN

1) How do I get the size of a vegetable whose size count is maximum? I used groupby on vegetable and size to get the following df. But I need to get the rows which contain the
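A sketch of one common answer to question 1): count the (vegetable, size) pairs, then use idxmax within each vegetable to keep the most frequent size. This uses the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'size': ['Large', 'Large', 'Small', 'Large', 'Small',
             'Small', 'Small', 'Small', 'Large', 'Small'],
    'market': ['ABC', 'XYZ', 'ABC', 'ABC', 'ABC',
               'XYZ', 'XYZ', 'XYZ', 'XYZ', 'ABC'],
    'vegetable': ['Tomato', 'Tomato', 'Tomato', 'Onion', 'Onion',
                  'Onion', 'Onion', 'Cabbage', 'Cabbage', 'Cabbage'],
})

# Count each (vegetable, size) pair, then keep the size with the
# highest count per vegetable.
counts = df.groupby(['vegetable', 'size']).size().reset_index(name='count')
top = counts.loc[counts.groupby('vegetable')['count'].idxmax()]
print(top)
```

Note idxmax keeps the first row on ties; a tie-breaking rule would need to be chosen explicitly if that matters.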

Check if all elements in a group are equal using pandas GroupBy

六月ゝ 毕业季﹏ submitted on 2020-01-13 02:13:30
Question: Is there a pythonic way to group by a field and check if all elements of each resulting group have the same value? Sample data:

               datetime rating  signal
0   2018-12-27 11:33:00     IG       0
1   2018-12-27 11:33:00     HY      -1
2   2018-12-27 11:49:00     IG       0
3   2018-12-27 11:49:00     HY      -1
4   2018-12-27 12:00:00     IG       0
5   2018-12-27 12:00:00     HY      -1
6   2018-12-27 12:49:00     IG       0
7   2018-12-27 12:49:00     HY      -1
8   2018-12-27 14:56:00     IG       0
9   2018-12-27 14:56:00     HY      -1
10  2018-12-27 15:12:00     IG       0
11  2018-12-27 15:12:00     HY      -1
12  2018-12-20
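One idiomatic way to answer this: a group has all-equal elements exactly when its number of unique values is 1, so `groupby(...).nunique() == 1` gives a per-group boolean. A minimal sketch on a slice of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'datetime': ['2018-12-27 11:33:00', '2018-12-27 11:33:00',
                 '2018-12-27 11:49:00', '2018-12-27 11:49:00'],
    'rating': ['IG', 'HY', 'IG', 'HY'],
    'signal': [0, -1, 0, -1],
})

# All elements in a group are equal <=> the group has one unique value.
all_equal = df.groupby('rating')['signal'].nunique() == 1
print(all_equal.to_dict())  # {'HY': True, 'IG': True}
```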

Time difference within group by objects in Python Pandas

半城伤御伤魂 submitted on 2020-01-12 12:14:38
Question: I have a dataframe that looks like this:

from  to  datetime             other
11    1   2016-11-06 22:00:00  -
11    1   2016-11-06 20:00:00  -
11    1   2016-11-06 15:45:00  -
11    12  2016-11-06 15:00:00  -
11    1   2016-11-06 12:00:00  -
11    18  2016-11-05 10:00:00  -
11    12  2016-11-05 10:00:00  -
12    1   2016-10-05 10:00:59  -
12    3   2016-09-06 10:00:34  -

I want to group by the "from" and then "to" columns, then sort "datetime" in descending order, and finally calculate the
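The excerpt cuts off, but the setup (group, sort descending, then a time difference) suggests a sort followed by a per-group `diff()`. A sketch on a subset of the data, under that assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'from': [11, 11, 11, 12, 12],
    'to':   [1, 1, 1, 1, 3],
    'datetime': pd.to_datetime([
        '2016-11-06 22:00:00', '2016-11-06 20:00:00',
        '2016-11-06 15:45:00', '2016-10-05 10:00:59',
        '2016-09-06 10:00:34']),
})

# Sort descending inside each (from, to) group; diff() then gives the
# (negative) gap from each row to the previous, more recent timestamp.
df = df.sort_values(['from', 'to', 'datetime'],
                    ascending=[True, True, False])
df['gap'] = df.groupby(['from', 'to'])['datetime'].diff()
print(df[['from', 'to', 'datetime', 'gap']])
```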

Python Pandas: splitting a date column into separate columns

亡梦爱人 submitted on 2020-01-11 14:42:11
Question: I have dates in Python (pandas) written as "1/31/2010". To apply linear regression I want three separate variables: day number, month number, and year number. What is the way to split a date column in pandas into 3 columns? A related question is to do the same but group the days into 3 groups: 1-10, 11-20, 21-31.

Answer 1:

df['date'] = pd.to_datetime(df['date'])
# Create 3 additional columns
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt
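The answer above covers the day/month/year split; for the second part of the question (binning days into 1-10, 11-20, 21-31), `pd.cut` is a natural fit. A runnable sketch combining both:

```python
import pandas as pd

df = pd.DataFrame({'date': ['1/31/2010', '1/5/2010', '1/15/2010']})
df['date'] = pd.to_datetime(df['date'])

# Split into three numeric columns.
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year

# Bin days into 1-10, 11-20, 21-31; bins are (0,10], (10,20], (20,31].
df['day_group'] = pd.cut(df['day'], bins=[0, 10, 20, 31],
                         labels=['1-10', '11-20', '21-31'])
print(df['day_group'].tolist())  # ['21-31', '1-10', '11-20']
```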

Pandas groupby: sum the values of each group and join another column as comma-separated

点点圈 submitted on 2020-01-11 12:02:27
Question: I want to group by one column (tag) and sum the corresponding quantities (qty). The related reference-no. column should be joined with commas.

import pandas as pd
tag = ['PO_001045M100960', 'PO_001045M100960', 'PO_001045MSP2526',
       'PO_001045M870191', 'PO_001045M870191', 'PO_001045M870191']
reference = ['PA_000003', 'PA_000005', 'PA_000001', 'PA_000002', 'PA_000004', 'PA_000009']
qty = [4, 2, 2, 1, 1, 1]
df = pd.DataFrame({'tag': tag, 'reference': reference, 'qty': qty})

tag  reference  qty
PO
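The usual answer to this kind of question is a single `agg` with a different function per column: `'sum'` for qty and `', '.join` for the references. A sketch using the question's own data:

```python
import pandas as pd

tag = ['PO_001045M100960', 'PO_001045M100960', 'PO_001045MSP2526',
       'PO_001045M870191', 'PO_001045M870191', 'PO_001045M870191']
reference = ['PA_000003', 'PA_000005', 'PA_000001',
             'PA_000002', 'PA_000004', 'PA_000009']
qty = [4, 2, 2, 1, 1, 1]
df = pd.DataFrame({'tag': tag, 'reference': reference, 'qty': qty})

# Sum qty per tag; join the reference numbers with commas.
# sort=False keeps the tags in first-seen order.
out = (df.groupby('tag', sort=False)
         .agg({'qty': 'sum', 'reference': ', '.join})
         .reset_index())
print(out)
```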

Reorder data frame from calendar year to water year using Python

本秂侑毒 submitted on 2020-01-06 07:23:26
Question: This question has been solved with R, but I haven't seen useful examples with Python. I would like to learn how to convert calendar-year (1/1/1990 to 12/31/2010) discharge data to water-year data (i.e. 10/01/1990 to 9/30/2010). Thank you for the assistance.

Answer 1: You could use apply and write your own function to create a new column WY. If you have this df:

                  Date  Discharge
0  2011-10-01 00:00:00        0.0
1  2011-10-01 01:00:00        0.0
2  2011-10-01 02:00:00        0.0
3  2011-10-01 03:00:00        0.0
4  2011-10-01 04
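The answer is cut off, but the WY column it describes can be computed without apply at all: a (USGS-convention) water year starts on 1 October, so October-December dates belong to the next calendar year's water year. A vectorized sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime([
    '2011-09-30 23:00:00',   # last hour of WY 2011
    '2011-10-01 00:00:00',   # first hour of WY 2012
    '2012-03-15 12:00:00'])})

# Water year = calendar year, bumped by 1 for Oct/Nov/Dec dates.
df['WY'] = df['Date'].dt.year + (df['Date'].dt.month >= 10).astype(int)
print(df['WY'].tolist())  # [2011, 2012, 2012]
```

Sorting or grouping by WY then gives the water-year ordering the question asks for.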

Find occurrences of a string in a subgroup column and mark the main group based on its occurrence

☆樱花仙子☆ submitted on 2020-01-06 07:10:45
Question: I have data which looks like this:

Group  string
A      Hello
A      SearchListing
A      GoSearch
A      pen
A      Hello
B      Real-Estate
B      Access
B      Denied
B      Group
B      Group
C      Glance
C      NoSearch
C      Home

and so on. I want to find all the groups that have the phrase "search" in their strings and mark them as 0/1. At the same time, I want to aggregate results per group: unique strings, total strings, and how many times "search" was encountered by that group. The end result I want is something
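A sketch of one way to get all four aggregates in a single pass (the output column names here are hypothetical, since the question's expected output is cut off): flag rows with a case-insensitive `str.contains`, then use named aggregation per Group:

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B',
              'C', 'C', 'C'],
    'string': ['Hello', 'SearchListing', 'GoSearch', 'pen', 'Hello',
               'Real-Estate', 'Access', 'Denied', 'Group', 'Group',
               'Glance', 'NoSearch', 'Home'],
})

# Case-insensitive per-row flag, then per-group aggregates:
# 0/1 marker, unique strings, total strings, and search hit count.
df['has_search'] = df['string'].str.contains('search', case=False)
out = df.groupby('Group').agg(
    search_flag=('has_search', lambda s: int(s.any())),
    unique_strings=('string', 'nunique'),
    total_strings=('string', 'size'),
    search_hits=('has_search', 'sum'),
).reset_index()
print(out)
```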