pandas-groupby | 易学教程

Pandas DataFrame groupby based on condition

Question: The most similar question I found was here, but it has no proper answer. I'm trying to use groupby on a dataframe to generate unique IDs for bus routes. The problem is that the data sometimes (though rarely) has the same values in my groupby columns for different buses, so they are treated as the same bus even though they aren't. The only other approach I can think of is to group buses based on another column called "Type of stop", where there is an indicator for Start,
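
One way to read the idea in the excerpt, sketched under assumptions: if every bus trip begins with a row marked Start, a running count of those rows gives each trip its own ID even when the other groupby columns collide. Only the "Type of stop" column name comes from the question; the 'Route' column, the 'Mid' and 'End' values, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample: two trips on the same route whose other columns collide.
df = pd.DataFrame({
    'Route': ['10', '10', '10', '10', '10'],
    'Type of stop': ['Start', 'Mid', 'End', 'Start', 'End'],
})

# Each 'Start' row opens a new trip; the cumulative sum numbers the trips 1, 2, ...
df['bus_id'] = (df['Type of stop'] == 'Start').cumsum()

# The new ID can then serve as a groupby key alongside the original columns.
stops_per_bus = df.groupby(['Route', 'bus_id']).size()
```

This only works if the rows are already in travel order, so that every stop follows the Start row of its own trip.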

Pandas datetime week not as expected

Question: When working with Pandas datetimes, I'm trying to group data by week and year. However, I have noticed that in some years the last day of the year ends up grouped with the first week of the same year.

    import pandas as pd

    day_df = pd.DataFrame(index=pd.date_range('2016-01-01', '2020-12-31'))
    for (week, year), subset in day_df.groupby([day_df.index.week, day_df.index.year]):
        if week == 1:
            print('Week:', week, subset.index.min(), subset.index.max())

    Week: 1 2016-01-04 00:00:00 2016-01-10 00:00:00
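
The behaviour described above is what you would expect if the week number follows ISO 8601, where a date such as 2018-12-31 belongs to week 1 of the following ISO year, while `index.year` still reports the calendar year. A minimal sketch, assuming pandas 1.1 or later (where `DatetimeIndex.isocalendar()` is available), pairs the ISO week with the ISO year instead. The `first_weeks` dictionary is a name introduced here for illustration.

```python
import pandas as pd

days = pd.date_range('2016-01-01', '2020-12-31')

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns,
# indexed by the original dates. Cast to plain integers for simple comparisons.
iso = days.isocalendar().astype('int64')

# Collect the first and last date of ISO week 1 for each ISO year.
first_weeks = {}
for (year, week), subset in iso.groupby(['year', 'week']):
    if week == 1:
        first_weeks[int(year)] = (subset.index.min(), subset.index.max())

for year, (start, end) in first_weeks.items():
    print('ISO year', year, 'week 1:', start.date(), 'to', end.date())
```

Grouping on the ISO year keeps the last days of December together with the January days of the same ISO week, so each week-1 group spans seven consecutive days.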

Including the group name in the apply function pandas python

Question: Is there a way to tell the groupby() call to pass the group name to the apply() lambda function? When I iterate through the groups, I can get the group key via tuple unpacking:

    for group_name, subdf in temp_dataframe.groupby(level=0, axis=0):
        print(group_name)

Is there a way to also get the group name in the apply function, such as:

    temp_dataframe.groupby(level=0, axis=0).apply(lambda group_name, subdf: foo(group_name, subdf))

How can I get the group name as an
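
A hedged sketch of two possible approaches. The first relies on the `name` attribute that pandas attaches to each sub-frame passed to `apply`; the second builds the result from an explicit loop and does not depend on that attribute. The sample data and the body of `foo` are hypothetical stand-ins for the asker's own.

```python
import pandas as pd

# Hypothetical data: the index level 0 plays the role of the group key.
temp_dataframe = pd.DataFrame({'value': [1, 2, 3, 4]},
                              index=['a', 'a', 'b', 'b'])

def foo(group_name, subdf):
    # Placeholder logic: combine the group key with a per-group sum.
    return f"{group_name}:{subdf['value'].sum()}"

# Approach 1: read the group key from the sub-frame's .name attribute.
via_apply = temp_dataframe.groupby(level=0).apply(
    lambda subdf: foo(subdf.name, subdf)
)

# Approach 2: iterate over (key, sub-frame) pairs explicitly.
via_loop = pd.Series({
    group_name: foo(group_name, subdf)
    for group_name, subdf in temp_dataframe.groupby(level=0)
})
```

Both variants should produce one result per group, indexed by the group key.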

Python Dataframe: Calculating R^2 and RMSE Using Groupby on One Column

Question: I have the following Python dataframe:

    Type  Actual  Predicted
    A          4          3
    A         10         18
    A         13         11
    B          3         10
    B          4          2
    B          8         33
    C         20         17
    C         40         33
    C         87         80
    C         32         30

I have the code to calculate R^2 and RMSE, but I don't know how to calculate them per distinct "Type". For now, my approach is to break the larger table into three smaller tables containing only the A, B, and C values, calculate R^2 and RMSE for each smaller table, and then append the results back together. But the above method is inefficient and I believe there
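
A minimal sketch of computing both metrics per group in one pass, using the data from the question. The asker's own metric code is not shown, so this assumes R^2 means the coefficient of determination, 1 - SS_res / SS_tot, and RMSE is the square root of the mean squared residual. The helper name `r2_rmse` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Type': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Actual': [4, 10, 13, 3, 4, 8, 20, 40, 87, 32],
    'Predicted': [3, 18, 11, 10, 2, 33, 17, 33, 80, 30],
})

def r2_rmse(group):
    # Assumed definitions: R^2 = 1 - SS_res / SS_tot, RMSE = sqrt(mean(resid^2)).
    resid = group['Actual'] - group['Predicted']
    ss_res = (resid ** 2).sum()
    ss_tot = ((group['Actual'] - group['Actual'].mean()) ** 2).sum()
    return pd.Series({'R2': 1 - ss_res / ss_tot,
                      'RMSE': np.sqrt((resid ** 2).mean())})

# One row of metrics per Type, with no manual splitting and re-appending.
metrics = df.groupby('Type')[['Actual', 'Predicted']].apply(r2_rmse)
print(metrics)
```

If a different R^2 definition is needed (for example the squared correlation), only the body of `r2_rmse` has to change.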

Applying a custom groupby aggregate function to find average of Numpy Array

Question: I have a pandas DataFrame where column B contains NumPy arrays of a fixed size.

    | A | B         | C |
    |---|-----------|---|
    | 0 | [2,3,5,6] | X |
    | 1 | [1,2,3,4] | X |
    | 2 | [2,3,6,5] | Y |
    | 3 | [2,3,2,3] | Y |
    | 4 | [2,3,4,4] | Y |
    | 5 | [2,3,5,6] | Z |

I want to
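
The excerpt stops before stating the goal, so the following is a sketch based on the title alone: it assumes the aim is an element-wise average of the arrays in B for each value of C. The names `means` and `B_mean` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [0, 1, 2, 3, 4, 5],
    'B': [np.array([2, 3, 5, 6]), np.array([1, 2, 3, 4]),
          np.array([2, 3, 6, 5]), np.array([2, 3, 2, 3]),
          np.array([2, 3, 4, 4]), np.array([2, 3, 5, 6])],
    'C': ['X', 'X', 'Y', 'Y', 'Y', 'Z'],
})

# Stack each group's arrays into a 2-D array and average down the rows,
# which yields one element-wise mean array per group.
means = {
    key: np.stack(group.to_list()).mean(axis=0)
    for key, group in df.groupby('C')['B']
}

# Optional: collect the per-group arrays in an object-dtype Series.
result = pd.Series(means, name='B_mean')
```

Stacking requires every array in a group to have the same length, which matches the fixed-size condition stated in the question.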

Groupby of multiple columns and assigning values to each by considering start and end of each (Pandas)

Question: I've got a dataframe that looks like this:

    df1
         v  w  x  y
     4   0  1  a  b
     5   0  1  a  a
    _________________
     6   0  2  a  b
    _________________
     2   0  3  a  b
    - - - - - - - - -
     3   1  2  a  b
    _________________
    15   1  3  a  b
    12   1  3  b  b
    _________________
    13   1  1  a  b
    - - - - - - - - -
    15   3  1  a  b
    14   3  1  b  a
     8   3  1  a  b
     9   3  1  a  a

df1 was grouped (see the lines) by v and w and merged with another df that contained x and y. I need a new column z which picks the right group out of x and y with the following conditions: in Every subgroup 'V'

create a new column based on groupby date time column at date level in pandas

Question: I have a data frame as shown below.

    Doctor  Appointment          Booking_ID
    A       2020-01-18 12:00:00           1
    A       2020-01-18 12:30:00           2
    A       2020-01-18 13:00:00           3
    A       2020-01-18 13:00:00           4
    A       2020-01-19 13:00:00          13
    A       2020-01-19 13:30:00          14
    B       2020-01-18 12:00:00           5
    B       2020-01-18 12:30:00           6
    B       2020-01-18 13:00:00           7
    B       2020-01-25 12:30:00           6
    B       2020-01-25 13:00:00           7
    C       2020-01-19 12:00:00          19
    C       2020-01-19 12:30:00          20
    C       2020-01-19 13:00:00          21
    C       2020-01-22 12:30:00          20
    C       2020-01-22 13:00:00          21

From the above I would like to create a
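
The excerpt ends before naming the column to be created, so this is only a sketch of the general technique in the title: grouping a datetime column at date level. It uses the first nine rows of the table above. The new column names `Slot_in_day` and `Bookings_that_day` are hypothetical examples, not the asker's intended output.

```python
import pandas as pd

df = pd.DataFrame({
    'Doctor': ['A'] * 6 + ['B'] * 3,
    'Appointment': pd.to_datetime([
        '2020-01-18 12:00:00', '2020-01-18 12:30:00', '2020-01-18 13:00:00',
        '2020-01-18 13:00:00', '2020-01-19 13:00:00', '2020-01-19 13:30:00',
        '2020-01-18 12:00:00', '2020-01-18 12:30:00', '2020-01-18 13:00:00',
    ]),
    'Booking_ID': [1, 2, 3, 4, 13, 14, 5, 6, 7],
})

# Drop the time-of-day part so that grouping happens per calendar date.
day = df['Appointment'].dt.normalize()
per_doctor_day = df.groupby(['Doctor', day])

# Hypothetical derived columns: position within the day, and bookings that day.
df['Slot_in_day'] = per_doctor_day.cumcount() + 1
df['Bookings_that_day'] = per_doctor_day['Booking_ID'].transform('count')
```

Because `cumcount` and `transform` return values aligned with the original rows, the results can be assigned straight back as new columns.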