pandas-groupby

Pandas DataFrame groupby based on condition

谁说胖子不能爱 submitted on 2020-05-29 12:29:40
Question: The most similar question I found was here but with no proper answer. Basically, I'm trying to use groupby on a dataframe to generate unique IDs for bus routes. The problem is that the data at my disposal sometimes (though rarely) has the same values in my groupby columns for different buses, so they're treated as the same bus even though they aren't. The only other way I can think of is to group buses based on another column called "Type of stop", where there is an indicator for Start,
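
One way the idea in this excerpt could be sketched: treat every "Start" row as the beginning of a new trip and use a running count of starts as an extra grouping key. The column names `route`, `direction`, and `type_of_stop`, the sample values, and the assumption that rows are already in chronological order are all hypothetical; the excerpt is truncated and does not show the real schema.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
# Two different buses share the same route/direction keys.
stops = pd.DataFrame({
    "route": ["10", "10", "10", "10", "10", "10"],
    "direction": ["N", "N", "N", "N", "N", "N"],
    "type_of_stop": ["Start", "Middle", "End", "Start", "Middle", "End"],
})

# Each "Start" row opens a new trip. A running count of starts within
# each (route, direction) group numbers the trips 1, 2, 3, ...
is_start = stops["type_of_stop"].eq("Start").astype(int)
stops["trip_no"] = is_start.groupby([stops["route"], stops["direction"]]).cumsum()

# Combining the original keys with the trip number gives one ID per bus.
stops["bus_id"] = stops.groupby(["route", "direction", "trip_no"]).ngroup()
```

This only works if the rows of each bus are contiguous and ordered in time; otherwise the frame would need sorting first.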

Pandas datetime week not as expected

本秂侑毒 submitted on 2020-05-26 04:01:09
Question: When working with Pandas datetimes, I'm trying to group data by week and year. However, I have noticed some years where the last day of the year ends up grouped with the first week of the same year.

    import pandas as pd

    day_df = pd.DataFrame(index=pd.date_range('2016-01-01', '2020-12-31'))
    for (week, year), subset in day_df.groupby([day_df.index.week, day_df.index.year]):
        if week == 1:
            print('Week:', subset.index.min(), subset.index.max())

    Week: 1 2016-01-04 00:00:00 2016-01-10 00:00:00
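
The behaviour described here follows from mixing two calendars: the week number is an ISO 8601 week, while `.year` is the calendar year, so a date such as 2018-12-31 carries ISO week 1 but calendar year 2018. A minimal sketch of one possible fix, assuming a pandas version that provides `isocalendar()` (1.1 or later), is to pair the ISO week with the ISO year instead. The variable names below are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.date_range("2016-01-01", "2020-12-31"))

# isocalendar() returns a DataFrame with columns "year", "week", "day",
# all following ISO 8601, so year and week always belong together.
iso = dates.dt.isocalendar()

# First and last date of ISO week 1, keyed by ISO year.
in_week_one = iso["week"] == 1
week1 = dates[in_week_one].groupby(iso.loc[in_week_one, "year"]).agg(["min", "max"])
print(week1)
```

With this grouping, each week 1 is a contiguous Monday-to-Sunday span, even when it starts in the previous calendar year.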

Including the group name in the apply function pandas python

早过忘川 submitted on 2020-05-25 09:35:14
Question: Is there a way to tell the groupby() call to pass the group name to the apply() lambda function? When I iterate through groups, I can get the group key via tuple unpacking:

    for group_name, subdf in temp_dataframe.groupby(level=0, axis=0):
        print(group_name)

...is there a way to also get the group name in the apply function, such as:

    temp_dataframe.groupby(level=0, axis=0).apply(lambda group_name, subdf: foo(group_name, subdf))

How can I get the group name as an
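
A minimal sketch of one common approach: the object that `apply()` passes to the function carries the group key in its `.name` attribute, so a single-argument lambda can forward it. The sample data and the body of `foo` are hypothetical stand-ins, since the excerpt does not show them.

```python
import pandas as pd

# Hypothetical frame; the index level plays the role of the group key.
temp_dataframe = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=["a", "a", "b", "b"],
)

def foo(group_name, subdf):
    # Hypothetical stand-in for the asker's function.
    return f"{group_name}:{subdf['value'].sum()}"

# apply() calls the lambda with one argument, the sub-DataFrame;
# its .name attribute holds the key of the current group.
result = temp_dataframe.groupby(level=0).apply(lambda subdf: foo(subdf.name, subdf))
```

Here `result` is a Series indexed by group key, with one `foo` return value per group.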

Python Dataframe: Calculating R^2 and RMSE Using Groupby on One Column

梦想的初衷 submitted on 2020-05-25 05:17:46
Question: I have the following Python dataframe:

    Type  Actual  Predicted
    A     4       3
    A     10      18
    A     13      11
    B     3       10
    B     4       2
    B     8       33
    C     20      17
    C     40      33
    C     87      80
    C     32      30

I have the code to calculate R^2 and RMSE, but I don't know how to calculate them for each distinct "Type". For now, my approach is to break the larger table into three smaller tables consisting of only A, B, or C values, calculate R^2 and RMSE on each smaller table, and then append the results back together. But the above method is inefficient and I believe there
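
The split-compute-append workflow described here is what `groupby` followed by `apply` does in one step. The sketch below uses the data from the question; the helper `r2_rmse` is hypothetical and assumes R^2 means the coefficient of determination, 1 - SS_res / SS_tot, since the asker's own metric code is not shown.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": list("AAABBBCCCC"),
    "Actual": [4, 10, 13, 3, 4, 8, 20, 40, 87, 32],
    "Predicted": [3, 18, 11, 10, 2, 33, 17, 33, 80, 30],
})

def r2_rmse(group):
    # Hypothetical helper: R^2 as 1 - SS_res / SS_tot, RMSE as the
    # square root of the mean squared residual.
    residual = group["Actual"] - group["Predicted"]
    ss_res = (residual ** 2).sum()
    ss_tot = ((group["Actual"] - group["Actual"].mean()) ** 2).sum()
    return pd.Series({
        "R2": 1 - ss_res / ss_tot,
        "RMSE": np.sqrt((residual ** 2).mean()),
    })

# One row of metrics per Type, without splitting the table by hand.
metrics = df.groupby("Type")[["Actual", "Predicted"]].apply(r2_rmse)
print(metrics)
```

Because the helper returns a Series with the same labels for every group, the result is a DataFrame indexed by Type with columns `R2` and `RMSE`.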

Applying a custom groupby aggregate function to find average of Numpy Array

喜欢而已 submitted on 2020-05-21 07:35:45
Question: I have a pandas DataFrame where column B contains a NumPy list of fixed size.

    |------|---------------|-------|
    | A    | B             | C     |
    |------|---------------|-------|
    | 0    | [2,3,5,6]     | X     |
    |------|---------------|-------|
    | 1    | [1,2,3,4]     | X     |
    |------|---------------|-------|
    | 2    | [2,3,6,5]     | Y     |
    |------|---------------|-------|
    | 3    | [2,3,2,3]     | Y     |
    |------|---------------|-------|
    | 4    | [2,3,4,4]     | Y     |
    |------|---------------|-------|
    | 5    | [2,3,5,6]     | Z     |
    |------|---------------|-------|

I want to
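
The excerpt is cut off before it states the goal, so the following is only a sketch under an assumption taken from the title: that the aim is an element-wise average of the arrays in B for each value of C. It stacks each group's arrays into a 2-D array and averages along the first axis.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": range(6),
    "B": [np.array(v) for v in (
        [2, 3, 5, 6], [1, 2, 3, 4], [2, 3, 6, 5],
        [2, 3, 2, 3], [2, 3, 4, 4], [2, 3, 5, 6],
    )],
    "C": ["X", "X", "Y", "Y", "Y", "Z"],
})

# Assumed goal: element-wise mean of the B arrays within each C group.
# np.stack turns a group's arrays into one 2-D array (rows = members),
# and mean(axis=0) averages position by position.
mean_b = {
    key: np.stack(group.tolist()).mean(axis=0)
    for key, group in df.groupby("C")["B"]
}
```

The result is a plain dict mapping each group key to one averaged array; it can be wrapped in `pd.Series(mean_b)` if a pandas object is preferred.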

Groupby of multiple columns and assigning values to each by considering start and end of each (Pandas)

岁酱吖の submitted on 2020-05-17 07:04:41
Question: I've got a dataframe that looks like this:

    df1
         v   w   x   y
    4    0   1   a   b
    5    0   1   a   a
    _________________
    6    0   2   a   b
    _________________
    2    0   3   a   b
    - - - - - - - - -
    3    1   2   a   b
    _________________
    15   1   3   a   b
    12   1   3   b   b
    _________________
    13   1   1   a   b
    - - - - - - - - -
    15   3   1   a   b
    14   3   1   b   a
    8    3   1   a   b
    9    3   1   a   a

So df1 was grouped (see the separator lines) by v and w and merged with another df which contained x and y. I need a new column z which picks the right group out of x and y with the following conditions: in Every subgroup 'V'

create a new column based on groupby date time column at date level in pandas

亡梦爱人 submitted on 2020-05-15 19:12:23
Question: I have a data frame as shown below.

    Doctor  Appointment          Booking_ID
    A       2020-01-18 12:00:00  1
    A       2020-01-18 12:30:00  2
    A       2020-01-18 13:00:00  3
    A       2020-01-18 13:00:00  4
    A       2020-01-19 13:00:00  13
    A       2020-01-19 13:30:00  14
    B       2020-01-18 12:00:00  5
    B       2020-01-18 12:30:00  6
    B       2020-01-18 13:00:00  7
    B       2020-01-25 12:30:00  6
    B       2020-01-25 13:00:00  7
    C       2020-01-19 12:00:00  19
    C       2020-01-19 12:30:00  20
    C       2020-01-19 13:00:00  21
    C       2020-01-22 12:30:00  20
    C       2020-01-22 13:00:00  21

From the above I would like to create a
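
The excerpt ends before naming the column to create, so the sketch below only illustrates the mechanics suggested by the title: grouping a datetime column at date level and writing a per-group value back to every row. The new column `Bookings_that_day` (the number of bookings per doctor per calendar date) is a hypothetical example, not the asker's stated target, and only a few rows of the table are used.

```python
import pandas as pd

# A few rows from the table above, enough to show the mechanics.
df = pd.DataFrame({
    "Doctor": ["A", "A", "A", "B"],
    "Appointment": pd.to_datetime([
        "2020-01-18 12:00:00", "2020-01-18 12:30:00",
        "2020-01-19 13:00:00", "2020-01-18 12:00:00",
    ]),
    "Booking_ID": [1, 2, 13, 5],
})

# normalize() drops the time of day, so all appointments on the same
# calendar date share one grouping key.
appointment_date = df["Appointment"].dt.normalize()

# Hypothetical new column: bookings per doctor per date.
# transform() returns a result aligned with the original rows.
df["Bookings_that_day"] = (
    df.groupby(["Doctor", appointment_date])["Booking_ID"].transform("count")
)
```

Swapping `"count"` for another aggregation, or for `cumcount()` on the same groupby, would give other date-level columns with the same grouping.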