pandas-groupby

Groupby filter based on consecutive sequence, sorted by ID and Date columns

白昼怎懂夜的黑 submitted on 2019-12-02 10:42:31
I have a dataframe as shown below:

    ID Status       Date
0    1      F 2017-06-22
1    1      M 2017-07-22
2    1      P 2017-10-22
3    1      F 2018-06-22
4    1      P 2018-08-22
5    1      F 2018-10-22
6    1      F 2019-03-22
7    2      M 2017-06-29
8    2      F 2017-09-29
9    2      F 2018-01-29
10   2      M 2018-03-29
11   2      P 2018-08-29
12   2      M 2018-10-29
13   2      F 2018-12-29
14   3      M 2017-03-20
15   3      F 2018-06-20
16   3      P 2018-08-20
17   3      M 2018-10-20
18   3      F 2018-11-20
19   3      P 2018-12-20
20   3      F 2019-03-20
22   4      M 2017-08-10
23   4      F 2018-06-10
24   4      P 2018-08-10
25   4      F 2018-12-10
26   4      M 2019-01-10
27   4      F 2019-06-10
31   7      M 2017-08-10
32   7      F 2018-04-10
33   7      P 2018-08-10
34   7      F 2018-11-10
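The snippet is cut off before the desired output is stated, but a common building block for this kind of consecutive-sequence filter is labeling runs of equal Status within each ID via shift/cumsum. A hedged sketch (column names taken from the dataframe above; the filtering criterion itself is an assumption, since the question is truncated):

```python
import pandas as pd

df = pd.DataFrame({'ID':     [1, 1, 1, 2, 2],
                   'Status': ['F', 'F', 'P', 'P', 'P'],
                   'Date':   pd.to_datetime(['2017-06-22', '2017-07-22',
                                             '2017-10-22', '2017-06-29',
                                             '2017-09-29'])})

df = df.sort_values(['ID', 'Date'])
# A new run starts whenever Status changes within an ID
# (shift() yields NaN at each ID's first row, so runs never cross IDs).
run_id = (df['Status'] != df.groupby('ID')['Status'].shift()).cumsum()
# df.groupby(['ID', run_id]) now groups consecutive equal Status values,
# which can then be filtered, e.g. by run length.
```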

Python pandas rank/sort based on another column that differs for each input

烂漫一生 submitted on 2019-12-02 08:52:34
Question: I would like to come up with the 4th column below based on the first three:

user  job      time  Rank
A     print    1559  2
A     print    1540  2
A     edit     1520  1
A     edit     1523  1
A     deliver  9717  3
B     edit     1717  2
B     edit     1716  2
B     edit     1715  2
B     deliver  1527  1
B     deliver  1524  1

The ranking in the 4th column is independent for each user (1st column). For each user, I would like to rank the second column based on the value of the 3rd column. E.g. for user A, s/he has three jobs to be ranked. Because the time value of 'edit' is
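A hedged sketch of one way to produce the Rank column: dense-rank each user's jobs by the job's mean time, assuming (as the sample output suggests) that all rows of the same job share one rank:

```python
import pandas as pd

df = pd.DataFrame({'user': ['A'] * 5 + ['B'] * 5,
                   'job':  ['print', 'print', 'edit', 'edit', 'deliver',
                            'edit', 'edit', 'edit', 'deliver', 'deliver'],
                   'time': [1559, 1540, 1520, 1523, 9717,
                            1717, 1716, 1715, 1527, 1524]})

# Mean time of each (user, job), broadcast back to every row.
mean_time = df.groupby(['user', 'job'])['time'].transform('mean')
# Dense-rank those means within each user, so same job -> same rank.
df['Rank'] = mean_time.groupby(df['user']).rank(method='dense').astype(int)
```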

groupby counter of rows

守給你的承諾、 submitted on 2019-12-02 07:41:06
I am trying to create a new variable which counts how many times the same id has been seen over time. I need to go from this dataframe

id  clae6     year  quarter
1   475230.0  2007  1
1   475230.0  2007  2
1   475230.0  2007  3
1   475230.0  2007  4
1   475230.0  2008  1
1   475230.0  2008  2
2   475230.0  2007  1
2   475230.0  2007  2
2   475230.0  2007  3
2   475230.0  2007  4
2   475230.0  2008  1
3   475230.0  2010  1
3   475230.0  2010  2
3   475230.0  2010  3
3   475230.0  2010  4

to this:

id  clae6     year  quarter  new_variable
1   475230.0  2007  1        1
1   475230.0  2007  2        2
1   475230.0  2007  3        3
1   475230.0  2007  4        4
1   475230.0  2008  1        5
1   475230.0  2008  2        6
2   475230.0
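A sketch with groupby(...).cumcount(), which numbers each row within its id group (assuming the rows are already in chronological order, as in the sample):

```python
import pandas as pd

df = pd.DataFrame({'id':      [1, 1, 1, 1, 2, 2, 2],
                   'clae6':   [475230.0] * 7,
                   'year':    [2007, 2007, 2007, 2007, 2007, 2007, 2008],
                   'quarter': [1, 2, 3, 4, 1, 2, 1]})

# cumcount() is 0-based, so add 1 to start counting at 1 per id.
df['new_variable'] = df.groupby('id').cumcount() + 1
```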

How to label groups of pairs in pandas?

天大地大妈咪最大 submitted on 2019-12-02 07:26:49
I have this dataframe:

>>> df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2], 'B': [2, 1, 2, 2.0, 1, 1, 2]})
>>> df
     A    B
0  1.0  2.0
1  2.0  1.0
2  1.0  2.0
3  NaN  2.0
4  2.0  1.0
5  2.0  1.0
6  2.0  2.0

I need to identify the groups of pairs (A, B) in a third column "group id", to get something like this:

>>> df
     A    B  group id  explanation
0  1.0  2.0       1.0  <- group (1.0, 2.0), first group
1  2.0  1.0       2.0  <- group (2.0, 1.0), second group
2  1.0  2.0       1.0  <- group (1.0, 2.0), first group
3  NaN  2.0       NaN  <- invalid group
4  2.0  1.0       2.0  <- group (2.0, 1.0), second group
5  2.0  1.0       2.0  <- group (2.0, 1.0), second group
6  2.0
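A hedged sketch using GroupBy.ngroup() with sort=False, so ids follow order of first appearance; rows whose key contains NaN are excluded from grouping, and the explicit mask makes the "invalid group" NaN unambiguous:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2],
                   'B': [2, 1, 2, 2.0, 1, 1, 2]})

# Number each distinct (A, B) pair in order of first appearance (0-based),
# then shift to 1-based ids as in the desired output.
df['group id'] = df.groupby(['A', 'B'], sort=False).ngroup() + 1
# Pairs containing NaN are invalid: force their id to NaN.
df.loc[df[['A', 'B']].isna().any(axis=1), 'group id'] = np.nan
```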

Mixing aggregation and group by in pandas

落爺英雄遲暮 submitted on 2019-12-02 05:45:15
Question: What I have is a data set called 'report' which has details of delivery drivers. 'Pass' means they delivered on time and 'Fail' means they didn't.

Name|Outcome
A   |Pass
B   |Fail
C   |Pass
D   |Pass
A   |Fail
C   |Pass

What I want:

Name|Pass|Fail|Total
A   |1   |1   |2
B   |0   |1   |1
C   |2   |0   |2
D   |1   |0   |1

I tried report.groupby(['Name','outcome']).agg(['count']) but it is not giving me the required output. Many thanks.

Answer 1: Use crosstab with the margins=True and margins_name parameters:

print (pd.crosstab(df['Name'], df[
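The answer is cut off mid-call; a sketch of the crosstab approach it describes, with the margins row dropped and columns reordered to match the desired output (column names taken from the sample data):

```python
import pandas as pd

report = pd.DataFrame({'Name':    ['A', 'B', 'C', 'D', 'A', 'C'],
                       'Outcome': ['Pass', 'Fail', 'Pass', 'Pass', 'Fail', 'Pass']})

# margins=True adds row/column totals; margins_name labels them 'Total'.
out = (pd.crosstab(report['Name'], report['Outcome'],
                   margins=True, margins_name='Total')
         .drop('Total')                          # keep only per-name rows
         .reindex(columns=['Pass', 'Fail', 'Total']))
```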

Pandas groupby - set of different values

夙愿已清 submitted on 2019-12-02 05:36:11
Question: I have this dataframe:

x = pd.DataFrame.from_dict({'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                            'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

  cat1 cat2
0    A    X
1    A    X
2    A    Y
3    B    Y
4    B    Y
5    C    Y
6    C    Z
7    C    Z

I want to group by cat1, and then aggregate cat2 as sets of different values, such as:

  cat1    cat2
0    A  (X, Y)
1    B    (Y,)
2    C  (Y, Z)

This is part of a bigger dataframe with more columns, each of which has its own aggregation function, so how do I pass this functionality to the aggregation

Cannot get groupby records based on their minimum value using pandas in Python

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-02 03:16:11
Question: I have the following csv:

id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3

If I do the following:

import pandas as pd
df = pd.read_csv('Testing.csv', delimiter =';')
df_reduced = df.groupby(['id', 'editor'])['price'].min()

instead of getting

k1;8,00;ed2
k2;10,50;ed1
k3;10,00;ed1

I get

k1;10,00;ed1
   8,00;ed2
   9,50;ed3
k2;10,50;ed1
k3;10,00;ed1
   11,00;ed2

So how can I get the three id's with their minimum values?

Answer 1: Group the data by only id and find min price for
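Continuing the truncated answer, a sketch that groups by id alone and uses idxmin to keep each id's full minimum-price row (note decimal=',' so the comma-decimal prices parse as numbers; without it, min compares strings):

```python
import io
import pandas as pd

csv = """id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3"""

df = pd.read_csv(io.StringIO(csv), delimiter=';', decimal=',')
# idxmin() returns the row label of each group's minimum price;
# .loc then pulls back the whole row, editor included.
df_reduced = df.loc[df.groupby('id')['price'].idxmin()].sort_values('id')
```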

Pandas find duration between dates where a condition is met?

孤者浪人 submitted on 2019-12-02 02:43:40
Question: I have a pandas DataFrame that looks like this:

╔═══╦═══════════╦════════════╗
║   ║ VENDOR ID ║    DATE    ║
╠═══╬═══════════╬════════════╣
║ 1 ║        33 ║ 01/12/2018 ║
║ 2 ║        33 ║ 03/12/2018 ║
║ 3 ║        12 ║ 01/08/2018 ║
║ 4 ║        12 ║ 01/15/2018 ║
║ 5 ║        12 ║ 01/23/2018 ║
║ 6 ║        33 ║ 05/12/2018 ║
║ 7 ║        89 ║ 01/12/2018 ║
╚═══╩═══════════╩════════════╝

And I'm hoping to get a table that gives me the number of days since the same VENDOR ID last occurred, like so:

╔═══╦═══════════╦
║   ║ VENDOR ID ║

Pandas find duration between dates where a condition is met?

强颜欢笑 submitted on 2019-12-02 01:41:48
I have a pandas DataFrame that looks like this:

╔═══╦═══════════╦════════════╗
║   ║ VENDOR ID ║    DATE    ║
╠═══╬═══════════╬════════════╣
║ 1 ║        33 ║ 01/12/2018 ║
║ 2 ║        33 ║ 03/12/2018 ║
║ 3 ║        12 ║ 01/08/2018 ║
║ 4 ║        12 ║ 01/15/2018 ║
║ 5 ║        12 ║ 01/23/2018 ║
║ 6 ║        33 ║ 05/12/2018 ║
║ 7 ║        89 ║ 01/12/2018 ║
╚═══╩═══════════╩════════════╝

And I'm hoping to get a table that gives me the number of days since the same VENDOR ID last occurred, like so:

╔═══╦═══════════╦════════════╗
║   ║ VENDOR ID ║    GAP     ║
╠═══╬═══════════╬════════════╣
║ 1 ║        33 ║ ---------- ║
║ 2 ║        33 ║         60 ║
║ 3 ║        12 ║ ---------- ║
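A sketch of one common approach: parse DATE, then take a per-vendor diff (column names from the table; the dates are assumed to be MM/DD/YYYY). Note that Jan 12 to Mar 12 2018 is 59 calendar days, though the sample output shows 60:

```python
import pandas as pd

df = pd.DataFrame({'VENDOR ID': [33, 33, 12, 12, 12, 33, 89],
                   'DATE': ['01/12/2018', '03/12/2018', '01/08/2018',
                            '01/15/2018', '01/23/2018', '05/12/2018',
                            '01/12/2018']})

df['DATE'] = pd.to_datetime(df['DATE'], format='%m/%d/%Y')
# diff() within each vendor gives NaT for the first occurrence,
# i.e. the '----------' rows in the desired output.
df['GAP'] = df.groupby('VENDOR ID')['DATE'].diff().dt.days
```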

Pandas groupby - set of different values

試著忘記壹切 submitted on 2019-12-02 01:28:51
I have this dataframe:

x = pd.DataFrame.from_dict({'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                            'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

  cat1 cat2
0    A    X
1    A    X
2    A    Y
3    B    Y
4    B    Y
5    C    Y
6    C    Z
7    C    Z

I want to group by cat1, and then aggregate cat2 as sets of different values, such as:

  cat1    cat2
0    A  (X, Y)
1    B    (Y,)
2    C  (Y, Z)

This is part of a bigger dataframe with more columns, each of which has its own aggregation function, so how do I pass this functionality to the aggregation dictionary?

Use a lambda function with set or unique, and convert the output to tuples:

x = pd.DataFrame.from
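Continuing the truncated answer, a sketch passing the lambda through an aggregation dict, so other columns can sit alongside it with their own functions (sorted() is added here for a deterministic tuple order):

```python
import pandas as pd

x = pd.DataFrame.from_dict({'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                            'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

# The lambda slots into a larger aggregation dict alongside other columns.
out = x.groupby('cat1', as_index=False).agg({'cat2': lambda s: tuple(sorted(set(s)))})
```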