pandas-groupby

Groupby filter based on consecutive sequence, sorted by ID and Date columns

白昼怎懂夜的黑 submitted on 2019-12-02 10:42:31
I have a dataframe as shown below:

    ID Status       Date
0    1      F 2017-06-22
1    1      M 2017-07-22
2    1      P 2017-10-22
3    1      F 2018-06-22
4    1      P 2018-08-22
5    1      F 2018-10-22
6    1      F 2019-03-22
7    2      M 2017-06-29
8    2      F 2017-09-29
9    2      F 2018-01-29
10   2      M 2018-03-29
11   2      P 2018-08-29
12   2      M 2018-10-29
13   2      F 2018-12-29
14   3      M 2017-03-20
15   3      F 2018-06-20
16   3      P 2018-08-20
17   3      M 2018-10-20
18   3      F 2018-11-20
19   3      P 2018-12-20
20   3      F 2019-03-20
22   4      M 2017-08-10
23   4      F 2018-06-10
24   4      P 2018-08-10
25   4      F 2018-12-10
26   4      M 2019-01-10
27   4      F 2019-06-10
31   7      M 2017-08-10
32   7      F 2018-04-10
33   7      P 2018-08-10
34   7      F 2018-11-10
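The snippet is cut off before the desired output is stated, but a common building block for this kind of consecutive-sequence filter is labeling runs of equal Status within each ID via shift/cumsum. A hedged sketch (column names taken from the dataframe above; the filtering criterion itself is an assumption, since the question is truncated):

```python
import pandas as pd

df = pd.DataFrame({'ID':     [1, 1, 1, 2, 2],
                   'Status': ['F', 'F', 'P', 'P', 'P'],
                   'Date':   pd.to_datetime(['2017-06-22', '2017-07-22',
                                             '2017-10-22', '2017-06-29',
                                             '2017-09-29'])})

df = df.sort_values(['ID', 'Date'])
# A new run starts whenever Status changes within an ID
# (shift() yields NaN at each ID's first row, so runs never cross IDs).
run_id = (df['Status'] != df.groupby('ID')['Status'].shift()).cumsum()
# df.groupby(['ID', run_id]) now groups consecutive equal Status values,
# which can then be filtered, e.g. by run length.
```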

Python pandas rank/sort based on another column that differs for each input

烂漫一生 submitted on 2019-12-02 08:52:34
Question: I would like to come up with the 4th column below based on the first three:

user  job      time  Rank
A     print    1559  2
A     print    1540  2
A     edit     1520  1
A     edit     1523  1
A     deliver  9717  3
B     edit     1717  2
B     edit     1716  2
B     edit     1715  2
B     deliver  1527  1
B     deliver  1524  1

The ranking in the 4th column is independent for each user (1st column). For each user, I would like to rank the second column based on the value of the 3rd column. E.g. for user A, s/he has three jobs to be ranked. Because the time value of 'edit' is
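A hedged sketch of one way to produce the Rank column: dense-rank each user's jobs by the job's mean time, assuming (as the sample output suggests) that all rows of the same job share one rank:

```python
import pandas as pd

df = pd.DataFrame({'user': ['A'] * 5 + ['B'] * 5,
                   'job':  ['print', 'print', 'edit', 'edit', 'deliver',
                            'edit', 'edit', 'edit', 'deliver', 'deliver'],
                   'time': [1559, 1540, 1520, 1523, 9717,
                            1717, 1716, 1715, 1527, 1524]})

# Mean time of each (user, job), broadcast back to every row.
mean_time = df.groupby(['user', 'job'])['time'].transform('mean')
# Dense-rank those means within each user, so same job -> same rank.
df['Rank'] = mean_time.groupby(df['user']).rank(method='dense').astype(int)
```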

groupby counter of rows

守給你的承諾、 submitted on 2019-12-02 07:41:06
I am trying to create a new variable which counts how many times the same id has been seen over time. I need to go from this dataframe

id  clae6     year  quarter
1   475230.0  2007  1
1   475230.0  2007  2
1   475230.0  2007  3
1   475230.0  2007  4
1   475230.0  2008  1
1   475230.0  2008  2
2   475230.0  2007  1
2   475230.0  2007  2
2   475230.0  2007  3
2   475230.0  2007  4
2   475230.0  2008  1
3   475230.0  2010  1
3   475230.0  2010  2
3   475230.0  2010  3
3   475230.0  2010  4

to this:

id  clae6     year  quarter  new_variable
1   475230.0  2007  1        1
1   475230.0  2007  2        2
1   475230.0  2007  3        3
1   475230.0  2007  4        4
1   475230.0  2008  1        5
1   475230.0  2008  2        6
2   475230.0
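A sketch with groupby(...).cumcount(), which numbers each row within its id group (assuming the rows are already in chronological order, as in the sample):

```python
import pandas as pd

df = pd.DataFrame({'id':      [1, 1, 1, 1, 2, 2, 2],
                   'clae6':   [475230.0] * 7,
                   'year':    [2007, 2007, 2007, 2007, 2007, 2007, 2008],
                   'quarter': [1, 2, 3, 4, 1, 2, 1]})

# cumcount() is 0-based, so add 1 to start counting at 1 per id.
df['new_variable'] = df.groupby('id').cumcount() + 1
```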

How to label groups of pairs in pandas?

天大地大妈咪最大 submitted on 2019-12-02 07:26:49
I have this dataframe:

>>> df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2], 'B': [2, 1, 2, 2.0, 1, 1, 2]})
>>> df
     A    B
0  1.0  2.0
1  2.0  1.0
2  1.0  2.0
3  NaN  2.0
4  2.0  1.0
5  2.0  1.0
6  2.0  2.0

I need to identify the groups of pairs (A, B) in a third column "group id", to get something like this:

>>> df
     A    B  group id  explanation
0  1.0  2.0       1.0  <- group (1.0, 2.0), first group
1  2.0  1.0       2.0  <- group (2.0, 1.0), second group
2  1.0  2.0       1.0  <- group (1.0, 2.0), first group
3  NaN  2.0       NaN  <- invalid group
4  2.0  1.0       2.0  <- group (2.0, 1.0), second group
5  2.0  1.0       2.0  <- group (2.0, 1.0), second group
6  2.0
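A hedged sketch using GroupBy.ngroup() with sort=False, so ids follow order of first appearance; rows whose key contains NaN are excluded from grouping, and the explicit mask makes the "invalid group" NaN unambiguous:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2],
                   'B': [2, 1, 2, 2.0, 1, 1, 2]})

# Number each distinct (A, B) pair in order of first appearance (0-based),
# then shift to 1-based ids as in the desired output.
df['group id'] = df.groupby(['A', 'B'], sort=False).ngroup() + 1
# Pairs containing NaN are invalid: force their id to NaN.
df.loc[df[['A', 'B']].isna().any(axis=1), 'group id'] = np.nan
```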

Mixing aggregation and group by in pandas

落爺英雄遲暮 submitted on 2019-12-02 05:45:15
Question: What I have is a data set called 'report' which has details of delivery drivers. 'Pass' means they delivered on time and 'Fail' means they didn't.

Name|Outcome
A   |Pass
B   |Fail
C   |Pass
D   |Pass
A   |Fail
C   |Pass

What I want:

Name|Pass|Fail|Total
A   |1   |1   |2
B   |0   |1   |1
C   |2   |0   |2
D   |1   |0   |1

I tried report.groupby(['Name','outcome']).agg(['count']) but it is not giving me the required output. Many thanks.

Answer 1: Use crosstab with the margins=True and margins_name parameters:

print (pd.crosstab(df['Name'], df[
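The answer is cut off mid-call; a sketch of the crosstab approach it describes, with the margins row dropped and columns reordered to match the desired output (column names taken from the sample data):

```python
import pandas as pd

report = pd.DataFrame({'Name':    ['A', 'B', 'C', 'D', 'A', 'C'],
                       'Outcome': ['Pass', 'Fail', 'Pass', 'Pass', 'Fail', 'Pass']})

# margins=True adds row/column totals; margins_name labels them 'Total'.
out = (pd.crosstab(report['Name'], report['Outcome'],
                   margins=True, margins_name='Total')
         .drop('Total')                          # keep only per-name rows
         .reindex(columns=['Pass', 'Fail', 'Total']))
```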

Pandas groupby - set of different values

夙愿已清 submitted on 2019-12-02 05:36:11
Question: I have this dataframe:

x = pd.DataFrame.from_dict({'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                            'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

  cat1 cat2
0    A    X
1    A    X
2    A    Y
3    B    Y
4    B    Y
5    C    Y
6    C    Z
7    C    Z

I want to group by cat1, and then aggregate cat2 as sets of different values, such as:

  cat1    cat2
0    A  (X, Y)
1    B    (Y,)
2    C  (Y, Z)

This is part of a bigger dataframe with more columns, each of which has its own aggregation function, so how do I pass this functionality to the aggregation

Cannot get groupby records based on their minimum value using pandas in Python

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-02 03:16:11
Question: I have the following csv:

id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3

If I do the following:

import pandas as pd
df = pd.read_csv('Testing.csv', delimiter =';')
df_reduced = df.groupby(['id', 'editor'])['price'].min()

instead of getting

k1;8,00;ed2
k2;10,50;ed1
k3;10,00;ed1

I get

k1;10,00;ed1
   8,00;ed2
   9,50;ed3
k2;10,50;ed1
k3;10,00;ed1
   11,00;ed2

So how can I get the three id's with their minimum values?

Answer 1: Group the data by only id and find min price for
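Continuing the truncated answer, a sketch that groups by id alone and uses idxmin to keep each id's full minimum-price row (note decimal=',' so the comma-decimal prices parse as numbers; without it, min compares strings):

```python
import io
import pandas as pd

csv = """id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3"""

df = pd.read_csv(io.StringIO(csv), delimiter=';', decimal=',')
# idxmin() returns the row label of each group's minimum price;
# .loc then pulls back the whole row, editor included.
df_reduced = df.loc[df.groupby('id')['price'].idxmin()].sort_values('id')
```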

Pandas find duration between dates where a condition is met?

孤者浪人 submitted on 2019-12-02 02:43:40
Question: I have a pandas DataFrame that looks like this:

╔═══╦═══════════╦════════════╗
║   ║ VENDOR ID ║    DATE    ║
╠═══╬═══════════╬════════════╣
║ 1 ║        33 ║ 01/12/2018 ║
║ 2 ║        33 ║ 03/12/2018 ║
║ 3 ║        12 ║ 01/08/2018 ║
║ 4 ║        12 ║ 01/15/2018 ║
║ 5 ║        12 ║ 01/23/2018 ║
║ 6 ║        33 ║ 05/12/2018 ║
║ 7 ║        89 ║ 01/12/2018 ║
╚═══╩═══════════╩════════════╝

And I'm hoping to get a table that gives me the number of days since the same VENDOR ID last occurred, like so:

╔═══╦═══════════╦
║   ║ VENDOR ID ║

Pandas find duration between dates where a condition is met?

强颜欢笑 submitted on 2019-12-02 01:41:48
I have a pandas DataFrame that looks like this:

╔═══╦═══════════╦════════════╗
║   ║ VENDOR ID ║    DATE    ║
╠═══╬═══════════╬════════════╣
║ 1 ║        33 ║ 01/12/2018 ║
║ 2 ║        33 ║ 03/12/2018 ║
║ 3 ║        12 ║ 01/08/2018 ║
║ 4 ║        12 ║ 01/15/2018 ║
║ 5 ║        12 ║ 01/23/2018 ║
║ 6 ║        33 ║ 05/12/2018 ║
║ 7 ║        89 ║ 01/12/2018 ║
╚═══╩═══════════╩════════════╝

And I'm hoping to get a table that gives me the number of days since the same VENDOR ID last occurred, like so:

╔═══╦═══════════╦════════════╗
║   ║ VENDOR ID ║    GAP     ║
╠═══╬═══════════╬════════════╣
║ 1 ║        33 ║ ---------- ║
║ 2 ║        33 ║         60 ║
║ 3 ║        12 ║ ---------- ║
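A sketch of one common approach: parse DATE, then take a per-vendor diff (column names from the table; the dates are assumed to be MM/DD/YYYY). Note that Jan 12 to Mar 12 2018 is 59 calendar days, though the sample output shows 60:

```python
import pandas as pd

df = pd.DataFrame({'VENDOR ID': [33, 33, 12, 12, 12, 33, 89],
                   'DATE': ['01/12/2018', '03/12/2018', '01/08/2018',
                            '01/15/2018', '01/23/2018', '05/12/2018',
                            '01/12/2018']})

df['DATE'] = pd.to_datetime(df['DATE'], format='%m/%d/%Y')
# diff() within each vendor gives NaT for the first occurrence,
# i.e. the '----------' rows in the desired output.
df['GAP'] = df.groupby('VENDOR ID')['DATE'].diff().dt.days
```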

Pandas groupby - set of different values

試著忘記壹切 submitted on 2019-12-02 01:28:51
I have this dataframe:

x = pd.DataFrame.from_dict({'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                            'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

  cat1 cat2
0    A    X
1    A    X
2    A    Y
3    B    Y
4    B    Y
5    C    Y
6    C    Z
7    C    Z

I want to group by cat1, and then aggregate cat2 as sets of different values, such as:

  cat1    cat2
0    A  (X, Y)
1    B    (Y,)
2    C  (Y, Z)

This is part of a bigger dataframe with more columns, each of which has its own aggregation function, so how do I pass this functionality to the aggregation dictionary?

Use a lambda function with set or unique, and convert the output to tuples:

x = pd.DataFrame.from
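Continuing the truncated answer, a sketch passing the lambda through an aggregation dict, so other columns can sit alongside it with their own functions (sorted() is added here for a deterministic tuple order):

```python
import pandas as pd

x = pd.DataFrame.from_dict({'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                            'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

# The lambda slots into a larger aggregation dict alongside other columns.
out = x.groupby('cat1', as_index=False).agg({'cat2': lambda s: tuple(sorted(set(s)))})
```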