pandas-groupby

Calculate nunique() for groupby in pandas

流过昼夜 submitted on 2020-05-15 02:18:08
Question: I have a dataframe with the following columns: diff (the difference between registration date and payment date, in days), country (the user's country), user_id, and campaign_id (another categorical column, which we will use in the groupby). I need to count distinct users for every country + campaign_id group who have diff <= n. For example, for country 'A', campaign 'abc', and diff 7, I need the count of distinct users from country 'A' and campaign 'abc' with diff <= 7. My current solution (below) runs too long: import pandas …
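
The preview cuts off the slow solution, but the computation it describes can be sketched directly. A minimal sketch, assuming the column names from the question and n = 7 from its example; the sample data and the result column name distinct_users are illustrative:

    import pandas as pd

    # Illustrative sample; the question's real dataframe is much larger.
    df = pd.DataFrame({'country': ['A', 'A', 'A', 'B'],
                       'campaign_id': ['abc', 'abc', 'abc', 'xyz'],
                       'user_id': [1, 1, 2, 3],
                       'diff': [3, 9, 5, 2]})

    n = 7
    # Filter to diff <= n first, then count distinct users per group.
    out = (df[df['diff'] <= n]
             .groupby(['country', 'campaign_id'])['user_id']
             .nunique()
             .reset_index(name='distinct_users'))
    print(out)
    #   country campaign_id  distinct_users
    # 0       A         abc               2
    # 1       B         xyz               1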

How to do a conditional count after groupby on a Pandas Dataframe?

耗尽温柔 submitted on 2020-05-09 17:56:10
Question: I have the following dataframe:

      key1 key2
    0    a  one
    1    a  two
    2    b  one
    3    b  two
    4    a  one
    5    c  two

Now I want to group the dataframe by key1 and count the rows whose key2 has the value "one", to get this result:

      key1
    0    a    2
    1    b    1
    2    c    0

I get the usual count with df.groupby(['key1']).size(), but I don't know how to insert the condition. I tried things like df.groupby(['key1']).apply(df[df['key2'] == 'one']), but I can't get any further. How can I do this?

Answer 1: I think you need to add the condition …
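
The answer is truncated, but its hint about adding the condition matches a standard pattern: build the boolean condition first, then sum it within each group. A minimal sketch under that assumption; the result column name count is my choice:

    import pandas as pd

    df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a', 'c'],
                       'key2': ['one', 'two', 'one', 'two', 'one', 'two']})

    # Compare key2 to 'one' first, then sum the booleans within each
    # key1 group; groups with no match (like 'c') correctly yield 0.
    out = (df['key2'].eq('one')
             .groupby(df['key1'])
             .sum()
             .reset_index(name='count'))
    print(out)
    #   key1  count
    # 0    a      2
    # 1    b      1
    # 2    c      0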

Replace value with a condition from 2 columns using pandas

僤鯓⒐⒋嵵緔 submitted on 2020-05-09 16:19:06
Question: I have a pandas dataframe like the one shown below:

    df1_new = pd.DataFrame(
        {'person_id': [1, 2, 3, 4, 5],
         'start_date': ['07/23/2377', '05/29/2477', '02/03/2177', '7/27/2277', '7/13/2077'],
         'start_datetime': ['07/23/2377 12:00:00', '05/29/2477 04:00:00', '02/03/2177 02:00:00',
                            '7/27/2277 05:00:00', '7/13/2077 12:00:00'],
         'end_date': ['07/25/2377', '06/09/2477', '02/05/2177', '01/01/2000', '01/01/2000'],
         'end_datetime': ['07/25/2377 02:00:00', '06/09/2477 04:00:00', '02/05/2177 01:00:00', '01/01 …
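
The preview cuts off before the actual replacement rule, so no definitive answer can be reconstructed. As a generic illustration of conditional replacement driven by two columns, here is a sketch in which the condition, treating '01/01/2000' in end_date as a placeholder, is purely a hypothetical reading of the sample data:

    import pandas as pd
    import numpy as np

    df1_new = pd.DataFrame({'person_id': [1, 2],
                            'end_date': ['07/25/2377', '01/01/2000'],
                            'end_datetime': ['07/25/2377 02:00:00', '01/01/2000 00:00:00']})

    # Hypothetical rule: where end_date holds the placeholder value,
    # blank out the matching end_datetime as well.
    mask = df1_new['end_date'].eq('01/01/2000')
    df1_new.loc[mask, 'end_datetime'] = np.nan
    print(df1_new)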

create new rows based on values of one of the columns in the above row with specific condition - pandas or numpy

北城余情 submitted on 2020-05-09 07:05:12
Question: I have a data frame as shown below:

    B_ID  no_show  Session  slot_num  walkin  ns_w  c_ns_w  c_walkin
    1     0.4      S1       1         0.2     0.2   0.2     0.2
    2     0.3      S1       2         0.5    -0.2   0.2     0.7
    3     0.8      S1       3         0.5     0.3   0.5     1.2
    4     0.3      S1       4         0.8    -0.5   0.0     2.0
    5     0.6      S1       5         0.4     0.2   0.2     2.4
    6     0.8      S1       6         0.2     0.6   0.8     2.6
    7     0.9      S1       7         0.1     0.8   1.4     2.7
    8     0.4      S1       8         0.5    -0.1   1.3     3.2
    9     0.6      S1       9         0.1     0.5   1.8     3.3
    12    0.9      S2       1         0.9     0.0   0.0     0.9
    13    0.5      S2       2         0.4     0.1   0.1     1.3
    14    0.3      S2       3         0.1     0.2   0.3     1.4
    15    0.7      S2       4         0.4     0.3   0.6     1.8
    20    0.7      S2       5         0.1     0.6   1.2     1.9
    16    0.6      S2       6         0.3     0 …

Pandas: groupby column A and make lists of tuples from other columns?

*爱你&永不变心* submitted on 2020-04-29 06:54:12
Question: I would like to aggregate user transactions into lists in pandas. I can't figure out how to make a list comprised of more than one field. For example:

    df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                       'time': [20, 10, 11, 18, 15],
                       'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

which looks like:

       amount  time  user
    0   10.99    20     1
    1    4.99    10     1
    2    2.99    11     2
    3    1.99    18     2
    4   10.99    15     3

If I do print(df.groupby('user')['time'].apply(list)) I get:

    user
    1    [20, 10]
    2    [11, 18]
    3    [15]

but if I do df.groupby('user')[['time', 'amount …
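
The preview stops mid-expression, but the two-column case the asker is reaching for is commonly handled by zipping the columns inside each group. A minimal sketch of that approach (my suggestion, not necessarily the answer the original thread gave):

    import pandas as pd

    df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                       'time': [20, 10, 11, 18, 15],
                       'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

    # Zip the two columns within each group so every user maps to a
    # list of (time, amount) tuples.
    out = df.groupby('user').apply(
        lambda g: list(zip(g['time'], g['amount'])))
    print(out)
    # user
    # 1    [(20, 10.99), (10, 4.99)]
    # 2    [(11, 2.99), (18, 1.99)]
    # 3    [(15, 10.99)]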

create new rows based on the values of one of the columns in pandas or numpy

岁酱吖の submitted on 2020-04-28 20:25:28
Question: I have a data frame, of doctors' appointment data, as shown below:

    B_ID  No_Show  Session  slot_num  Cumulative_no_show
    1     0.4      S1       1         0.4
    2     0.3      S1       2         0.7
    3     0.8      S1       3         1.5
    4     0.3      S1       4         1.8
    5     0.6      S1       5         2.4
    6     0.8      S1       6         3.2
    7     0.9      S1       7         4.1
    8     0.4      S1       8         4.5
    9     0.6      S1       9         5.1
    12    0.9      S2       1         0.9
    13    0.5      S2       2         1.4
    14    0.3      S2       3         1.7
    15    0.7      S2       4         2.4
    20    0.7      S2       5         3.1
    16    0.6      S2       6         3.7
    17    0.8      S2       7         4.5
    19    0.3      S2       8         4.8

From the above, whenever u_cumulative > 0.8, create a new row just below that one with No_Show = 0.0 and its Session …
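
The requirement is cut off mid-sentence, but the stated rule (insert a row with No_Show = 0.0 directly below every row whose cumulative no-show exceeds 0.8) can be sketched with a row-wise rebuild. How the other columns of the inserted row should be filled is my assumption; copying them from the triggering row is just one plausible choice:

    import pandas as pd

    df = pd.DataFrame({'B_ID': [1, 2, 3, 4],
                       'No_Show': [0.4, 0.3, 0.8, 0.3],
                       'Session': ['S1', 'S1', 'S1', 'S1'],
                       'slot_num': [1, 2, 3, 4],
                       'Cumulative_no_show': [0.4, 0.7, 1.5, 1.8]})

    rows = []
    for _, row in df.iterrows():
        rows.append(row)
        if row['Cumulative_no_show'] > 0.8:
            extra = row.copy()
            extra['No_Show'] = 0.0  # the inserted row, per the stated rule
            rows.append(extra)

    out = pd.DataFrame(rows).reset_index(drop=True)
    print(out)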

How to remove some rows in a group by in python

送分小仙女□ submitted on 2020-04-18 09:55:07
Question: I have a dataframe and I'd like to do a groupby() based on a column and then sort the values within each group by a date column. Then, from each group, I'd like to remove records whose column_condition == 'B' until I reach a row whose column_condition == 'A'. For example, assume the table below is one of the groups:

    ID, DATE,      column_condition
    --------------------------------
    1,  Jan 2017,  B
    1,  Feb 2017,  B
    1,  Mar 2017,  B
    1,  Aug 2017,  A
    1,  Sept 2017, B

So, I'd like to remove the first …
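
The preview ends before the expected output, but dropping every leading 'B' up to each group's first 'A' is commonly done with a cumulative flag. A minimal sketch under that reading of the question, assuming the data is already sorted by date within each ID:

    import pandas as pd

    df = pd.DataFrame({'ID': [1, 1, 1, 1, 1],
                       'DATE': ['Jan 2017', 'Feb 2017', 'Mar 2017',
                                'Aug 2017', 'Sept 2017'],
                       'column_condition': ['B', 'B', 'B', 'A', 'B']})

    # Flag rows at or after the first 'A' in each ID: the cumulative max
    # of the boolean turns everything from the first True onward True.
    seen_a = (df['column_condition'].eq('A')
                .groupby(df['ID'])
                .cummax())
    out = df[seen_a]
    print(out)
    #    ID       DATE column_condition
    # 3   1   Aug 2017                A
    # 4   1  Sept 2017                B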

pandas groupby count and then conditional mean

寵の児 submitted on 2020-04-16 05:44:07
Question: I have a dataframe like this:

      col1  col2
    0    a   100
    1    a   200
    2    a   150
    3    b  1000
    4    c   400
    5    c   200

What I want to do is group by col1 and count the number of occurrences; if the count is greater than or equal to 2, calculate the mean of col2 for those rows, and if not, return null. The output should be:

      col1  mean
    0    a   150
    1    b
    2    c   300

Answer 1: Use groupby.mean + DataFrame.where with Series.value_counts:

    df.groupby('col1').mean().where(df['col1'].value_counts().ge(2)).reset_index()
    # you can select columns you …
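
A runnable version of the answer's one-liner on the question's data. The where() mask is indexed by group label, so it aligns with the grouped result and turns under-populated groups into NaN; note the value column keeps its original name, col2:

    import pandas as pd

    df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'c', 'c'],
                       'col2': [100, 200, 150, 1000, 400, 200]})

    # Per-group means, masked to NaN wherever the group has < 2 rows.
    out = (df.groupby('col1')
             .mean()
             .where(df['col1'].value_counts().ge(2))
             .reset_index())
    print(out)
    #   col1   col2
    # 0    a  150.0
    # 1    b    NaN
    # 2    c  300.0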

Pandas - Count frequency of value for last x amount of days

别等时光非礼了梦想. submitted on 2020-04-15 10:48:49
Question: I'm getting some unexpected results. What I am trying to do is create a column that looks at the ID number and the date and counts how many times that ID number has come up in the last 7 days (I'd also like to make that dynamic for x days, but I'm just trying it out with 7). So, given this dataframe:

    import pandas as pd

    df = pd.DataFrame(
        [['A', '2020-02-02 20:31:00'],
         ['A', '2020-02-03 00:52:00'],
         ['A', '2020-02-07 23:45:00'],
         ['A', '2020-02-08 13:19:00'],
         ['A', '2020-02-18 13:16 …
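
The preview truncates mid-literal, so the column names and expected output are unknown. A minimal sketch of one common approach, a time-based rolling count per ID: the names id and ts, the final ':00' on the last timestamp, and the helper-column trick are all my assumptions:

    import pandas as pd

    df = pd.DataFrame(
        [['A', '2020-02-02 20:31:00'],
         ['A', '2020-02-03 00:52:00'],
         ['A', '2020-02-07 23:45:00'],
         ['A', '2020-02-08 13:19:00'],
         ['A', '2020-02-18 13:16:00']],
        columns=['id', 'ts'])
    df['ts'] = pd.to_datetime(df['ts'])

    days = 7  # the x in "last x amount of days"
    df = df.sort_values(['id', 'ts']).reset_index(drop=True)
    df['helper'] = 1
    # A '7D' window rolls over the datetime index, counting every row of
    # the same id in the trailing 7 days, current row included.
    df['count_7d'] = (df.set_index('ts')
                        .groupby('id')['helper']
                        .rolling(f'{days}D')
                        .count()
                        .to_numpy())
    df = df.drop(columns='helper')
    print(df)
    # count_7d: 1, 2, 3, 4, 1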