pandas-groupby

Calculate nunique() for groupby in pandas

流过昼夜 submitted on 2020-05-15 02:18:08
Question: I have a dataframe with the following columns: diff (the difference between registration date and payment date, in days), country (the user's country), user_id, and campaign_id (another categorical column, which we will use in the groupby). I need to count distinct users for every country + campaign_id group who have diff <= n. For example, for country 'A', campaign 'abc', and diff 7, I need the count of distinct users from country 'A' and campaign 'abc' with diff <= 7. My current solution (below) runs too long: import pandas …
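
The preview cuts off the slow solution, but the computation it describes can be sketched directly. A minimal sketch, assuming the column names from the question and n = 7 from its example; the sample data and the result column name distinct_users are illustrative:

    import pandas as pd

    # Illustrative sample; the question's real dataframe is much larger.
    df = pd.DataFrame({'country': ['A', 'A', 'A', 'B'],
                       'campaign_id': ['abc', 'abc', 'abc', 'xyz'],
                       'user_id': [1, 1, 2, 3],
                       'diff': [3, 9, 5, 2]})

    n = 7
    # Filter to diff <= n first, then count distinct users per group.
    out = (df[df['diff'] <= n]
             .groupby(['country', 'campaign_id'])['user_id']
             .nunique()
             .reset_index(name='distinct_users'))
    print(out)
    #   country campaign_id  distinct_users
    # 0       A         abc               2
    # 1       B         xyz               1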

How to do a conditional count after groupby on a Pandas Dataframe?

耗尽温柔 submitted on 2020-05-09 17:56:10
Question: I have the following dataframe:

      key1 key2
    0    a  one
    1    a  two
    2    b  one
    3    b  two
    4    a  one
    5    c  two

Now I want to group the dataframe by key1 and count the rows whose key2 has the value "one", to get this result:

      key1
    0    a    2
    1    b    1
    2    c    0

I get the usual count with df.groupby(['key1']).size(), but I don't know how to insert the condition. I tried things like df.groupby(['key1']).apply(df[df['key2'] == 'one']), but I can't get any further. How can I do this?

Answer 1: I think you need to add the condition …
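
The answer is truncated, but its hint about adding the condition matches a standard pattern: build the boolean condition first, then sum it within each group. A minimal sketch under that assumption; the result column name count is my choice:

    import pandas as pd

    df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a', 'c'],
                       'key2': ['one', 'two', 'one', 'two', 'one', 'two']})

    # Compare key2 to 'one' first, then sum the booleans within each
    # key1 group; groups with no match (like 'c') correctly yield 0.
    out = (df['key2'].eq('one')
             .groupby(df['key1'])
             .sum()
             .reset_index(name='count'))
    print(out)
    #   key1  count
    # 0    a      2
    # 1    b      1
    # 2    c      0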

Replace value with a condition from 2 columns using pandas

僤鯓⒐⒋嵵緔 submitted on 2020-05-09 16:19:06
Question: I have a pandas dataframe like the one shown below:

    df1_new = pd.DataFrame(
        {'person_id': [1, 2, 3, 4, 5],
         'start_date': ['07/23/2377', '05/29/2477', '02/03/2177', '7/27/2277', '7/13/2077'],
         'start_datetime': ['07/23/2377 12:00:00', '05/29/2477 04:00:00', '02/03/2177 02:00:00',
                            '7/27/2277 05:00:00', '7/13/2077 12:00:00'],
         'end_date': ['07/25/2377', '06/09/2477', '02/05/2177', '01/01/2000', '01/01/2000'],
         'end_datetime': ['07/25/2377 02:00:00', '06/09/2477 04:00:00', '02/05/2177 01:00:00', '01/01 …
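
The preview cuts off before the actual replacement rule, so no definitive answer can be reconstructed. As a generic illustration of conditional replacement driven by two columns, here is a sketch in which the condition, treating '01/01/2000' in end_date as a placeholder, is purely a hypothetical reading of the sample data:

    import pandas as pd
    import numpy as np

    df1_new = pd.DataFrame({'person_id': [1, 2],
                            'end_date': ['07/25/2377', '01/01/2000'],
                            'end_datetime': ['07/25/2377 02:00:00', '01/01/2000 00:00:00']})

    # Hypothetical rule: where end_date holds the placeholder value,
    # blank out the matching end_datetime as well.
    mask = df1_new['end_date'].eq('01/01/2000')
    df1_new.loc[mask, 'end_datetime'] = np.nan
    print(df1_new)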

create new rows based on values of one of the columns in the above row with specific condition - pandas or numpy

北城余情 submitted on 2020-05-09 07:05:12
Question: I have a data frame as shown below:

    B_ID  no_show  Session  slot_num  walkin  ns_w  c_ns_w  c_walkin
    1     0.4      S1       1         0.2     0.2   0.2     0.2
    2     0.3      S1       2         0.5    -0.2   0.2     0.7
    3     0.8      S1       3         0.5     0.3   0.5     1.2
    4     0.3      S1       4         0.8    -0.5   0.0     2.0
    5     0.6      S1       5         0.4     0.2   0.2     2.4
    6     0.8      S1       6         0.2     0.6   0.8     2.6
    7     0.9      S1       7         0.1     0.8   1.4     2.7
    8     0.4      S1       8         0.5    -0.1   1.3     3.2
    9     0.6      S1       9         0.1     0.5   1.8     3.3
    12    0.9      S2       1         0.9     0.0   0.0     0.9
    13    0.5      S2       2         0.4     0.1   0.1     1.3
    14    0.3      S2       3         0.1     0.2   0.3     1.4
    15    0.7      S2       4         0.4     0.3   0.6     1.8
    20    0.7      S2       5         0.1     0.6   1.2     1.9
    16    0.6      S2       6         0.3     0 …

Pandas: groupby column A and make lists of tuples from other columns?

*爱你&永不变心* submitted on 2020-04-29 06:54:12
Question: I would like to aggregate user transactions into lists in pandas. I can't figure out how to make a list comprised of more than one field. For example:

    df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                       'time': [20, 10, 11, 18, 15],
                       'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

which looks like:

       amount  time  user
    0   10.99    20     1
    1    4.99    10     1
    2    2.99    11     2
    3    1.99    18     2
    4   10.99    15     3

If I do print(df.groupby('user')['time'].apply(list)) I get:

    user
    1    [20, 10]
    2    [11, 18]
    3    [15]

but if I do df.groupby('user')[['time', 'amount …
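
The preview stops mid-expression, but the two-column case the asker is reaching for is commonly handled by zipping the columns inside each group. A minimal sketch of that approach (my suggestion, not necessarily the answer the original thread gave):

    import pandas as pd

    df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                       'time': [20, 10, 11, 18, 15],
                       'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

    # Zip the two columns within each group so every user maps to a
    # list of (time, amount) tuples.
    out = df.groupby('user').apply(
        lambda g: list(zip(g['time'], g['amount'])))
    print(out)
    # user
    # 1    [(20, 10.99), (10, 4.99)]
    # 2    [(11, 2.99), (18, 1.99)]
    # 3    [(15, 10.99)]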

create new rows based on the values of one of the columns in pandas or numpy

岁酱吖の submitted on 2020-04-28 20:25:28
Question: I have a data frame, of doctors' appointment data, as shown below:

    B_ID  No_Show  Session  slot_num  Cumulative_no_show
    1     0.4      S1       1         0.4
    2     0.3      S1       2         0.7
    3     0.8      S1       3         1.5
    4     0.3      S1       4         1.8
    5     0.6      S1       5         2.4
    6     0.8      S1       6         3.2
    7     0.9      S1       7         4.1
    8     0.4      S1       8         4.5
    9     0.6      S1       9         5.1
    12    0.9      S2       1         0.9
    13    0.5      S2       2         1.4
    14    0.3      S2       3         1.7
    15    0.7      S2       4         2.4
    20    0.7      S2       5         3.1
    16    0.6      S2       6         3.7
    17    0.8      S2       7         4.5
    19    0.3      S2       8         4.8

From the above, whenever u_cumulative > 0.8, create a new row just below that one with No_Show = 0.0 and its Session …
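
The requirement is cut off mid-sentence, but the stated rule (insert a row with No_Show = 0.0 directly below every row whose cumulative no-show exceeds 0.8) can be sketched with a row-wise rebuild. How the other columns of the inserted row should be filled is my assumption; copying them from the triggering row is just one plausible choice:

    import pandas as pd

    df = pd.DataFrame({'B_ID': [1, 2, 3, 4],
                       'No_Show': [0.4, 0.3, 0.8, 0.3],
                       'Session': ['S1', 'S1', 'S1', 'S1'],
                       'slot_num': [1, 2, 3, 4],
                       'Cumulative_no_show': [0.4, 0.7, 1.5, 1.8]})

    rows = []
    for _, row in df.iterrows():
        rows.append(row)
        if row['Cumulative_no_show'] > 0.8:
            extra = row.copy()
            extra['No_Show'] = 0.0  # the inserted row, per the stated rule
            rows.append(extra)

    out = pd.DataFrame(rows).reset_index(drop=True)
    print(out)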

How to remove some rows in a group by in python

送分小仙女□ submitted on 2020-04-18 09:55:07
Question: I have a dataframe and I'd like to do a groupby() based on a column and then sort the values within each group by a date column. Then, from each group, I'd like to remove records whose column_condition == 'B' until I reach a row whose column_condition == 'A'. For example, assume the table below is one of the groups:

    ID, DATE,      column_condition
    --------------------------------
    1,  Jan 2017,  B
    1,  Feb 2017,  B
    1,  Mar 2017,  B
    1,  Aug 2017,  A
    1,  Sept 2017, B

So, I'd like to remove the first …
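
The preview ends before the expected output, but dropping every leading 'B' up to each group's first 'A' is commonly done with a cumulative flag. A minimal sketch under that reading of the question, assuming the data is already sorted by date within each ID:

    import pandas as pd

    df = pd.DataFrame({'ID': [1, 1, 1, 1, 1],
                       'DATE': ['Jan 2017', 'Feb 2017', 'Mar 2017',
                                'Aug 2017', 'Sept 2017'],
                       'column_condition': ['B', 'B', 'B', 'A', 'B']})

    # Flag rows at or after the first 'A' in each ID: the cumulative max
    # of the boolean turns everything from the first True onward True.
    seen_a = (df['column_condition'].eq('A')
                .groupby(df['ID'])
                .cummax())
    out = df[seen_a]
    print(out)
    #    ID       DATE column_condition
    # 3   1   Aug 2017                A
    # 4   1  Sept 2017                B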

pandas groupby count and then conditional mean

寵の児 submitted on 2020-04-16 05:44:07
Question: I have a dataframe like this:

      col1  col2
    0    a   100
    1    a   200
    2    a   150
    3    b  1000
    4    c   400
    5    c   200

What I want to do is group by col1 and count the number of occurrences; if the count is greater than or equal to 2, calculate the mean of col2 for those rows, and if not, return null. The output should be:

      col1  mean
    0    a   150
    1    b
    2    c   300

Answer 1: Use groupby.mean + DataFrame.where with Series.value_counts:

    df.groupby('col1').mean().where(df['col1'].value_counts().ge(2)).reset_index()
    # you can select columns you …
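
A runnable version of the answer's one-liner on the question's data. The where() mask is indexed by group label, so it aligns with the grouped result and turns under-populated groups into NaN; note the value column keeps its original name, col2:

    import pandas as pd

    df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'c', 'c'],
                       'col2': [100, 200, 150, 1000, 400, 200]})

    # Per-group means, masked to NaN wherever the group has < 2 rows.
    out = (df.groupby('col1')
             .mean()
             .where(df['col1'].value_counts().ge(2))
             .reset_index())
    print(out)
    #   col1   col2
    # 0    a  150.0
    # 1    b    NaN
    # 2    c  300.0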

Pandas - Count frequency of value for last x amount of days

别等时光非礼了梦想. submitted on 2020-04-15 10:48:49
Question: I'm getting some unexpected results. What I am trying to do is create a column that looks at the ID number and the date and counts how many times that ID number has come up in the last 7 days (I'd also like to make that dynamic for x days, but I'm just trying it out with 7). So, given this dataframe:

    import pandas as pd

    df = pd.DataFrame(
        [['A', '2020-02-02 20:31:00'],
         ['A', '2020-02-03 00:52:00'],
         ['A', '2020-02-07 23:45:00'],
         ['A', '2020-02-08 13:19:00'],
         ['A', '2020-02-18 13:16 …
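
The preview truncates mid-literal, so the column names and expected output are unknown. A minimal sketch of one common approach, a time-based rolling count per ID: the names id and ts, the final ':00' on the last timestamp, and the helper-column trick are all my assumptions:

    import pandas as pd

    df = pd.DataFrame(
        [['A', '2020-02-02 20:31:00'],
         ['A', '2020-02-03 00:52:00'],
         ['A', '2020-02-07 23:45:00'],
         ['A', '2020-02-08 13:19:00'],
         ['A', '2020-02-18 13:16:00']],
        columns=['id', 'ts'])
    df['ts'] = pd.to_datetime(df['ts'])

    days = 7  # the x in "last x amount of days"
    df = df.sort_values(['id', 'ts']).reset_index(drop=True)
    df['helper'] = 1
    # A '7D' window rolls over the datetime index, counting every row of
    # the same id in the trailing 7 days, current row included.
    df['count_7d'] = (df.set_index('ts')
                        .groupby('id')['helper']
                        .rolling(f'{days}D')
                        .count()
                        .to_numpy())
    df = df.drop(columns='helper')
    print(df)
    # count_7d: 1, 2, 3, 4, 1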