pandas-groupby

Summary statistics for each group and transpose using pandas

青春壹個敷衍的年華 submitted on 2021-01-01 09:03:14
Question: I have a dataframe like the one shown below:

```python
df = pd.DataFrame({'person_id': [11,11,11,11,11,11,11,11,12,12,12],
                   'time': [0,0,0,1,2,3,4,4,0,0,1],
                   'value': [101,102,np.nan,120,143,153,160,170,96,97,99]})
```

What I would like to do is:

a) Get the summary statistics for each subject at each time point (e.g. 0 hr, 1 hr, 2 hr, etc.)

b) Please note that NaN rows shouldn't be counted as separate records when computing the mean.

I was trying the below:

```python
for i in df['person_id'].unique():
    df[df['person_id'].isin([i])]
```
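One possible approach, sketched below: group on both `person_id` and `time`, aggregate, and unstack so each subject becomes one row with the time points spread across columns. Pandas aggregations skip NaN by default, so the NaN row does not distort the mean or the count.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'person_id': [11,11,11,11,11,11,11,11,12,12,12],
                   'time': [0,0,0,1,2,3,4,4,0,0,1],
                   'value': [101,102,np.nan,120,143,153,160,170,96,97,99]})

# Group by subject and time point; NaN values are excluded from the
# aggregations automatically, so (11, 0) has mean 101.5 and count 2.
stats = (df.groupby(['person_id', 'time'])['value']
           .agg(['mean', 'min', 'max', 'count'])
           .unstack('time'))  # one row per subject, time points as columns
print(stats)
```

Time points a subject never reached (e.g. subject 12 at hour 2) show up as NaN columns after the `unstack`, which keeps the shape rectangular.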

Use cumcount on pandas dataframe with a conditional increment

主宰稳场 submitted on 2020-12-31 14:19:24
Question: Consider the dataframe

```python
df = pd.DataFrame([['A', 1], ['A', 1], ['B', 1], ['B', 0],
                   ['A', 0], ['A', 1], ['B', 1]],
                  columns=['key', 'cond'])
```

I want to find a cumulative (running) count (starting at 1) for each key, where we only increment if the previous value in the group had cond == 1. When appended to the above dataframe, this would give:

```python
df_result = pd.DataFrame([['A', 1, 1], ['A', 1, 2], ['B', 1, 1], ['B', 0, 2],
                          ['A', 0, 3], ['A', 1, 3], ['B', 1, 2]],
                         columns=['key', 'cond', 'count'])
```
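One way to sketch this without a conditional `cumcount`: within each group, shift `cond` down one row and take a cumulative sum. Each row then counts how many *earlier* rows in its group had `cond == 1`, which is exactly the increment rule described.

```python
import pandas as pd

df = pd.DataFrame([['A', 1], ['A', 1], ['B', 1], ['B', 0],
                   ['A', 0], ['A', 1], ['B', 1]],
                  columns=['key', 'cond'])

# shift(fill_value=0) makes the first row of each group see a "previous
# cond" of 0; cumsum then adds 1 for every prior row with cond == 1, and
# the final +1 makes the running count start at 1.
df['count'] = (df.groupby('key')['cond']
                 .transform(lambda s: s.shift(fill_value=0).cumsum() + 1))
print(df)
```

For key A the `cond` sequence is 1, 1, 0, 1, so the counts come out 1, 2, 3, 3, matching the expected `df_result`.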

Pandas - Column expansion of List of Dictionary - How to Optimise?

♀尐吖头ヾ submitted on 2020-12-31 06:51:10
Question: I have a dataframe `test` with columns `id`, `name`, and `values`. A sample of how `test` looks is:

```
    name                 values
0   impressions          [{'value': 17686, 'end_time': '2018-06-12T07:0...
1   reach                [{'value': 6294, 'end_time': '2018-06-12T07:00...
2   follower_count       [{'value': 130, 'end_time': '2018-06-12T07:00:...
3   email_contacts       [{'value': 1, 'end_time': '2018-06-12T07:00:00...
4   phone_call_clicks    [{'value': 0, 'end_time': '2018-06-12T07:00:00...
5   text_message_clicks  [{'value': 0,
```
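A vectorised sketch of the expansion, assuming each cell of `values` holds a list of `{'value': ..., 'end_time': ...}` dicts as the (truncated) sample suggests — the two-row frame below is a hypothetical reconstruction, not the asker's actual data:

```python
import pandas as pd

# Hypothetical stand-in for the truncated sample above.
test = pd.DataFrame({
    'name': ['impressions', 'reach'],
    'values': [
        [{'value': 17686, 'end_time': '2018-06-12T07:00:00+0000'}],
        [{'value': 6294, 'end_time': '2018-06-12T07:00:00+0000'}],
    ],
})

# explode gives each list element its own row; json_normalize expands the
# dicts into columns. Both operate on whole columns at once, avoiding a
# Python-level loop per row.
exploded = test.explode('values').reset_index(drop=True)
expanded = pd.json_normalize(exploded['values'].tolist())
result = pd.concat([exploded.drop(columns='values'), expanded], axis=1)
print(result)
```

The `reset_index(drop=True)` matters: after `explode`, row labels repeat, and `concat` aligns on the index, so the two frames must share a clean 0..n index before being glued together.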

Pandas group by with multiple columns and max value

↘锁芯ラ submitted on 2020-12-13 07:19:25
Question: I have some problems with group by on multiple columns and a max value.

```
A B C D E F G H
x q e m k 2 1 y
x q e n l 5 2 y
x w e b j 7 3 y
x w e v h 3 4 y
```

This query is correct and returns what I want:

```sql
SELECT A, B, C, D, E, MAX(F)
FROM mytable
GROUP BY A, B, C
```

Results:

```
x q e n l 5
x w e b j 7
```

How can this be achieved in pandas? I tried this:

```python
df.groupby(['A', 'B', 'C'], as_index=False)['F'].max()
```

And this translates to:

```sql
SELECT A, B, C, MAX(F)
FROM mytable
GROUP BY A, B, C
```

This also does not
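A sketch of one common fix: `groupby(...).max()` drops the non-grouped columns D and E, so instead locate the *row* holding the maximum F within each (A, B, C) group with `idxmax`, then select those whole rows. That keeps every column, like the SQL query does.

```python
import pandas as pd

df = pd.DataFrame({'A': list('xxxx'), 'B': list('qqww'), 'C': list('eeee'),
                   'D': list('mnbv'), 'E': list('kljh'),
                   'F': [2, 5, 7, 3], 'G': [1, 2, 3, 4], 'H': list('yyyy')})

# idxmax returns the original row index of the max F per group; df.loc
# then pulls those complete rows, so D and E survive the aggregation.
idx = df.groupby(['A', 'B', 'C'])['F'].idxmax()
result = df.loc[idx, ['A', 'B', 'C', 'D', 'E', 'F']].reset_index(drop=True)
print(result)
```

Note that if ties in F should return multiple rows per group, `idxmax` keeps only the first; filtering with `df[df['F'] == df.groupby(['A','B','C'])['F'].transform('max')]` keeps them all.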

Groupby every 2 hours data of a dataframe

六月ゝ 毕业季﹏ submitted on 2020-12-12 07:06:04
Question: I have a dataframe:

```
      Time                 T201FN1ST2010  T201FN1VT2010
1791  2017-12-26 00:00:00  854.69         0.87
1792  2017-12-26 00:20:00  855.76         0.87
1793  2017-12-26 00:40:00  854.87         0.87
1794  2017-12-26 01:00:00  855.51         0.87
1795  2017-12-26 01:20:00  856.35         0.86
1796  2017-12-26 01:40:00  856.13         0.86
1797  2017-12-26 02:00:00  855.84         0.85
1798  2017-12-26 02:20:00  856.58         0.85
1799  2017-12-26 02:40:00  856.37         0.85
1800  2017-12-26 03:00:00  855.35         0.86
1801  2017-12-26 03:20:00  855.68         0.86
1802  2017-12-26 03:40:00  855.45         0.85
```

How can I Group By Month from a Date field using Python/Pandas

不问归期 submitted on 2020-11-30 04:58:23
Question: I have a dataframe `df` as follows:

```
| date      | Revenue |
|-----------|---------|
| 6/2/2017  | 100     |
| 5/23/2017 | 200     |
| 5/20/2017 | 300     |
| 6/22/2017 | 400     |
| 6/21/2017 | 500     |
```

I need to group the above data by month to get output like:

```
| date | SUM(Revenue) |
|------|--------------|
| May  | 500          |
| June | 1000         |
```

I tried this code but it did not work:

```python
df.groupby(month('date')).agg({'Revenue': 'sum'})
```

I want to only use pandas or NumPy and no additional libraries.

Answer 1: try this: In [6]:
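A sketch of one pandas-only approach: `month(...)` is not a pandas function, but parsing the strings with `pd.to_datetime` and grouping on the `.dt` accessor does the same job.

```python
import pandas as pd

df = pd.DataFrame({'date': ['6/2/2017', '5/23/2017', '5/20/2017',
                            '6/22/2017', '6/21/2017'],
                   'Revenue': [100, 200, 300, 400, 500]})

# Parse the strings into datetimes, then group on the month name via the
# .dt accessor; nothing beyond pandas itself is needed.
df['date'] = pd.to_datetime(df['date'])
result = df.groupby(df['date'].dt.month_name())['Revenue'].sum()
print(result)
```

One caveat: grouping on month names sorts alphabetically (June before May); group on `df['date'].dt.month` instead, or sort afterwards, if calendar order matters.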