pandas-groupby

How to get the number of groups in a groupby object in pandas?

Submitted by 只谈情不闲聊 on 2019-11-29 09:03:20
This would be useful so I know how many unique groups I have to perform calculations on. Thank you.

Suppose the groupby object is called dfgroup. As documented, you can get the number of groups with len(dfgroup).

As of v0.23, there are multiple options to use. First, the setup:

df = pd.DataFrame({'A': list('aabbcccd'), 'B': 'x'})
df

   A  B
0  a  x
1  a  x
2  b  x
3  b  x
4  c  x
5  c  x
6  c  x
7  d  x

g = df.groupby(['A'])

1) ngroups
Newer versions of the GroupBy API provide this (undocumented) attribute, which stores the number of groups in a GroupBy object:

g.ngroups
# 4

Note that this is different from
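Both counting methods can be checked against the question's own setup; a minimal sketch reproducing the four-group frame above:

```python
import pandas as pd

# Same setup as in the answer: four distinct keys in column A.
df = pd.DataFrame({'A': list('aabbcccd'), 'B': 'x'})
g = df.groupby(['A'])

# Both report the number of groups.
print(len(g))     # 4
print(g.ngroups)  # 4
```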

Pandas groupby to CSV

Submitted by 和自甴很熟 on 2019-11-29 07:05:58
Want to output a Pandas groupby DataFrame to CSV. Tried various StackOverflow solutions but they have not worked. Python 3.6.1, Pandas 0.20.1.

The groupby result looks like:

         id  month   year  count
week
0      9066     82  32142    895
1      7679     84  30112    749
2      8368    126  42187    872
3     11038    102  34165    976
4      8815    117  34122    767
5     10979    163  50225   1252
6      8726    142  38159    996
7      5568     63  26143    582

Want a CSV that looks like:

week  count
0       895
1       749
2       872
3       976
4       767
5      1252
6       996
7       582

Current code:

week_grouped = df.groupby('week')
week_grouped.sum()  # At this point you have the groupby result
week_grouped.to_csv('week_grouped.csv')
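The snippet above contains the likely bug: `week_grouped.sum()` returns a new DataFrame that is immediately discarded, so `to_csv` is then called on the GroupBy object rather than on the summed result. A minimal sketch of the fix, using made-up data in place of the asker's df:

```python
import pandas as pd

# Hypothetical data standing in for the questioner's frame.
df = pd.DataFrame({'week': [0, 0, 1, 1], 'count': [400, 495, 300, 449]})

# Keep (or chain) the result of .sum() before writing it out.
week_grouped = df.groupby('week').sum()
week_grouped.to_csv('week_grouped.csv')
```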

Groupby column and find min and max of each group

Submitted by 有些话、适合烂在心里 on 2019-11-29 05:19:33
I have the following dataset:

         Day  Element  Data_Value
6786   01-01     TMAX         112
9333   01-01     TMAX         101
9330   01-01     TMIN          60
11049  01-01     TMIN           0
6834   01-01     TMIN          25
11862  01-01     TMAX         113
1781   01-01     TMAX         115
11042  01-01     TMAX         105
1110   01-01     TMAX         111
651    01-01     TMIN          44
11350  01-01     TMIN          83
1798   01-02     TMAX          70
4975   01-02     TMAX          79
12774  01-02     TMIN           0
3977   01-02     TMIN          60
2485   01-02     TMAX          73
4888   01-02     TMIN          31
11836  01-02     TMIN          26
11368  01-02     TMAX          71
2483   01-02     TMIN          26

I want to group by Day and then find the overall min of TMIN and the max of TMAX, and put these into a data frame, so I get an output like... Day DayMin
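One hedged sketch of the min/max split, filtering each element type and grouping by day (using a trimmed stand-in for the data above; the column name `DayMax` mirrors the truncated `DayMin` in the question):

```python
import pandas as pd

# Shortened stand-in for the question's dataset.
df = pd.DataFrame({
    'Day': ['01-01', '01-01', '01-02', '01-02'],
    'Element': ['TMAX', 'TMIN', 'TMAX', 'TMIN'],
    'Data_Value': [115, 0, 79, 0],
})

# Take the per-day min over TMIN rows and max over TMAX rows separately,
# then align them on the Day index.
day_min = df[df['Element'] == 'TMIN'].groupby('Day')['Data_Value'].min()
day_max = df[df['Element'] == 'TMAX'].groupby('Day')['Data_Value'].max()
result = pd.DataFrame({'DayMin': day_min, 'DayMax': day_max}).reset_index()
```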

Regression by group in python pandas

Submitted by 拥有回忆 on 2019-11-29 02:35:33
I want to ask a quick question related to regression analysis in python pandas. Assume I have the following dataset:

Group   Y   X
1      10   6
1       5   4
1       3   1
2       4   6
2       2   4
2       3   9

My aim is to run a regression, with Y as the dependent and X as the independent variable. The issue is that I want to run this regression by Group and print the coefficients in a new data set, so the results should look like:

Group  Coefficient
1      0.25  (let's assume the coefficient is 0.25)
2      0.30

I hope I can explain my question. Many thanks in advance for your help.

I am not sure about the type of regression you need, but this is how you do an
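The question leaves the regression type open; assuming ordinary least squares, a per-group slope can be sketched with np.polyfit (the coefficients shown in the question are made-up placeholders, so these values will differ):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Group': [1, 1, 1, 2, 2, 2],
                   'Y': [10, 5, 3, 4, 2, 3],
                   'X': [6, 4, 1, 6, 4, 9]})

# OLS slope of Y on X within each group; polyfit(deg=1) returns
# [slope, intercept], so [0] picks the coefficient.
def slope(g):
    return np.polyfit(g['X'], g['Y'], 1)[0]

coeffs = (df.groupby('Group')[['X', 'Y']].apply(slope)
            .rename('Coefficient').reset_index())
```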

Transform vs. aggregate in Pandas

Submitted by 故事扮演 on 2019-11-28 20:35:27
Question: When grouping a Pandas DataFrame, when should I use transform and when should I use aggregate? How do they differ with respect to their application in practice, and which one do you consider more important?

Answer 1: Consider the dataframe df:

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

groupby followed by an aggregation is the standard use:

df.groupby('A').mean()

Maybe you want these values broadcast across the whole group and return something with the same index as what you started
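A minimal sketch of the contrast, using the answer's own df: aggregate collapses each group to one row, while transform broadcasts the group result back onto the original index.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Aggregation: one row per group key ('a', 'b').
agg = df.groupby('A').mean()

# Transform: same length and index as df, group means repeated per row.
broadcast = df.groupby('A').transform('mean')
```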

Python - rolling functions for GroupBy object

Submitted by 邮差的信 on 2019-11-28 18:12:32
I have a time series object grouped of the type <pandas.core.groupby.SeriesGroupBy object at 0x03F1A9F0>. grouped.sum() gives the desired result, but I cannot get rolling_sum to work with the groupby object. Is there any way to apply rolling functions to groupby objects? For example:

x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']
df = DataFrame(zip(id, x), columns=['id', 'x'])
df.groupby('id').sum()

     x
id
a    3
b   12

However, I would like to have something like:

   id   x
0   a   0
1   a   1
2   a   3
3   b   3
4   b   7
5   b  12

Note: as identified by @kekert, the following pandas pattern has been deprecated. See
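Current pandas supports grouped rolling windows directly, and the specific output the question shows is a per-group running total; a sketch of both:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'x': range(6)})

# The desired output above is exactly a cumulative sum within each group.
df['x_cum'] = df.groupby('id')['x'].cumsum()

# General rolling windows on groups are also supported in modern pandas;
# the first row of each group is NaN because the window is incomplete.
roll = df.groupby('id')['x'].rolling(window=2).sum()
```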

Pandas groupby with bin counts

Submitted by 旧时模样 on 2019-11-28 17:30:38
Question: I have a DataFrame that looks like this:

+----------+---------+-------+
| username | post_id | views |
+----------+---------+-------+
| john     | 1       | 3     |
| john     | 2       | 23    |
| john     | 3       | 44    |
| john     | 4       | 82    |
| jane     | 7       | 5     |
| jane     | 8       | 25    |
| jane     | 9       | 46    |
| jane     | 10      | 56    |
+----------+---------+-------+

and I would like to transform it to count views that belong to certain bins, like this:

+------+------+-------+-------+--------+
|      | 1-10 | 11-25 | 25-50 | 51-100 |
+------+------+-------+--
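One way to sketch this is pd.cut plus pd.crosstab; the exact bin edges below are assumptions read off the truncated header row:

```python
import pandas as pd

df = pd.DataFrame({
    'username': ['john'] * 4 + ['jane'] * 4,
    'post_id': [1, 2, 3, 4, 7, 8, 9, 10],
    'views': [3, 23, 44, 82, 5, 25, 46, 56],
})

# Assumed bin edges matching the labels 1-10, 11-25, 25-50, 51-100.
bins = [1, 10, 25, 50, 100]
labels = ['1-10', '11-25', '25-50', '51-100']
binned = pd.cut(df['views'], bins=bins, labels=labels, include_lowest=True)

# Count how many posts per user fall into each views bin.
counts = pd.crosstab(df['username'], binned)
```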

How to label groups of pairs in pandas?

Submitted by 好久不见. on 2019-11-28 11:42:58
Question: I have this dataframe:

>>> df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2], 'B': [2, 1, 2, 2.0, 1, 1, 2]})
>>> df
     A    B
0  1.0  2.0
1  2.0  1.0
2  1.0  2.0
3  NaN  2.0
4  2.0  1.0
5  2.0  1.0
6  2.0  2.0

I need to identify the groups of pairs (A, B) in a third column "group id", to get something like this:

>>> df
     A    B  group id  explanation
0  1.0  2.0       1.0  <- group (1.0, 2.0), first group
1  2.0  1.0       2.0  <- group (2.0, 1.0), second group
2  1.0  2.0       1.0  <- group (1.0, 2.0), first group
3  NaN  2.0       NaN  <- invalid group
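One sketch that matches the visible rows: group on the (A, B) pair in order of first appearance and let NaN keys fall out as invalid, using ngroup (rows with a null key are excluded from the grouping and come back as NaN):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2],
                   'B': [2, 1, 2, 2.0, 1, 1, 2]})

# sort=False numbers groups by first appearance; dropna=True leaves
# NaN-keyed rows out, so their group id stays NaN. ngroup is 0-based.
df['group_id'] = df.groupby(['A', 'B'], sort=False, dropna=True).ngroup() + 1
```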

Dataframe merge gives `Process finished with exit code 137 (interrupted by signal 9: SIGKILL)`

Submitted by 自作多情 on 2019-11-28 11:05:38
Question: I use dataframe merge 3 times to get my desired results:

def write_dips(writer):
    df_dips = pd.read_excel(file_path, sheet_name='DipsSummary')
    df_sales = pd.read_excel(file_path, sheet_name='SaleSummary')
    df_delivery = pd.read_excel(file_path, sheet_name='DeliverySummary')
    df_mapping = pd.read_csv(mappingfilepath, delimiter=',', skiprows=[1])
    df_dips = df_dips.merge(df_mapping, left_on='Site', right_on='SHIP TO NAME', how='left')
    df_dips = df_dips.merge(df_sales, left_on='IDASS ID', right_on=
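Exit code 137 means the process received SIGKILL, which on Linux is most often the out-of-memory killer reacting to a merge that ballooned in size. A hedged sketch of the usual mitigation, with hypothetical stand-in frames: carry only the columns the join needs, and deduplicate the right-hand keys so a many-to-many join cannot multiply the row count.

```python
import pandas as pd

# Hypothetical frames standing in for the question's Excel/CSV sheets.
df_dips = pd.DataFrame({'Site': ['A', 'B'], 'vol': [1.0, 2.0]})
df_mapping = pd.DataFrame({'SHIP TO NAME': ['A', 'B', 'C'],
                           'region': ['N', 'S', 'S'],
                           'unused': ['x', 'y', 'z']})

# Select only needed columns and drop duplicate keys before merging.
mapping_small = (df_mapping[['SHIP TO NAME', 'region']]
                 .drop_duplicates(subset='SHIP TO NAME'))
merged = df_dips.merge(mapping_small, left_on='Site',
                       right_on='SHIP TO NAME', how='left')
```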

How to get minimum of each group for each day based on hour criteria

Submitted by 那年仲夏 on 2019-11-28 10:53:52
Question: I have given two dataframes below for you to test:

df = pd.DataFrame({
    'subject_id': [1,1,1,1,1,1,1,1,1,1,1],
    'time_1': ['2173-04-03 12:35:00','2173-04-03 17:00:00','2173-04-03 20:00:00',
               '2173-04-04 11:00:00','2173-04-04 11:30:00','2173-04-04 12:00:00',
               '2173-04-05 16:00:00','2173-04-05 22:00:00','2173-04-06 04:00:00',
               '2173-04-06 04:30:00','2173-04-06 06:30:00'],
    'val': [5,5,5,10,5,10,5,8,3,8,10]
})

df1 = pd.DataFrame({
    'subject_id': [1,1,1,1,1,1,1,1,1,1,1],
    'time_1': ['2173-04-03 12:35:00', '2173
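The question text is cut off before the hour criteria are stated, so this only sketches the day-level part of the title: parse the timestamps and take each subject's per-day minimum. The data below is a shortened stand-in for the frames above.

```python
import pandas as pd

# Shortened stand-in for the question's df.
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1],
    'time_1': ['2173-04-03 12:35:00', '2173-04-03 17:00:00',
               '2173-04-04 11:00:00', '2173-04-04 12:00:00'],
    'val': [5, 3, 10, 7],
})
df['time_1'] = pd.to_datetime(df['time_1'])

# Group by subject and calendar day, then take the per-day minimum of val.
daily_min = (df.groupby(['subject_id', df['time_1'].dt.date])['val']
               .min().reset_index(name='min_val'))
```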