pandas-groupby

Pandas groupby with delimiter join

試著忘記壹切 · Submitted on 2019-12-16 19:57:48
Question: I tried to use groupby to group rows with multiple values.

    col  val
    A    Cat
    A    Tiger
    B    Ball
    B    Bat

    import pandas as pd
    df = pd.read_csv("Inputfile.txt", sep='\t')
    group = df.groupby(['col'])['val'].sum()

I got:

    A    CatTiger
    B    BallBat

I want to introduce a delimiter, so that my output looks like:

    A    Cat-Tiger
    B    Ball-Bat

I tried:

    group = df.groupby(['col'])['val'].sum().apply(lambda x: '-'.join(x))

This yielded:

    A    C-a-t-T-i-g-e-r
    B    B-a-l-l-B-a-t

What is the issue here? Thanks, AP

Answer 1: Alternatively you …
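The root cause: `.sum()` on a string column concatenates each group into one string first, so the later `'-'.join` iterates over that string character by character. A minimal sketch of one common fix (not necessarily the truncated answer's): pass `'-'.join` straight to `apply`, so each group is joined while it is still a sequence of separate strings. The data is inlined here in place of the question's `Inputfile.txt`:

```python
import pandas as pd

df = pd.DataFrame({'col': ['A', 'A', 'B', 'B'],
                   'val': ['Cat', 'Tiger', 'Ball', 'Bat']})

# Join each group's strings while they are still separate elements
group = df.groupby('col')['val'].apply('-'.join)
# A -> 'Cat-Tiger', B -> 'Ball-Bat'
```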

Max and min from two series in pandas groupby

我的梦境 · Submitted on 2019-12-14 03:53:30
Question: Is it possible to get the min and max values from two series in a groupby? For example, in the following situation, when grouping by c, how can I get the min and max values for a and b at the same time?

    df = pd.DataFrame({'a': [10,20,3,40,55],
                       'b': [5,14,8,50,60],
                       'c': ['x','x','y','y','y']})
    g = df.groupby(df.c)
    for key, item in g:
        print(g.get_group(key), "\n")

        a   b  c
    0  10   5  x
    1  20  14  x

        a   b  c
    2   3   8  y
    3  40  50  y
    4  55  60  y

I have resolved this by taking the min and max of each grouped series …
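One standard way to get both statistics for both columns in a single pass is `agg(['min', 'max'])`, which produces a DataFrame with a two-level column index. A sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 3, 40, 55],
                   'b': [5, 14, 8, 50, 60],
                   'c': ['x', 'x', 'y', 'y', 'y']})

# One pass: min and max for both columns, per group of c
res = df.groupby('c')[['a', 'b']].agg(['min', 'max'])
```

`res` is indexed by the group key, with columns like `('a', 'min')` and `('b', 'max')`.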

converting pandas.core.groupby.SeriesGroupBy to dataframe

佐手、 · Submitted on 2019-12-14 02:25:22
Question: I had a dataframe and I applied the groupby method. Now I have a pandas.core.groupby.SeriesGroupBy, but I can't use any of the dataframe methods on it. How can I convert it to a usable dataframe?

    type(survivor)
    pandas.core.groupby.SeriesGroupBy

By applying .groups it looks like this:

    {'C': Int64Index([1, 9, 19, 26, 30, 31, 34, 36, 39, 42,
                      847, 849, 852, 858, 859, 866, 874, 875, 879, 889],
                     dtype='int64', name=u'ID', length=168),
     'Q': Int64Index([5, 16, 22, 28, 32, 44, 46, 47, 82, 109, 116, …
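A GroupBy object is lazy: it holds the grouping, not a result. Applying an aggregation (`sum`, `mean`, `apply`, …) materializes a Series, and `reset_index()` turns that back into a plain DataFrame. A sketch with hypothetical column names (the question's data looks Titanic-like, but the real names are not shown):

```python
import pandas as pd

df = pd.DataFrame({'Embarked': ['C', 'Q', 'C', 'S'],
                   'Survived': [1, 0, 1, 1]})

grouped = df.groupby('Embarked')['Survived']   # SeriesGroupBy: no result yet

# Aggregate, then reset_index to recover an ordinary DataFrame
out = grouped.sum().reset_index()
```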

Looking at Previous Time series

烈酒焚心 · Submitted on 2019-12-14 02:18:54
Question: I have a dataset as shown below. The idea is to look at the previous 15 minutes at every row, not the fixed frequency we use in the Grouper function. I want to see the number of positive changes in the previous 15 minutes.

    row  Timestamp       Direction  Positive  Neg  Nut
    1    1/20/19 12:15
    2    1/20/19 12:17   Nut
    3    1/20/19 12:17   Neg
    4    1/20/19 12:18   Neg
    5    1/20/19 12:19   Pos
    6    1/20/19 12:20   Neg
    7    1/20/19 12:21   Neg
    8    1/20/19 12:22   Pos
    9    1/20/19 12:23   Neg
    10   1/20/19 12:24   Pos
    11   1/20/19 12:25   Neg
    12   1/20/19 12:26   Neg
    13   1/20/19 …
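A trailing time-based window (rather than a fixed `Grouper` frequency) can be expressed with `rolling('15min')` over a datetime index. A sketch on a shortened, made-up slice of the question's data; `pos_last_15min` is a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['1/20/19 12:15', '1/20/19 12:17',
                                 '1/20/19 12:19', '1/20/19 12:22',
                                 '1/20/19 12:24', '1/20/19 12:31']),
    'Direction': ['Nut', 'Neg', 'Pos', 'Pos', 'Pos', 'Neg'],
}).set_index('Timestamp')

# 1 where the direction is Pos, then sum over the trailing 15-minute window
df['pos_last_15min'] = (df['Direction'].eq('Pos').astype(int)
                          .rolling('15min').sum())
```

The window ends at each row and reaches back 15 minutes, so each row counts only earlier (and its own) Pos events.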

Nesting a dictionary within another dictionary, grouping by values in a Pandas Dataframe

我的梦境 · Submitted on 2019-12-14 02:09:52
Question: In this previous question, Nesting a counter within another dictionary where keys are dataframe columns, @Jezrael showed me how to nest a counter within another dictionary. My dataframe has another column which is effectively a superset of the ID, and it is not named in a way that allows the SuperID to be logically derived from an ID.

    SuperID  ID     Code
    E1       E1023  a
    E1       E1023  b
    E1       E1023  b
    E1       E1023  b
    E1       E1024  b
    E1       E1024  c
    E1       E1024  c
    E2       E1025  a
    E2       E1025  a
    E2       E1026  b

Using the dictionary which was …
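One way to get the extra nesting level is a dict comprehension over `groupby('SuperID')`, with an inner groupby on `ID` building a `Counter` of codes per ID. A sketch on the question's data (this may differ from the answer the page truncates):

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'SuperID': ['E1'] * 7 + ['E2'] * 3,
                   'ID': ['E1023'] * 4 + ['E1024'] * 3 + ['E1025'] * 2 + ['E1026'],
                   'Code': list('abbbbccaab')})

# {SuperID: {ID: Counter of codes}}
nested = {sid: {i: Counter(g2['Code']) for i, g2 in g.groupby('ID')}
          for sid, g in df.groupby('SuperID')}
```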

Pandas ffill resampled data grouped by column

北战南征 · Submitted on 2019-12-13 21:03:23
Question: I'm trying to create a data frame from a start date and end date, for a number of asset_ids, and turn it into a list of half-hours for each asset_id between the start and end date, with the values of some_property filled forward. I've tried Grouper and resample from the documentation and examples from SO, but am stumped how to get this done. Consider the example:

    some_time = datetime(2018,4,2,20,20,42)
    start_date = datetime(some_time.year, some_time.month, some_time.day).astimezone(pytz.timezone( …
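The usual pattern for "half-hourly grid per group, forward-filled" is a per-group resample: set the timestamps as the index, group by `asset_id`, then `resample('30min').ffill()`. A sketch with made-up data; the column names `asset_id`, `timestamp`, and `some_property` follow the question, and timezone handling is omitted:

```python
import pandas as pd

df = pd.DataFrame({
    'asset_id': [1, 1, 2, 2],
    'timestamp': pd.to_datetime(['2018-04-02 00:00', '2018-04-02 01:00',
                                 '2018-04-02 00:00', '2018-04-02 01:00']),
    'some_property': [10, 20, 30, 40],
})

# Per asset: expand to a 30-minute grid and carry the last value forward
out = (df.set_index('timestamp')
         .groupby('asset_id')['some_property']
         .resample('30min')
         .ffill()
         .reset_index())
```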

grouping rows in list in pandas groupby

China☆狼群 · Submitted on 2019-12-13 20:20:02
Question: I have a pandas data frame like:

    a  b
    A  1
    A  2
    B  5
    B  5
    B  4
    C  6

I want to group by the first column and get the second column as lists in rows:

    A  [1,2]
    B  [5,5,4]
    C  [6]

Is it possible to do something like this using pandas groupby?

Answer 1: You can do this using groupby to group on the column of interest and then apply list to every group:

    In [1]: df = pd.DataFrame({'a': ['A','A','B','B','B','C'],
                               'b': [1,2,5,5,4,6]})
            df
    Out[1]:
       a  b
    0  A  1
    1  A  2
    2  B  5
    3  B  5
    4  B  4
    5  C  6

    In [2]: df.groupby('a')['b'].apply(list) …
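The `apply(list)` approach from the answer can be spelled equivalently with `agg(list)`, which makes the intent (aggregate each group into a list) explicit. A quick sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# Aggregate each group's values into a Python list
lists = df.groupby('a')['b'].agg(list)
# A -> [1, 2], B -> [5, 5, 4], C -> [6]
```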

python pandas: assign control vs. treatment groupings randomly based on %

你。 · Submitted on 2019-12-13 17:53:29
Question: I am working on an experiment design, where I need to split a dataframe df into a control and a treatment group by percentage, within pre-existing groupings. This is the dataframe df:

    df.head()
    customer_id | Group | many other columns
    ABC           1
    CDE           1
    BHF           2
    NID           1
    WKL           2
    SDI           2

    pd.pivot_table(df, index=['Group'], values=["customer_id"],
                   aggfunc=lambda x: len(x.unique()))

    Group 1 : 55394
    Group 2 : 34889

Now I need to add a column labeled "Flag" to the df. For Group 1, I want to randomly assign 50% "Control" and 50% …
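One way to assign exact per-group fractions (rather than an independent coin flip per row) is to build a fixed-proportion array of labels per group, shuffle it, and broadcast it back with `transform`. A sketch; `split_flags` is a hypothetical helper, and the tiny df stands in for the question's 90k rows:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

df = pd.DataFrame({'customer_id': ['ABC', 'CDE', 'BHF', 'NID', 'WKL', 'SDI'],
                   'Group': [1, 1, 2, 1, 2, 2]})

def split_flags(n, control_frac=0.5):
    # Exactly round(n * control_frac) rows get 'Control'; order is shuffled
    flags = np.where(np.arange(n) < round(n * control_frac),
                     'Control', 'Treatment')
    return rng.permutation(flags)

# transform keeps the original row order and index alignment
df['Flag'] = df.groupby('Group')['customer_id'].transform(
    lambda s: split_flags(len(s)))
```

For a 70/30 split on another group, pass a different `control_frac` per group (e.g. by dispatching on the group key inside the lambda).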

group values in intervals

只愿长相守 · Submitted on 2019-12-13 15:53:44
Question: I have a pandas series containing zeros and ones:

    df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
    df1
    Out[3]:
    0     0
    1     0
    2     0
    3     0
    4     0
    5     1
    6     1
    7     1
    8     0
    9     0
    10    0

I would like to create a dataframe df2 that contains the start and the end of intervals with the same value, together with the value associated. df2 in this case should be:

    df2
    Out[5]:
       Start  End  Value
    0      0    4      0
    1      5    7      1
    2      8   10      0

My attempt was:

    from operator import itemgetter
    from itertools import groupby
    a = [next(group) for key, group …
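The itertools approach can be replaced by a pandas run-length idiom: compare each value to its predecessor and take a cumulative sum, which gives every run of equal values its own id, then group on those ids. A sketch on the question's series:

```python
import pandas as pd

s = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

# A new run starts wherever the value differs from the previous one
runs = s.ne(s.shift()).cumsum()

df2 = pd.DataFrame({
    'Start': s.index.to_series().groupby(runs).min(),
    'End': s.index.to_series().groupby(runs).max(),
    'Value': s.groupby(runs).first(),
}).reset_index(drop=True)
```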

Pandas: group by equal range

浪尽此生 · Submitted on 2019-12-13 15:26:56
Question: This is an example of my data frame:

    df_lst = [
        {"wordcount": 100, "Stats": 198765, "id": 34},
        {"wordcount": 99, "Stats": 98765, "id": 35},
        {"wordcount": 200, "Stats": 18765, "id": 36},
        {"wordcount": 250, "Stats": 788765, "id": 37},
        {"wordcount": 345, "Stats": 12765, "id": 38},
        {"wordcount": 456, "Stats": 238765, "id": 39},
        {"wordcount": 478, "Stats": 1934, "id": 40},
        {"wordcount": 890, "Stats": 19845, "id": 41},
        {"wordcount": 812, "Stats": 1987, "id": 42}]
    df = pd.DataFrame(df_lst)
    df.set …
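Grouping by equal-width ranges is usually done by binning the numeric column with `pd.cut` and grouping on the result. A sketch on the question's data, assuming 100-wide wordcount buckets and summing Stats per bucket (the exact aggregation the asker wants is cut off):

```python
import pandas as pd

df_lst = [
    {"wordcount": 100, "Stats": 198765, "id": 34},
    {"wordcount": 99, "Stats": 98765, "id": 35},
    {"wordcount": 200, "Stats": 18765, "id": 36},
    {"wordcount": 250, "Stats": 788765, "id": 37},
    {"wordcount": 345, "Stats": 12765, "id": 38},
    {"wordcount": 456, "Stats": 238765, "id": 39},
    {"wordcount": 478, "Stats": 1934, "id": 40},
    {"wordcount": 890, "Stats": 19845, "id": 41},
    {"wordcount": 812, "Stats": 1987, "id": 42}]
df = pd.DataFrame(df_lst)

# Bucket wordcount into equal 100-wide right-closed ranges, aggregate per bucket
bins = range(0, 1001, 100)
per_range = df.groupby(pd.cut(df['wordcount'], bins=bins),
                       observed=False)['Stats'].sum()
```

`observed=False` keeps empty buckets in the output so every range appears, even with zero rows.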