pandas-groupby

Pandas groupby with delimiter join

試著忘記壹切 · Submitted on 2019-12-16 19:57:48
Question: I tried to use groupby to group rows with multiple values.

    col  val
    A    Cat
    A    Tiger
    B    Ball
    B    Bat

    import pandas as pd
    df = pd.read_csv("Inputfile.txt", sep='\t')
    group = df.groupby(['col'])['val'].sum()

I got:

    A    CatTiger
    B    BallBat

I want to introduce a delimiter, so that my output looks like:

    A    Cat-Tiger
    B    Ball-Bat

I tried:

    group = df.groupby(['col'])['val'].sum().apply(lambda x: '-'.join(x))

This yielded:

    A    C-a-t-T-i-g-e-r
    B    B-a-l-l-B-a-t

What is the issue here? Thanks, AP

Answer 1: Alternatively you …
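The root cause: `.sum()` on a string column concatenates each group into one string first, so the later `'-'.join` iterates over that string character by character. A minimal sketch of one common fix (not necessarily the truncated answer's): pass `'-'.join` straight to `apply`, so each group is joined while it is still a sequence of separate strings. The data is inlined here in place of the question's `Inputfile.txt`:

```python
import pandas as pd

df = pd.DataFrame({'col': ['A', 'A', 'B', 'B'],
                   'val': ['Cat', 'Tiger', 'Ball', 'Bat']})

# Join each group's strings while they are still separate elements
group = df.groupby('col')['val'].apply('-'.join)
# A -> 'Cat-Tiger', B -> 'Ball-Bat'
```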

Max and min from two series in pandas groupby

我的梦境 · Submitted on 2019-12-14 03:53:30
Question: Is it possible to get the min and max values from two series in a groupby? For example, in the following situation, when grouping by c, how can I get the min and max values for a and b at the same time?

    df = pd.DataFrame({'a': [10,20,3,40,55],
                       'b': [5,14,8,50,60],
                       'c': ['x','x','y','y','y']})
    g = df.groupby(df.c)
    for key, item in g:
        print(g.get_group(key), "\n")

        a   b  c
    0  10   5  x
    1  20  14  x

        a   b  c
    2   3   8  y
    3  40  50  y
    4  55  60  y

I have resolved this by taking the min and max of each grouped series …
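One standard way to get both statistics for both columns in a single pass is `agg(['min', 'max'])`, which produces a DataFrame with a two-level column index. A sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 3, 40, 55],
                   'b': [5, 14, 8, 50, 60],
                   'c': ['x', 'x', 'y', 'y', 'y']})

# One pass: min and max for both columns, per group of c
res = df.groupby('c')[['a', 'b']].agg(['min', 'max'])
```

`res` is indexed by the group key, with columns like `('a', 'min')` and `('b', 'max')`.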

converting pandas.core.groupby.SeriesGroupBy to dataframe

佐手、 · Submitted on 2019-12-14 02:25:22
Question: I had a dataframe and I applied the groupby method. Now I have a pandas.core.groupby.SeriesGroupBy, but I can't use any of the dataframe methods on it. How can I convert it to a usable dataframe?

    type(survivor)
    pandas.core.groupby.SeriesGroupBy

By applying .groups it looks like this:

    {'C': Int64Index([1, 9, 19, 26, 30, 31, 34, 36, 39, 42,
                      847, 849, 852, 858, 859, 866, 874, 875, 879, 889],
                     dtype='int64', name=u'ID', length=168),
     'Q': Int64Index([5, 16, 22, 28, 32, 44, 46, 47, 82, 109, 116, …
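A GroupBy object is lazy: it holds the grouping, not a result. Applying an aggregation (`sum`, `mean`, `apply`, …) materializes a Series, and `reset_index()` turns that back into a plain DataFrame. A sketch with hypothetical column names (the question's data looks Titanic-like, but the real names are not shown):

```python
import pandas as pd

df = pd.DataFrame({'Embarked': ['C', 'Q', 'C', 'S'],
                   'Survived': [1, 0, 1, 1]})

grouped = df.groupby('Embarked')['Survived']   # SeriesGroupBy: no result yet

# Aggregate, then reset_index to recover an ordinary DataFrame
out = grouped.sum().reset_index()
```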

Looking at Previous Time series

烈酒焚心 · Submitted on 2019-12-14 02:18:54
Question: I have a dataset as shown below. The idea is to look at the previous 15 minutes at every row, not the fixed frequency we use in the Grouper function. I want to see the number of positive changes in the previous 15 minutes.

    row  Timestamp       Direction  Positive  Neg  Nut
    1    1/20/19 12:15
    2    1/20/19 12:17   Nut
    3    1/20/19 12:17   Neg
    4    1/20/19 12:18   Neg
    5    1/20/19 12:19   Pos
    6    1/20/19 12:20   Neg
    7    1/20/19 12:21   Neg
    8    1/20/19 12:22   Pos
    9    1/20/19 12:23   Neg
    10   1/20/19 12:24   Pos
    11   1/20/19 12:25   Neg
    12   1/20/19 12:26   Neg
    13   1/20/19 …
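A trailing time-based window (rather than a fixed `Grouper` frequency) can be expressed with `rolling('15min')` over a datetime index. A sketch on a shortened, made-up slice of the question's data; `pos_last_15min` is a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['1/20/19 12:15', '1/20/19 12:17',
                                 '1/20/19 12:19', '1/20/19 12:22',
                                 '1/20/19 12:24', '1/20/19 12:31']),
    'Direction': ['Nut', 'Neg', 'Pos', 'Pos', 'Pos', 'Neg'],
}).set_index('Timestamp')

# 1 where the direction is Pos, then sum over the trailing 15-minute window
df['pos_last_15min'] = (df['Direction'].eq('Pos').astype(int)
                          .rolling('15min').sum())
```

The window ends at each row and reaches back 15 minutes, so each row counts only earlier (and its own) Pos events.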

Nesting a dictionary within another dictionary, grouping by values in a Pandas Dataframe

我的梦境 · Submitted on 2019-12-14 02:09:52
Question: In this previous question, Nesting a counter within another dictionary where keys are dataframe columns, @Jezrael showed me how to nest a counter within another dictionary. My dataframe has another column which is effectively a superset of the ID, and it is not named in a way that allows the SuperID to be logically derived from an ID.

    SuperID  ID     Code
    E1       E1023  a
    E1       E1023  b
    E1       E1023  b
    E1       E1023  b
    E1       E1024  b
    E1       E1024  c
    E1       E1024  c
    E2       E1025  a
    E2       E1025  a
    E2       E1026  b

Using the dictionary which was …
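One way to get the extra nesting level is a dict comprehension over `groupby('SuperID')`, with an inner groupby on `ID` building a `Counter` of codes per ID. A sketch on the question's data (this may differ from the answer the page truncates):

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'SuperID': ['E1'] * 7 + ['E2'] * 3,
                   'ID': ['E1023'] * 4 + ['E1024'] * 3 + ['E1025'] * 2 + ['E1026'],
                   'Code': list('abbbbccaab')})

# {SuperID: {ID: Counter of codes}}
nested = {sid: {i: Counter(g2['Code']) for i, g2 in g.groupby('ID')}
          for sid, g in df.groupby('SuperID')}
```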

Pandas ffill resampled data grouped by column

北战南征 · Submitted on 2019-12-13 21:03:23
Question: I'm trying to create a data frame from a start date and end date, for a number of asset_ids, and turn it into a list of half-hours for each asset_id between the start and end date, with the values of some_property filled forward. I've tried Grouper and resample from the documentation and examples from SO, but am stumped how to get this done. Consider the example:

    some_time = datetime(2018,4,2,20,20,42)
    start_date = datetime(some_time.year, some_time.month, some_time.day).astimezone(pytz.timezone( …
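The usual pattern for "half-hourly grid per group, forward-filled" is a per-group resample: set the timestamps as the index, group by `asset_id`, then `resample('30min').ffill()`. A sketch with made-up data; the column names `asset_id`, `timestamp`, and `some_property` follow the question, and timezone handling is omitted:

```python
import pandas as pd

df = pd.DataFrame({
    'asset_id': [1, 1, 2, 2],
    'timestamp': pd.to_datetime(['2018-04-02 00:00', '2018-04-02 01:00',
                                 '2018-04-02 00:00', '2018-04-02 01:00']),
    'some_property': [10, 20, 30, 40],
})

# Per asset: expand to a 30-minute grid and carry the last value forward
out = (df.set_index('timestamp')
         .groupby('asset_id')['some_property']
         .resample('30min')
         .ffill()
         .reset_index())
```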

grouping rows in list in pandas groupby

China☆狼群 · Submitted on 2019-12-13 20:20:02
Question: I have a pandas data frame like:

    a  b
    A  1
    A  2
    B  5
    B  5
    B  4
    C  6

I want to group by the first column and get the second column as lists in rows:

    A  [1,2]
    B  [5,5,4]
    C  [6]

Is it possible to do something like this using pandas groupby?

Answer 1: You can do this using groupby to group on the column of interest and then apply list to every group:

    In [1]: df = pd.DataFrame({'a': ['A','A','B','B','B','C'],
                               'b': [1,2,5,5,4,6]})
            df
    Out[1]:
       a  b
    0  A  1
    1  A  2
    2  B  5
    3  B  5
    4  B  4
    5  C  6

    In [2]: df.groupby('a')['b'].apply(list) …
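The `apply(list)` approach from the answer can be spelled equivalently with `agg(list)`, which makes the intent (aggregate each group into a list) explicit. A quick sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# Aggregate each group's values into a Python list
lists = df.groupby('a')['b'].agg(list)
# A -> [1, 2], B -> [5, 5, 4], C -> [6]
```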

python pandas: assign control vs. treatment groupings randomly based on %

你。 · Submitted on 2019-12-13 17:53:29
Question: I am working on an experiment design, where I need to split a dataframe df into a control and a treatment group by percentage, within pre-existing groupings. This is the dataframe df:

    df.head()
    customer_id | Group | many other columns
    ABC           1
    CDE           1
    BHF           2
    NID           1
    WKL           2
    SDI           2

    pd.pivot_table(df, index=['Group'], values=["customer_id"],
                   aggfunc=lambda x: len(x.unique()))

    Group 1 : 55394
    Group 2 : 34889

Now I need to add a column labeled "Flag" to the df. For Group 1, I want to randomly assign 50% "Control" and 50% …
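One way to assign exact per-group fractions (rather than an independent coin flip per row) is to build a fixed-proportion array of labels per group, shuffle it, and broadcast it back with `transform`. A sketch; `split_flags` is a hypothetical helper, and the tiny df stands in for the question's 90k rows:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

df = pd.DataFrame({'customer_id': ['ABC', 'CDE', 'BHF', 'NID', 'WKL', 'SDI'],
                   'Group': [1, 1, 2, 1, 2, 2]})

def split_flags(n, control_frac=0.5):
    # Exactly round(n * control_frac) rows get 'Control'; order is shuffled
    flags = np.where(np.arange(n) < round(n * control_frac),
                     'Control', 'Treatment')
    return rng.permutation(flags)

# transform keeps the original row order and index alignment
df['Flag'] = df.groupby('Group')['customer_id'].transform(
    lambda s: split_flags(len(s)))
```

For a 70/30 split on another group, pass a different `control_frac` per group (e.g. by dispatching on the group key inside the lambda).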

group values in intervals

只愿长相守 · Submitted on 2019-12-13 15:53:44
Question: I have a pandas series containing zeros and ones:

    df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
    df1
    Out[3]:
    0     0
    1     0
    2     0
    3     0
    4     0
    5     1
    6     1
    7     1
    8     0
    9     0
    10    0

I would like to create a dataframe df2 that contains the start and the end of intervals with the same value, together with the value associated. df2 in this case should be:

    df2
    Out[5]:
       Start  End  Value
    0      0    4      0
    1      5    7      1
    2      8   10      0

My attempt was:

    from operator import itemgetter
    from itertools import groupby
    a = [next(group) for key, group …
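The itertools approach can be replaced by a pandas run-length idiom: compare each value to its predecessor and take a cumulative sum, which gives every run of equal values its own id, then group on those ids. A sketch on the question's series:

```python
import pandas as pd

s = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

# A new run starts wherever the value differs from the previous one
runs = s.ne(s.shift()).cumsum()

df2 = pd.DataFrame({
    'Start': s.index.to_series().groupby(runs).min(),
    'End': s.index.to_series().groupby(runs).max(),
    'Value': s.groupby(runs).first(),
}).reset_index(drop=True)
```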

Pandas: group by equal range

浪尽此生 · Submitted on 2019-12-13 15:26:56
Question: This is an example of my data frame:

    df_lst = [
        {"wordcount": 100, "Stats": 198765, "id": 34},
        {"wordcount": 99, "Stats": 98765, "id": 35},
        {"wordcount": 200, "Stats": 18765, "id": 36},
        {"wordcount": 250, "Stats": 788765, "id": 37},
        {"wordcount": 345, "Stats": 12765, "id": 38},
        {"wordcount": 456, "Stats": 238765, "id": 39},
        {"wordcount": 478, "Stats": 1934, "id": 40},
        {"wordcount": 890, "Stats": 19845, "id": 41},
        {"wordcount": 812, "Stats": 1987, "id": 42}]
    df = pd.DataFrame(df_lst)
    df.set …
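Grouping by equal-width ranges is usually done by binning the numeric column with `pd.cut` and grouping on the result. A sketch on the question's data, assuming 100-wide wordcount buckets and summing Stats per bucket (the exact aggregation the asker wants is cut off):

```python
import pandas as pd

df_lst = [
    {"wordcount": 100, "Stats": 198765, "id": 34},
    {"wordcount": 99, "Stats": 98765, "id": 35},
    {"wordcount": 200, "Stats": 18765, "id": 36},
    {"wordcount": 250, "Stats": 788765, "id": 37},
    {"wordcount": 345, "Stats": 12765, "id": 38},
    {"wordcount": 456, "Stats": 238765, "id": 39},
    {"wordcount": 478, "Stats": 1934, "id": 40},
    {"wordcount": 890, "Stats": 19845, "id": 41},
    {"wordcount": 812, "Stats": 1987, "id": 42}]
df = pd.DataFrame(df_lst)

# Bucket wordcount into equal 100-wide right-closed ranges, aggregate per bucket
bins = range(0, 1001, 100)
per_range = df.groupby(pd.cut(df['wordcount'], bins=bins),
                       observed=False)['Stats'].sum()
```

`observed=False` keeps empty buckets in the output so every range appears, even with zero rows.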