pandas-groupby

Dynamically merge lines that share the same key into one

╄→гoц情女王★ submitted on 2020-08-09 08:47:56
Question: I have a DataFrame and would like to make another column that combines the columns whose names begin with the same value in Answer and QID. That is to say, here is an excerpt of the dataframe:

   QID    Category           Text   QType                      Question                      Answer0                     Answer1
0   16  Automotive  Access to car  Single  Do you have access to a car?             I own a car/cars            I own a car/cars
1   16  Automotive  Access to car  Single  Do you have access to a car?  I lease/ have a company car  I lease/have a company car
2   16  Automotive  Access to car
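
The post is truncated above, so the following is only a hedged sketch of one possible approach (the separator and the idea of also merging rows per QID are assumptions for illustration): collect every column whose name starts with "Answer", join the values row-wise, and optionally aggregate per QID.

import pandas as pd

# small frame mirroring the excerpt above (abbreviated)
df = pd.DataFrame({
    'QID': [16, 16],
    'Question': ['Do you have access to a car?'] * 2,
    'Answer0': ['I own a car/cars', 'I lease/ have a company car'],
    'Answer1': ['I own a car/cars', 'I lease/have a company car'],
})

# 1) join every Answer* column into a single string per row
answer_cols = [c for c in df.columns if c.startswith('Answer')]
df['AllAnswers'] = df[answer_cols].astype(str).agg(' | '.join, axis=1)

# 2) if rows sharing a QID should also be merged into one line,
#    aggregate the per-row strings as well
merged = df.groupby('QID')['AllAnswers'].agg(' | '.join).reset_index()
print(merged)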

How to stack this specific row on pandas?

孤人 submitted on 2020-08-06 04:15:50
Question: Consider the below df:

df_dict = {'name': {0: ' john', 1: ' john', 4: ' daphne '},
           'address': {0: 'johns address', 1: 'johns address', 4: 'daphne address'},
           'phonenum1': {0: 7870395, 1: 7870450, 4: 7373209},
           'phonenum2': {0: None, 1: 123450, 4: None},
           'phonenum3': {0: None, 1: 123456, 4: None}}
df = pd.DataFrame(df_dict)

     name         address  phonenum1  phonenum2  phonenum3
0    john   johns address    7870395        NaN        NaN
1    john   johns address    7870450   123450.0   123456.0
4  daphne  daphne address    7373209        NaN        NaN

How to
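
The question text is cut off above ("How to"), so the desired output is not fully known. As an illustrative guess only, one common way to "stack" the phone-number columns is to melt them into a single long column and drop the empty entries:

import pandas as pd

df = pd.DataFrame({
    'name': [' john', ' john', ' daphne '],
    'address': ['johns address', 'johns address', 'daphne address'],
    'phonenum1': [7870395, 7870450, 7373209],
    'phonenum2': [None, 123450, None],
    'phonenum3': [None, 123456, None],
}, index=[0, 1, 4])

# melt the three phone columns into one long 'phonenum' column, then drop NaNs
long = df.melt(id_vars=['name', 'address'],
               value_vars=['phonenum1', 'phonenum2', 'phonenum3'],
               value_name='phonenum').dropna(subset=['phonenum'])
print(long[['name', 'address', 'phonenum']])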

plot a groupby object with bokeh

泪湿孤枕 submitted on 2020-08-03 10:21:51
Question: Consider the following MWE.

from pandas import DataFrame
from bokeh.plotting import figure

data = dict(x=[0, 1, 2, 0, 1, 2], y=[0, 1, 2, 4, 5, 6], g=[1, 1, 1, 2, 2, 2])
df = DataFrame(data)

p = figure()
p.line('x', 'y', source=df[df.g == 1])
p.line('x', 'y', source=df[df.g == 2])

Ideally, I would like to compress the last two lines into one:

p.line('x', 'y', source=df.groupby('g'))

(Real-life examples have a large and variable number of groups.) Is there any concise way to do this?

Answer 1: I just
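
The answer is cut off above, so the following is only a hedged sketch of one common pattern and not necessarily what the answerer did: iterate over the (key, sub-frame) pairs that groupby yields and add one line glyph per group. The legend_label argument assumes a reasonably recent Bokeh version.

from pandas import DataFrame
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

data = dict(x=[0, 1, 2, 0, 1, 2], y=[0, 1, 2, 4, 5, 6], g=[1, 1, 1, 2, 2, 2])
df = DataFrame(data)

p = figure()
# groupby yields (group key, sub-frame) pairs; draw one line per group
for key, sub in df.groupby('g'):
    p.line('x', 'y', source=ColumnDataSource(sub), legend_label=f'g = {key}')

show(p)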

get rows with largest value in grouping [duplicate]

穿精又带淫゛_ submitted on 2020-08-03 07:29:29
Question: This question already has answers here: Get the Row(s) which have the max count in groups using groupby (11 answers). Closed 2 years ago.

I have a dataframe that I group according to an id column. For each group I want to get the row (the whole row, not just the value) containing the max value. I am able to do this by first getting the max value for each group, then creating a filter array, and then applying the filter to the original dataframe. Like so:

import pandas as pd

# Dummy data
df = pd
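
The excerpt is truncated before the dummy data, so the sketch below uses made-up column names (id, value) purely for illustration. Two common one-liners for "whole row with the group maximum" are an idxmax-based row lookup and a transform-based mask:

import pandas as pd

# hypothetical data standing in for the truncated example
df = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                   'value': [3, 7, 1, 9, 4]})

# option 1: find the index of the max value per group, then select those rows
rows_idxmax = df.loc[df.groupby('id')['value'].idxmax()]

# option 2: compare each row to its group's max via transform (keeps ties)
rows_mask = df[df['value'] == df.groupby('id')['value'].transform('max')]

print(rows_idxmax)
print(rows_mask)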

Pandas: Use DataFrameGroupBy.filter() method to select DataFrame's rows with a value greater than the mean of the respective group

感情迁移 submitted on 2020-07-22 21:34:26
Question: I am learning Python and pandas and I am doing some exercises to understand how things work. My question is the following: can I use the GroupBy.filter() method to select the DataFrame's rows that have a value (in a specific column) greater than the mean of the respective group? For this exercise, I am using the "planets" dataset included in Seaborn: 1035 rows x 6 columns (column names: "method", "number", "orbital_period", "mass", "distance", "year"). In Python:

import pandas as pd
import
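
Not taken from the original thread, but worth noting as context: GroupBy.filter() keeps or drops entire groups (its function must return a single boolean per group), so selecting individual rows above their group mean is normally done with transform instead. A minimal sketch, assuming the Seaborn "planets" dataset and the "orbital_period" column:

import seaborn as sns

planets = sns.load_dataset('planets')

# group mean of 'orbital_period' broadcast back to every row, then used as a row-wise mask
group_mean = planets.groupby('method')['orbital_period'].transform('mean')
above_mean = planets[planets['orbital_period'] > group_mean]

print(above_mean.shape)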

Aggregate column values in pandas GroupBy as a dict

自闭症网瘾萝莉.ら submitted on 2020-07-21 20:05:23
Question: This is a question I was asked during an interview in the past. The input data has the following columns: language, product id, shelf id, rank. For instance, the input could look like this:

English, 742005, 4560, 10.2
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81

We would like to do a "group by" operation on the language and shelf id columns and sort the list of products in descending order of the "rank" attribute, which would result in the output
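
The expected output is cut off above, so this is only a hedged sketch of one way to get that shape (the snake_case column names are assumptions): sort by rank descending first, then group and collect the product ids into lists, optionally turning the result into a dict keyed by (language, shelf_id).

import pandas as pd

df = pd.DataFrame(
    [('English', 742005, 4560, 10.2),
     ('English', 6000075389352, 4560, 49),
     ('French', 899883993, 4560, 32),
     ('French', 731317391, 7868, 81)],
    columns=['language', 'product_id', 'shelf_id', 'rank'])

# sort by rank (descending) so each group's list comes out already ordered
ranked = df.sort_values('rank', ascending=False)
grouped = ranked.groupby(['language', 'shelf_id'])['product_id'].apply(list)

# as a plain dict keyed by (language, shelf_id)
result = grouped.to_dict()
print(result)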

Exclude a specific date based on a condition using pandas

纵饮孤独 submitted on 2020-07-13 15:10:13
Question:

df2 = pd.DataFrame({'person_id': [11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14],
                    'admit_date': ['01/01/2011', '01/01/2009', '12/31/2013', '12/31/2017', '04/03/2014',
                                   '08/04/2016', '03/05/2014', '02/07/2011', '08/08/2016', '12/31/2017',
                                   '05/01/2011', '05/21/2014', '07/12/2016']})
df2 = df2.melt('person_id', value_name='dates')
df2['dates'] = pd.to_datetime(df2['dates'])

What I would like to do is:

a) Exclude/filter out records from the data frame if a subject has Dec 31st and Jan 1st in its records. Please note
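
The note after "a)" is cut off, so any remaining conditions are unknown. As an assumption-labelled sketch of only the stated part (working directly on admit_date rather than the melted frame, for brevity), GroupBy.filter() can drop every person whose records contain both a Dec 31 and a Jan 1 date:

import pandas as pd

df2 = pd.DataFrame({'person_id': [11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14],
                    'admit_date': ['01/01/2011', '01/01/2009', '12/31/2013', '12/31/2017', '04/03/2014',
                                   '08/04/2016', '03/05/2014', '02/07/2011', '08/08/2016', '12/31/2017',
                                   '05/01/2011', '05/21/2014', '07/12/2016']})
df2['admit_date'] = pd.to_datetime(df2['admit_date'])

def has_dec31_and_jan1(dates):
    # (month, day) pairs present in this person's records
    md = set(zip(dates.dt.month, dates.dt.day))
    return (12, 31) in md and (1, 1) in md

# keep only the persons who do NOT have both dates
filtered = df2.groupby('person_id').filter(
    lambda g: not has_dec31_and_jan1(g['admit_date']))
print(filtered)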