pandas-groupby

Dynamically merge lines that share the same key into one

╄→гoц情女王★ submitted on 2020-08-09 08:47:56
Question: I have a DataFrame and would like to make another column that combines the columns whose names begin with the same value in Answer and QID. That is to say, here is an excerpt of the dataframe:

   QID    Category           Text   QType                      Question                      Answer0                     Answer1
0   16  Automotive  Access to car  Single  Do you have access to a car?             I own a car/cars            I own a car/cars
1   16  Automotive  Access to car  Single  Do you have access to a car?  I lease/ have a company car  I lease/have a company car
2   16  Automotive  Access to car
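
The post is truncated above, so the following is only a hedged sketch of one possible approach (the separator and the idea of also merging rows per QID are assumptions for illustration): collect every column whose name starts with "Answer", join the values row-wise, and optionally aggregate per QID.

import pandas as pd

# small frame mirroring the excerpt above (abbreviated)
df = pd.DataFrame({
    'QID': [16, 16],
    'Question': ['Do you have access to a car?'] * 2,
    'Answer0': ['I own a car/cars', 'I lease/ have a company car'],
    'Answer1': ['I own a car/cars', 'I lease/have a company car'],
})

# 1) join every Answer* column into a single string per row
answer_cols = [c for c in df.columns if c.startswith('Answer')]
df['AllAnswers'] = df[answer_cols].astype(str).agg(' | '.join, axis=1)

# 2) if rows sharing a QID should also be merged into one line,
#    aggregate the per-row strings as well
merged = df.groupby('QID')['AllAnswers'].agg(' | '.join).reset_index()
print(merged)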

How to stack this specific row on pandas?

孤人 submitted on 2020-08-06 04:15:50
Question: Consider the below df:

df_dict = {'name': {0: ' john', 1: ' john', 4: ' daphne '},
           'address': {0: 'johns address', 1: 'johns address', 4: 'daphne address'},
           'phonenum1': {0: 7870395, 1: 7870450, 4: 7373209},
           'phonenum2': {0: None, 1: 123450, 4: None},
           'phonenum3': {0: None, 1: 123456, 4: None}}
df = pd.DataFrame(df_dict)

     name         address  phonenum1  phonenum2  phonenum3
0    john   johns address    7870395        NaN        NaN
1    john   johns address    7870450   123450.0   123456.0
4  daphne  daphne address    7373209        NaN        NaN

How to
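
The question text is cut off above ("How to"), so the desired output is not fully known. As an illustrative guess only, one common way to "stack" the phone-number columns is to melt them into a single long column and drop the empty entries:

import pandas as pd

df = pd.DataFrame({
    'name': [' john', ' john', ' daphne '],
    'address': ['johns address', 'johns address', 'daphne address'],
    'phonenum1': [7870395, 7870450, 7373209],
    'phonenum2': [None, 123450, None],
    'phonenum3': [None, 123456, None],
}, index=[0, 1, 4])

# melt the three phone columns into one long 'phonenum' column, then drop NaNs
long = df.melt(id_vars=['name', 'address'],
               value_vars=['phonenum1', 'phonenum2', 'phonenum3'],
               value_name='phonenum').dropna(subset=['phonenum'])
print(long[['name', 'address', 'phonenum']])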

plot a groupby object with bokeh

泪湿孤枕 submitted on 2020-08-03 10:21:51
Question: Consider the following MWE.

from pandas import DataFrame
from bokeh.plotting import figure

data = dict(x=[0, 1, 2, 0, 1, 2], y=[0, 1, 2, 4, 5, 6], g=[1, 1, 1, 2, 2, 2])
df = DataFrame(data)

p = figure()
p.line('x', 'y', source=df[df.g == 1])
p.line('x', 'y', source=df[df.g == 2])

Ideally, I would like to compress the last two lines into one:

p.line('x', 'y', source=df.groupby('g'))

(Real-life examples have a large and variable number of groups.) Is there any concise way to do this?

Answer 1: I just
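
The answer is cut off above, so the following is only a hedged sketch of one common pattern and not necessarily what the answerer did: iterate over the (key, sub-frame) pairs that groupby yields and add one line glyph per group. The legend_label argument assumes a reasonably recent Bokeh version.

from pandas import DataFrame
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

data = dict(x=[0, 1, 2, 0, 1, 2], y=[0, 1, 2, 4, 5, 6], g=[1, 1, 1, 2, 2, 2])
df = DataFrame(data)

p = figure()
# groupby yields (group key, sub-frame) pairs; draw one line per group
for key, sub in df.groupby('g'):
    p.line('x', 'y', source=ColumnDataSource(sub), legend_label=f'g = {key}')

show(p)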

get rows with largest value in grouping [duplicate]

穿精又带淫゛_ submitted on 2020-08-03 07:29:29
Question: This question already has answers here: Get the Row(s) which have the max count in groups using groupby (11 answers). Closed 2 years ago.

I have a dataframe that I group according to an id column. For each group I want to get the row (the whole row, not just the value) containing the max value. I am able to do this by first getting the max value for each group, then creating a filter array, and then applying the filter to the original dataframe. Like so:

import pandas as pd

# Dummy data
df = pd
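
The excerpt is truncated before the dummy data, so the sketch below uses made-up column names (id, value) purely for illustration. Two common one-liners for "whole row with the group maximum" are an idxmax-based row lookup and a transform-based mask:

import pandas as pd

# hypothetical data standing in for the truncated example
df = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                   'value': [3, 7, 1, 9, 4]})

# option 1: find the index of the max value per group, then select those rows
rows_idxmax = df.loc[df.groupby('id')['value'].idxmax()]

# option 2: compare each row to its group's max via transform (keeps ties)
rows_mask = df[df['value'] == df.groupby('id')['value'].transform('max')]

print(rows_idxmax)
print(rows_mask)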

Pandas: Use DataFrameGroupBy.filter() method to select DataFrame's rows with a value greater than the mean of the respective group

感情迁移 submitted on 2020-07-22 21:34:26
Question: I am learning Python and pandas and I am doing some exercises to understand how things work. My question is the following: can I use the GroupBy.filter() method to select the DataFrame's rows that have a value (in a specific column) greater than the mean of the respective group? For this exercise, I am using the "planets" dataset included in Seaborn: 1035 rows x 6 columns (column names: "method", "number", "orbital_period", "mass", "distance", "year"). In Python:

import pandas as pd
import
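
Not taken from the original thread, but worth noting as context: GroupBy.filter() keeps or drops entire groups (its function must return a single boolean per group), so selecting individual rows above their group mean is normally done with transform instead. A minimal sketch, assuming the Seaborn "planets" dataset and the "orbital_period" column:

import seaborn as sns

planets = sns.load_dataset('planets')

# group mean of 'orbital_period' broadcast back to every row, then used as a row-wise mask
group_mean = planets.groupby('method')['orbital_period'].transform('mean')
above_mean = planets[planets['orbital_period'] > group_mean]

print(above_mean.shape)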

Aggregate column values in pandas GroupBy as a dict

自闭症网瘾萝莉.ら submitted on 2020-07-21 20:05:23
Question: This is a question I was asked during an interview in the past. The input data has the following columns: language, product id, shelf id, rank. For instance, the input could look like this:

English, 742005, 4560, 10.2
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81

We would like to do a "group by" operation on the language and shelf id columns and sort the list of products in descending order of the "rank" attribute, which would result in the output
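
The expected output is cut off above, so this is only a hedged sketch of one way to get that shape (the snake_case column names are assumptions): sort by rank descending first, then group and collect the product ids into lists, optionally turning the result into a dict keyed by (language, shelf_id).

import pandas as pd

df = pd.DataFrame(
    [('English', 742005, 4560, 10.2),
     ('English', 6000075389352, 4560, 49),
     ('French', 899883993, 4560, 32),
     ('French', 731317391, 7868, 81)],
    columns=['language', 'product_id', 'shelf_id', 'rank'])

# sort by rank (descending) so each group's list comes out already ordered
ranked = df.sort_values('rank', ascending=False)
grouped = ranked.groupby(['language', 'shelf_id'])['product_id'].apply(list)

# as a plain dict keyed by (language, shelf_id)
result = grouped.to_dict()
print(result)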

Exclude a specific date based on a condition using pandas

纵饮孤独 submitted on 2020-07-13 15:10:13
Question:

df2 = pd.DataFrame({'person_id': [11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14],
                    'admit_date': ['01/01/2011', '01/01/2009', '12/31/2013', '12/31/2017', '04/03/2014',
                                   '08/04/2016', '03/05/2014', '02/07/2011', '08/08/2016', '12/31/2017',
                                   '05/01/2011', '05/21/2014', '07/12/2016']})
df2 = df2.melt('person_id', value_name='dates')
df2['dates'] = pd.to_datetime(df2['dates'])

What I would like to do is:

a) Exclude/filter out records from the data frame if a subject has Dec 31st and Jan 1st in its records. Please note
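
The note after "a)" is cut off, so any remaining conditions are unknown. As an assumption-labelled sketch of only the stated part (working directly on admit_date rather than the melted frame, for brevity), GroupBy.filter() can drop every person whose records contain both a Dec 31 and a Jan 1 date:

import pandas as pd

df2 = pd.DataFrame({'person_id': [11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14],
                    'admit_date': ['01/01/2011', '01/01/2009', '12/31/2013', '12/31/2017', '04/03/2014',
                                   '08/04/2016', '03/05/2014', '02/07/2011', '08/08/2016', '12/31/2017',
                                   '05/01/2011', '05/21/2014', '07/12/2016']})
df2['admit_date'] = pd.to_datetime(df2['admit_date'])

def has_dec31_and_jan1(dates):
    # (month, day) pairs present in this person's records
    md = set(zip(dates.dt.month, dates.dt.day))
    return (12, 31) in md and (1, 1) in md

# keep only the persons who do NOT have both dates
filtered = df2.groupby('person_id').filter(
    lambda g: not has_dec31_and_jan1(g['admit_date']))
print(filtered)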