pandas-groupby | 易学教程

Replace column values within a groupby and condition

Question: I have a dataframe in which I want to find the minimum value of a column within each group and then, based on that row, update the values of some of the other columns. The following code does what I want:

    import pandas as pd

    df = pd.DataFrame({'ID': [1,1,1,2,2,2,],
                       'Albedo': [0.2, 0.4, 0.5, 0.3, 0.5, 0.1],
                       'Temp' : [20, 30, 15, 40, 10, 5],
                       'Precip': [200, 100, 150, 60, 110, 45],
                       'Year': [1950, 2000, 2004, 1999, 1976, 1916]})

    # cols to replace values for
    cols = ['Temp', 'Precip', 'Year']
    final = pd
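The asker's own solution is cut off in this excerpt, so the following is only a minimal sketch of one way to approach it. It assumes the goal is to copy `Temp`, `Precip` and `Year` from the minimum-`Albedo` row of each `ID` group onto every row of that group; the names `min_rows` and `result` are illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2],
                   'Albedo': [0.2, 0.4, 0.5, 0.3, 0.5, 0.1],
                   'Temp': [20, 30, 15, 40, 10, 5],
                   'Precip': [200, 100, 150, 60, 110, 45],
                   'Year': [1950, 2000, 2004, 1999, 1976, 1916]})

# Columns whose values should be replaced
cols = ['Temp', 'Precip', 'Year']

# Row label of the minimum-Albedo row in each ID group
idx = df.groupby('ID')['Albedo'].idxmin()

# One row per ID holding the values to propagate (assumed interpretation)
min_rows = df.loc[idx, ['ID'] + cols].set_index('ID')

# Overwrite each column by looking up the group's minimum-row value
result = df.copy()
for col in cols:
    result[col] = result['ID'].map(min_rows[col])
```

If only some rows of a group should be updated (the "condition" in the title), a boolean mask could be combined with `result.loc[mask, col]`, but that condition is not visible in the excerpt.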

After using groupby or a pivot count in pandas, how to apply some analysis and get the original data

Question: I have a dataset of 15,000 villages. One district has 12 blocks (talukas), and several crops are grown in that district. I have to check the crop-wise sown area for those villages and select 10 villages for each crop by random sampling. My first step is to remove the villages with 0 sown area from the dataset; after removing them I get 6,674 villages. Next, I check how many villages remain in each block/taluka of the district, so I use the pivot and groupby functions for
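The steps described above could be sketched roughly as follows. The question does not show its data, so the column names (`block`, `village`, `crop`, `sown_area`) and the tiny sample frame are assumptions made purely for illustration, and `n` is set to 2 instead of 10 so the toy data is enough.

```python
import pandas as pd

# Hypothetical data: column names and values are assumptions, not from the question
villages = pd.DataFrame({
    'block':     ['B1', 'B1', 'B2', 'B2', 'B2', 'B3'],
    'village':   ['v1', 'v2', 'v3', 'v4', 'v5', 'v6'],
    'crop':      ['rice', 'wheat', 'rice', 'rice', 'wheat', 'rice'],
    'sown_area': [12.0, 0.0, 3.5, 7.1, 4.0, 0.0],
})

# Step 1: drop villages with zero sown area
sown = villages[villages['sown_area'] > 0]

# Step 2: count the remaining villages in each block
per_block = sown.groupby('block')['village'].nunique()

# Step 3: shuffle all rows, then keep up to n rows per crop.
# Shuffling first makes head(n) a random pick, and crops with
# fewer than n villages simply keep all of their rows.
n = 2
sample = sown.sample(frac=1, random_state=0).groupby('crop').head(n)
```

Because `sample` keeps the original rows and index labels, the full village records remain available after the grouping step.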

Iterate over a subset of a Pandas groupby object

Question: I have a Pandas groupby object, and I would like to iterate over the first n groups. I've tried:

    import pandas as pd

    df = pd.DataFrame({'A':['a','a','a','b','b','c','c','c','c','d','d'],
                       'B':[1,2,3,4,5,6,7,8,9,10,11]})
    df_grouped = df.groupby('A')

    i = 0
    n = 2 # for instance
    for name, group in df_grouped:
        #DO SOMETHING
        if i == n:
            break
        i += 1

and

    group_list = list(df_grouped.groups.keys())[:n]
    for name in group_list:
        group = df_grouped.get_group(name)
        #DO SOMETHING

but I wondered if there was
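One possible alternative to the two attempts above is `itertools.islice` from the standard library, which stops the iteration after `n` items without a manual counter. This is a sketch rather than the accepted answer (which is not part of the excerpt); `first_groups` is an illustrative name.

```python
from itertools import islice

import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'd', 'd'],
                   'B': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})
df_grouped = df.groupby('A')

n = 2  # number of groups to visit

# islice consumes only the first n (name, group) pairs of the iterator
first_groups = []
for name, group in islice(df_grouped, n):
    first_groups.append((name, len(group)))
```

Groups are yielded in sorted key order by default, so "first n" here means the n smallest keys.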

Pandas Groupby and apply method with custom function

Question: I built the following function with the aim of estimating an optimal exponential moving average of a pandas DataFrame column.

    from scipy import optimize
    from sklearn.metrics import mean_squared_error
    import pandas as pd

    ## Function that finds best alpha and uses it to create ewma
    def find_best_ewma(series, eps=10e-5):
        def f(alpha):
            ewm = series.shift().ewm(alpha=alpha, adjust=False).mean()
            return mean_squared_error(series, ewm.fillna(0))
        result = optimize.minimize(f,.3, bounds=[(0+eps, 1-eps
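The function above is truncated, so the sketch below does not reproduce it. It only illustrates how a function that maps a Series to a same-length Series can be applied per group with `transform`. The stand-in `shifted_ewma` uses a fixed `alpha` instead of the optimizer, and the column names `key` and `value` are assumptions.

```python
import pandas as pd

# Hypothetical data; 'key' and 'value' are illustrative column names
df = pd.DataFrame({'key': ['a', 'a', 'a', 'b', 'b'],
                   'value': [1.0, 2.0, 3.0, 10.0, 20.0]})

def shifted_ewma(series, alpha=0.5):
    """Stand-in for find_best_ewma: EWMA of the previous values, fixed alpha."""
    return series.shift().ewm(alpha=alpha, adjust=False).mean()

# transform calls the function once per group and aligns the
# results back to the original row order
df['ewma'] = df.groupby('key')['value'].transform(shifted_ewma)
```

Because the shift happens inside each group, the first row of every group is NaN and no value leaks from one group into the next.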

Python lambda function syntax to transform a pandas groupby dataframe

Question: This should be a very simple question to answer. I have two lines of code. The first one works. The second gives the following error:

    SyntaxError: invalid syntax

Here are the two lines of code. The first line (which works fine) counts the rows where off0_on1 == 1. The second one tries to count the rows where off0_on1 == 0.

    a1['on1'] = a1.groupby('del_month')['off0_on1'].transform(sum)
    a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: 1 if x == 0)

Here is the pandas dataframe
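The syntax error arises because a Python conditional expression needs an `else` branch (`a if cond else b`). Beyond that, `transform` passes each group to the lambda as a whole Series, so a per-element `if` would not count anything. A minimal sketch of one way to count the zeros per group follows; the sample data is made up, since the question's dataframe is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical data standing in for the dataframe that is cut off above
a1 = pd.DataFrame({'del_month': [1, 1, 1, 2, 2],
                   'off0_on1':  [1, 0, 1, 0, 0]})

grouped = a1.groupby('del_month')['off0_on1']

# Rows where off0_on1 == 1: summing 0/1 values counts the ones
a1['on1'] = grouped.transform('sum')

# Rows where off0_on1 == 0: build a boolean Series and sum the True values
a1['off0'] = grouped.transform(lambda x: (x == 0).sum())
```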

How to vectorize code with nested if and loops in Python?

Question: I have a dataframe like the one given below:

    df = pd.DataFrame({
        'subject_id' :[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],
        'day':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
        'PEEP' :[7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]
    })
    df['fake_flag'] = ''

I am performing the operation shown in the code below. This code works

Pandas groupby: group by semester

Question: I need to group data by semesters, but there is no frequency alias available for this. 2QS (2 quarters from start) and 6MS (6 months from start) won't do, because they start at different moments depending on the first datetime in my dataframe. (Quite counterintuitive and prone to errors, IMHO: I didn't see this issue until I used a different dataset that began in May instead of January...)

    from datetime import *
    import pandas as pd
    import numpy as np

    df = pd.DataFrame()
    days = pd.date_range(start
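One way to sidestep the anchoring problem described above is to derive the semester from the calendar month rather than from a resampling frequency. The sketch below assumes a semester means January to June or July to December, and uses a made-up daily series that starts in May; the `year`, `semester` and `value` column names are illustrative.

```python
import pandas as pd

# Hypothetical daily data that begins in May rather than January
days = pd.date_range(start='2021-05-01', end='2022-03-31', freq='D')
df = pd.DataFrame({'value': 1}, index=days)

# Months 1-6 map to semester 1, months 7-12 to semester 2,
# regardless of where the data happens to start
keyed = df.assign(year=df.index.year,
                  semester=(df.index.month - 1) // 6 + 1)

totals = keyed.groupby(['year', 'semester'])['value'].sum()
```

Since the group key depends only on each row's own date, a dataset beginning in May still produces bins aligned to January and July.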

Merge multiple dataframes using multiindex in python

Question: I have 3 series which are generated by the code shown below (I show the code for one series only). I would like to merge 3 such series/dataframes on the columns (subject_id, hadm_id, icustay_id), but unfortunately these headings don't appear as column names. How do I convert them to columns and use them for merging with another series/dataframe of a similar datatype? I am generating the series from another dataframe (df) based on the condition given below. Though I already tried converting
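When a groupby result shows its keys as headings but not as columns, those keys usually live in a MultiIndex, and `reset_index()` turns them into ordinary columns. The sketch below assumes that is the situation here. The three series, their values and the names `m1`, `m2`, `m3` are invented for illustration, since the question's code is not included in the excerpt.

```python
import pandas as pd

keys = ['subject_id', 'hadm_id', 'icustay_id']

# Hypothetical series indexed by the three key levels
idx_full = pd.MultiIndex.from_tuples([(1, 10, 100), (2, 20, 200)], names=keys)
idx_part = pd.MultiIndex.from_tuples([(1, 10, 100), (3, 30, 300)], names=keys)

s1 = pd.Series([5, 7], index=idx_full, name='m1')
s2 = pd.Series([0.1, 0.2], index=idx_full, name='m2')
s3 = pd.Series([50, 60], index=idx_part, name='m3')

# reset_index() moves the index levels into columns, after which
# an ordinary merge on those columns works (inner join by default)
merged = (s1.reset_index()
            .merge(s2.reset_index(), on=keys)
            .merge(s3.reset_index(), on=keys))
```

Passing `how='outer'` to `merge` would instead keep key combinations that appear in only some of the inputs.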

Pandas dataframe. Group by value and count

Question: I have the following table:

    Days, Age, Sex
    5, 39, F
    4, 54, M
    4, 26, M
    5, 42, M
    4, 29, M

I want to count the number of rows with F and M separately. The following command works, but I'm not OK with the representation:

    df.groupby("Sex").count()

What would be the best way to do it? Thank you.

Answer 1: Just to add to Wen's answer: alternatively, you can use value_counts on the column selected with df.Sex.

    df.Sex.value_counts()

    M    4
    F    1
    Name: Sex, dtype: int64

Source: https://stackoverflow.com/questions
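For a runnable comparison, the sketch below rebuilds the table from the question and shows why `count()` looks cluttered: it reports a non-null count for every remaining column, whereas `size()` and `value_counts()` return a single count per group. Wen's answer is not part of this excerpt, so `size()` is offered here only as one plausible alternative.

```python
import pandas as pd

df = pd.DataFrame({'Days': [5, 4, 4, 5, 4],
                   'Age':  [39, 54, 26, 42, 29],
                   'Sex':  ['F', 'M', 'M', 'M', 'M']})

# count() yields one column of counts per non-key column (Days, Age)
per_column = df.groupby('Sex').count()

# size() yields a single Series with the number of rows per group
by_size = df.groupby('Sex').size()

# value_counts() gives the same numbers, sorted by frequency
counts = df['Sex'].value_counts()
```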

Pandas add calculated column to groupby result
