pandas-groupby

Replace column values within a groupby and condition

瘦欲@ submitted on 2019-12-25 00:46:22
Question: I have a dataframe in which I want to find the minimum value of a column within each group and then, based on that row, update the values of some of the other columns. The following code does what I want:

    import pandas as pd
    df = pd.DataFrame({'ID': [1,1,1,2,2,2,], 'Albedo': [0.2, 0.4, 0.5, 0.3, 0.5, 0.1], 'Temp' : [20, 30, 15, 40, 10, 5], 'Precip': [200, 100, 150, 60, 110, 45], 'Year': [1950, 2000, 2004, 1999, 1976, 1916]})
    #cols to replace values for
    cols = ['Temp', 'Precip', 'Year']
    final = pd
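The asker's own code is cut off above, so the following is only a minimal sketch of one way to get the described result. It assumes that every row in a group should take the values of `cols` from the row holding that group's minimum `Albedo`. The names `min_idx`, `row_labels` and `result` are illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2],
                   'Albedo': [0.2, 0.4, 0.5, 0.3, 0.5, 0.1],
                   'Temp': [20, 30, 15, 40, 10, 5],
                   'Precip': [200, 100, 150, 60, 110, 45],
                   'Year': [1950, 2000, 2004, 1999, 1976, 1916]})

# Columns whose values should be replaced
cols = ['Temp', 'Precip', 'Year']

# Index label of the minimum-Albedo row for each ID (a Series indexed by ID)
min_idx = df.groupby('ID')['Albedo'].idxmin()

# For every row, the label of the minimum-Albedo row in its own group
row_labels = df['ID'].map(min_idx)

result = df.copy()
# Look up the replacement values at those labels and write them back positionally
result[cols] = df.loc[row_labels, cols].to_numpy()
```

With this data, the three rows with ID 1 all take the values of the row where Albedo is 0.2, and the three rows with ID 2 take the values of the row where Albedo is 0.1.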

After using groupby or pivot count in pandas, how to apply further analysis and get the original data

房东的猫 submitted on 2019-12-24 21:37:01
Question: I have a dataset of 15,000 villages. For one district there are 12 blocks (talukas), and several crops are grown in that district. I have to check the crop-wise sown area for those villages and select 10 villages for each crop by random sampling. My first step is to remove the villages with 0 sown area from the dataset; after removing them I am left with 6,674 villages. Next I check how many villages remain in each block (taluka) of the district, so I use the pivot and groupby functions for
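The steps described in this question could be sketched roughly as follows. The data and all column names (`village`, `block`, `crop`, `sown_area`) are hypothetical, since the real dataset is not shown, and the sample size is reduced from 10 to 2 to fit the toy data. `DataFrameGroupBy.sample` is available from pandas 1.1 onward.

```python
import pandas as pd

# Hypothetical toy data standing in for the village dataset
villages = pd.DataFrame({
    'village': [f'v{i}' for i in range(12)],
    'block': ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'crop': ['rice'] * 6 + ['wheat'] * 6,
    'sown_area': [0, 1.5, 2.0, 0, 3.1, 0.7, 2.2, 0, 0, 1.1, 4.0, 0.9],
})

# Step 1: drop villages with zero sown area
sown = villages[villages['sown_area'] > 0]

# Step 2: count the remaining villages per block
per_block = sown.groupby('block').size()

# Step 3: draw a random sample of n villages for each crop
n = 2  # the question asks for 10; 2 is used here because the toy data is small
sample = sown.groupby('crop').sample(n=n, random_state=0)

# The sampled rows keep their original index, so the full original
# records can be recovered with villages.loc[sample.index]
original_rows = villages.loc[sample.index]
```

Because filtering and sampling both preserve the index, the counts in `per_block` are only a summary; the underlying rows remain reachable through `sown` and `sample`.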

Iterate over a subset of a Pandas groupby object

ぐ巨炮叔叔 submitted on 2019-12-24 18:41:59
Question: I have a Pandas groupby object, and I would like to iterate over the first n groups. I've tried:

    import pandas as pd
    df = pd.DataFrame({'A':['a','a','a','b','b','c','c','c','c','d','d'], 'B':[1,2,3,4,5,6,7,8,9,10,11]})
    df_grouped = df.groupby('A')
    i = 0
    n = 2 # for instance
    for name, group in df_grouped:
        #DO SOMETHING
        if i == n:
            break
        i += 1

and

    group_list = list(df_grouped.groups.keys())[:n]
    for name in group_list:
        group = df_grouped.get_group(name)
        #DO SOMETHING

but I wondered if there was
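One alternative worth considering is `itertools.islice`, which stops the iteration after n groups without a manual counter. This is a sketch of that idea, not necessarily the answer given on the original page; `first_names` and `first_sizes` are illustrative names. Note that, as written, the first loop in the question appears to process n + 1 groups, because the `break` check comes after the loop body.

```python
from itertools import islice

import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'd', 'd'],
                   'B': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})
df_grouped = df.groupby('A')

n = 2  # number of groups to visit

first_names = []
first_sizes = []
# islice stops pulling (name, group) pairs after the first n of them
for name, group in islice(df_grouped, n):
    first_names.append(name)
    first_sizes.append(len(group))
```

Since a groupby object is iterable, `islice` works on it directly and avoids building the full list of group keys first.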

Pandas Groupby and apply method with custom function

血红的双手。 submitted on 2019-12-24 12:15:02
Question: I built the following function with the aim of estimating an optimal exponential moving average of a pandas DataFrame column.

    from scipy import optimize
    from sklearn.metrics import mean_squared_error
    import pandas as pd

    ## Function that finds best alpha and uses it to create ewma
    def find_best_ewma(series, eps=10e-5):
        def f(alpha):
            ewm = series.shift().ewm(alpha=alpha, adjust=False).mean()
            return mean_squared_error(series, ewm.fillna(0))
        result = optimize.minimize(f,.3, bounds=[(0+eps, 1-eps
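The function above is truncated, so the sketch below does not reproduce it. It only illustrates how a custom Series-to-Series function could be applied per group, using a simplified stand-in with a fixed `alpha` instead of the optimisation step. The function `ewma_fixed_alpha`, the data, and the column names `group` and `value` are all hypothetical.

```python
import pandas as pd


def ewma_fixed_alpha(series, alpha=0.5):
    """Stand-in for the question's function: EWMA of the lagged series
    with a fixed alpha (no optimisation of alpha is performed here)."""
    return series.shift().ewm(alpha=alpha, adjust=False).mean()


# Hypothetical data with two groups
df = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'value': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]})

# transform calls the function once per group and aligns the
# same-length result back to the original rows
df['ewma'] = df.groupby('group')['value'].transform(ewma_fixed_alpha)
```

`transform` fits here because the function returns one value per input row; a function that returns a differently shaped result would call for `apply` instead.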

Python lambda function syntax to transform a pandas groupby dataframe

不想你离开。 submitted on 2019-12-24 11:44:55
Question: This should be a very simple question to answer. I have two lines of code. The first one works. The second gives the following error: SyntaxError: invalid syntax. Here are the two lines of code. The first line (which works fine) counts the rows where off0_on1 == 1. The second one tries to count the rows where off0_on1 == 0.

    a1['on1'] = a1.groupby('del_month')['off0_on1'].transform(sum)
    a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: 1 if x == 0)

Here is the pandas dataframe
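The SyntaxError comes from the conditional expression: Python requires an `else` branch in `1 if x == 0`. Adding `else` alone would still not count rows, because `transform` passes each group to the lambda as a whole Series rather than one value at a time. A minimal sketch of one way to count the zeros per group follows; the sample data is hypothetical, since the asker's dataframe is cut off.

```python
import pandas as pd

# Hypothetical data: the real dataframe is not shown in the excerpt
a1 = pd.DataFrame({'del_month': [1, 1, 1, 2, 2],
                   'off0_on1': [1, 0, 0, 1, 1]})

# Number of rows per month where off0_on1 == 1 (the sum of a 0/1 column)
a1['on1'] = a1.groupby('del_month')['off0_on1'].transform('sum')

# Number of rows per month where off0_on1 == 0:
# x is the group's Series, so compare element-wise and sum the booleans
a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: (x == 0).sum())
```

The lambda returns a single number per group, and `transform` broadcasts it to every row of that group.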

How to vectorize code with nested if and loops in Python?

空扰寡人 submitted on 2019-12-24 11:20:02
Question: I have a dataframe like the one given below:

    df = pd.DataFrame({ 'subject_id' :[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2], 'day':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 'PEEP' :[7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15] })
    df['fake_flag'] = ''

On this dataframe I am performing an operation as shown in the code below. This code works

Pandas groupby: group by semester

倾然丶 夕夏残阳落幕 submitted on 2019-12-24 10:38:29
Question: I need to group data by semesters, but there is no frequency tag available for this. 2QS (2 quarters from start) and 6MS (6 months from start) won't do, because they start at different moments depending on the first datetime in my dataframe. (Quite counterintuitive and error-prone, IMHO: I didn't see this issue until I used a different dataset that began in May instead of January...)

    from datetime import *
    import pandas as pd
    import numpy as np
    df = pd.DataFrame()
    days = pd.date_range(start
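One way to make the grouping independent of the first date is to derive the semester from the calendar month instead of using a frequency string. The sketch below assumes that "semester" means the calendar halves January to June and July to December; the data, the `value` column, and the names `semester` and `totals` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data that starts in May rather than January
days = pd.date_range(start='2019-05-01', end='2020-04-30', freq='D')
df = pd.DataFrame({'value': np.ones(len(days))}, index=days)

# 1 for January-June, 2 for July-December, regardless of the first date
semester = (df.index.month - 1) // 6 + 1

# Group by calendar year and semester
totals = df.groupby([df.index.year, semester])['value'].sum()
```

Here the May and June rows of 2019 land in the first semester of 2019 together, rather than opening a six-month window that starts in May.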

Merge multiple dataframes using multiindex in python

我与影子孤独终老i submitted on 2019-12-24 10:23:46
Question: I have 3 series generated by code like that shown below (I show the code for one series). I would like to merge the 3 series/dataframes on the columns (subject_id, hadm_id, icustay_id), but unfortunately these headings don't appear as column names. How do I convert them to columns and use them for merging with another series/dataframe of the same kind? I am generating the series from another dataframe (df) based on the condition given below. Though I already tried converting
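When the key headings show up as index levels rather than columns, `reset_index()` turns them back into ordinary columns that `merge` can use. The following sketch assumes each series carries a three-level MultiIndex named after the keys; the data and the series names `count_a` and `mean_b` are hypothetical, since the asker's code is not shown.

```python
import pandas as pd

keys = ['subject_id', 'hadm_id', 'icustay_id']

# Hypothetical series indexed by the three key levels
idx = pd.MultiIndex.from_tuples([(1, 10, 100), (2, 20, 200)], names=keys)
s1 = pd.Series([5, 7], index=idx, name='count_a')
s2 = pd.Series([0.5, 0.9], index=idx, name='mean_b')

# reset_index() moves the index levels into regular columns,
# after which the frames can be merged on those columns
merged = s1.reset_index().merge(s2.reset_index(), on=keys)
```

A third series can be chained with another `.merge(s3.reset_index(), on=keys)` in the same way.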

Pandas dataframe. Group by value and count

孤者浪人 submitted on 2019-12-24 09:58:22
Question: I have the following table:

    Days, Age, Sex
    5, 39, F
    4, 54, M
    4, 26, M
    5, 42, M
    4, 29, M

I want to count the number of rows with F and M separately. The following command works, but I'm not happy with the representation:

    df.groupby("Sex").count()

What is the best way to do it? Thank you.

Answer 1: Just to add to Wen's answer. Alternatively, you can use value_counts while selecting the column with df.Sex.

    df.Sex.value_counts()
    M    4
    F    1
    Name: Sex, dtype: int64

Source: https://stackoverflow.com/questions
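For comparison, here is a small self-contained sketch that builds the table from the question and counts rows per value of `Sex` in two ways. `groupby(...).size()` is added here as a further option and is not necessarily what the other answer on the original page proposed.

```python
import pandas as pd

df = pd.DataFrame({'Days': [5, 4, 4, 5, 4],
                   'Age': [39, 54, 26, 42, 29],
                   'Sex': ['F', 'M', 'M', 'M', 'M']})

# One count per distinct value, sorted by frequency
counts = df['Sex'].value_counts()

# Same numbers as a Series indexed by group key; unlike count(),
# size() returns a single column instead of one count per column
sizes = df.groupby('Sex').size()
```

`df.groupby("Sex").count()` repeats the count for every remaining column, which is likely the representation the asker found unsatisfying.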

Pandas add calculated column to groupby result

霸气de小男生 submitted on 2019-12-24 09:49:06
Question: The Python script below computes two reports: the total revenue from each customer, and, for each customer, how much of their spending went to each category. I want to compute the sales tax component for each of the reports. (All the items have a sales tax of 9.25%.)

    import pandas as pd
    from io import StringIO
    mystr = """Pedro|groceries|apple|1.42
    Nitin|tobacco|cigarettes|15.00
    Susie|groceries|cereal|5.50
    Susie|groceries|milk|4.75
    Susie|tobacco|cigarettes|15.00
    Susie
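A possible shape for the two reports with a tax column is sketched below, using only the complete rows visible in the excerpt. It assumes the listed prices are pre-tax, so the tax is 9.25% of each total; if the prices already include tax, the formula would differ. The column names `customer`, `category`, `item`, `price`, `revenue`, `spend` and `sales_tax` are illustrative and not taken from the asker's script.

```python
from io import StringIO

import pandas as pd

# Only the rows that appear in full in the excerpt
mystr = """Pedro|groceries|apple|1.42
Nitin|tobacco|cigarettes|15.00
Susie|groceries|cereal|5.50
Susie|groceries|milk|4.75
Susie|tobacco|cigarettes|15.00"""

df = pd.read_csv(StringIO(mystr), sep='|', header=None,
                 names=['customer', 'category', 'item', 'price'])

TAX_RATE = 0.0925  # 9.25%, assumed to apply on top of the listed price

# Report 1: total revenue per customer, plus its tax component
by_customer = df.groupby('customer')['price'].sum().to_frame('revenue')
by_customer['sales_tax'] = by_customer['revenue'] * TAX_RATE

# Report 2: spending per customer and category, plus its tax component
by_category = df.groupby(['customer', 'category'])['price'].sum().to_frame('spend')
by_category['sales_tax'] = by_category['spend'] * TAX_RATE
```

Converting the grouped result to a frame with `to_frame` gives a place to attach the calculated column, which is just ordinary column arithmetic on the aggregated totals.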