pandas-groupby

df.groupby(…).agg(set) produces a different result than df.groupby(…).agg(lambda x: set(x))

纵然是瞬间 submitted on 2019-12-03 05:08:30
Answering this question, it turned out that df.groupby(...).agg(set) and df.groupby(...).agg(lambda x: set(x)) produce different results.

Data:

df = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 1, 2, 3],
    'class_type': ['Krav Maga', 'Yoga', 'Ju-jitsu', 'Krav Maga', 'Ju-jitsu', 'Krav Maga', 'Karate'],
    'instructor': ['Bob', 'Alice', 'Bob', 'Alice', 'Alice', 'Alice', 'Bob']})

Demo:

In [36]: df.groupby('user_id').agg(lambda x: set(x))
Out[36]:
                    class_type    instructor
user_id
1        {Krav Maga, Ju-jitsu}  {Alice, Bob}
2            {Yoga, Krav Maga}       {Alice}
3           {Ju-jitsu, Karate}         {Bob}
4                  {Krav Maga}       {Alice}

In [37]: df
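The excerpt is cut off before the agg(set) output, so the actual difference is not reproduced here. Whatever the dispatch details inside .agg(), wrapping the aggregation in an explicit lambda (or a named function) is the unambiguous form: it is called once per group with that group's values. A minimal sketch, reusing the data above:

import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 1, 2, 3],
    'class_type': ['Krav Maga', 'Yoga', 'Ju-jitsu', 'Krav Maga', 'Ju-jitsu', 'Krav Maga', 'Karate'],
    'instructor': ['Bob', 'Alice', 'Bob', 'Alice', 'Alice', 'Alice', 'Bob'],
})

# An explicit lambda is always invoked once per group with that group's
# values, so there is no ambiguity about how pandas dispatches it --
# unlike a bare builtin such as set, which .agg() may special-case.
result = df.groupby('user_id').agg(lambda x: set(x))
print(result)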

pandas: GroupBy .pipe() vs .apply()

百般思念 submitted on 2019-12-02 20:52:55
In the example from the pandas documentation about the new .pipe() method for GroupBy objects, an .apply() call accepting the same lambda would return the same results.

In [195]: import numpy as np

In [196]: n = 1000

In [197]: df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
   .....:                    'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
   .....:                    'Revenue': (np.random.random(n)*50+10).round(2),
   .....:                    'Quantity': np.random.randint(1, 10, size=n)})

In [199]: (df.groupby(['Store', 'Product'])
   .....:    .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
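The key difference is in what the callable receives. .pipe(f) hands the GroupBy object itself to f exactly once, so inside the lambda grp.Revenue.sum() is a per-group Series and the division is vectorized across groups. .apply(f) calls f once per group with that group's sub-DataFrame, so grp.Revenue.sum() is a scalar there, and pandas reassembles the results. For this particular computation both yield the same Series; a self-contained sketch:

import numpy as np
import pandas as pd

n = 1000
df = pd.DataFrame({
    'Store': np.random.choice(['Store_1', 'Store_2'], n),
    'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
    'Revenue': (np.random.random(n) * 50 + 10).round(2),
    'Quantity': np.random.randint(1, 10, size=n),
})

grouped = df.groupby(['Store', 'Product'])

# .pipe: the lambda runs once; grp.Revenue.sum() is a Series indexed by
# (Store, Product), so the division happens for all groups at once.
piped = grouped.pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())

# .apply: the lambda runs once per group on that group's sub-DataFrame;
# each call returns a scalar, stitched back into a Series with the same index.
applied = grouped.apply(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())

print(np.allclose(piped, applied))  # True for this computation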

Groupby filter based on a consecutive sequence, sorted by ID and Date columns

天涯浪子 submitted on 2019-12-02 18:41:41
Question: I have a dataframe as shown below:

    ID Status        Date
0    1      F  2017-06-22
1    1      M  2017-07-22
2    1      P  2017-10-22
3    1      F  2018-06-22
4    1      P  2018-08-22
5    1      F  2018-10-22
6    1      F  2019-03-22
7    2      M  2017-06-29
8    2      F  2017-09-29
9    2      F  2018-01-29
10   2      M  2018-03-29
11   2      P  2018-08-29
12   2      M  2018-10-29
13   2      F  2018-12-29
14   3      M  2017-03-20
15   3      F  2018-06-20
16   3      P  2018-08-20
17   3      M  2018-10-20
18   3      F  2018-11-20
19   3      P  2018-12-20
20   3      F  2019-03-20
22   4      M  2017-08-10
23   4      F  2018-06-10
24   4      P  2018-08-10
25   4      F  2018-12-10
26   4      M
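The question is truncated before the actual filtering rule is stated. Problems phrased like this, though, usually reduce to labeling consecutive runs of a value per ID with a shift/cumsum key and then filtering on the runs. A sketch of that building block (the run-length filter at the end is an illustrative assumption, not the asker's stated condition):

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'Status': ['F', 'F', 'M', 'M', 'M', 'P'],
    'Date': pd.to_datetime(['2017-06-22', '2017-07-22', '2017-10-22',
                            '2017-06-29', '2017-09-29', '2018-01-29']),
}).sort_values(['ID', 'Date'])

# A run starts wherever Status differs from the previous row within the
# same ID; cumulative-summing those break points labels each run.
run_id = (df['Status'] != df.groupby('ID')['Status'].shift()).cumsum()

# Example filter: keep only runs of length >= 2.
out = df[df.groupby(run_id)['Status'].transform('size') >= 2]
print(out)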

filter dataframe after groupby and nunique in pandas

亡梦爱人 submitted on 2019-12-02 18:07:38
Question: I tried df.groupby("item")["variable"].nunique() and it returns a unique count for every item. I want to filter it to only return items where the count of "variable" is greater than 3, conditional on the groupby item. Is there a method?

Answer 1: When you want the groupby result mapped back onto every row of the input, think transform:

df = df[df.groupby("item")["variable"].transform('nunique') > 3]

Source: https://stackoverflow.com/questions/53551777/filter-dataframe-after-groupby-and-nunique-in-pandas
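A self-contained toy illustration of why transform is the right tool here (hypothetical data):

import pandas as pd

df = pd.DataFrame({
    'item': ['a', 'a', 'a', 'a', 'b', 'b'],
    'variable': [1, 2, 3, 4, 1, 1],
})

# .nunique() collapses the result to one row per item ...
print(df.groupby('item')['variable'].nunique())   # a: 4, b: 1

# ... while .transform('nunique') broadcasts each item's count back onto
# every input row, so it can serve directly as a boolean row filter.
filtered = df[df.groupby('item')['variable'].transform('nunique') > 3]
print(filtered)  # only the 'a' rows survive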

Looping through an Excel spreadsheet (using openpyxl)

空扰寡人 submitted on 2019-12-02 13:35:55
import openpyxl
wb = openpyxl.load_workbook('Book_1.xlsx')
ws = wb['Sheet_1']

I am trying to analyze an Excel spreadsheet using openpyxl. My goal is to get the max number from column D for each group of numbers in column A. I would like help getting the code to loop for the analysis. Here is an example of the spreadsheet that I am trying to analyze. The file name is Book 1 and the sheet name is Sheet 1. I am running Python 3.6.1, pandas 0.20.1, and openpyxl 2.4.7. I am providing the code I have so far.

IIUC, use the pandas module to achieve this:

import pandas as pd
df = pd.read_excel('yourfile.xlsx'
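The answer breaks off inside the read_excel call. A sketch of where it is presumably headed, assuming the worksheet's headers are literally named 'A' and 'D' (the real column names in Book_1.xlsx are not shown in the excerpt):

import pandas as pd

# Read the sheet; on the asker's pandas 0.20.1 the keyword was 'sheetname',
# renamed to 'sheet_name' in later versions.
df = pd.read_excel('Book_1.xlsx', sheet_name='Sheet_1')

# Max of column D within each group of identical values in column A.
# 'A' and 'D' are placeholder headers -- substitute the real ones.
result = df.groupby('A')['D'].max()
print(result)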

Iterating over groups in a dataframe [duplicate]

試著忘記壹切 submitted on 2019-12-02 11:54:18
This question already has an answer here: Looping over groups in a grouped dataframe.

The issue I am having is that I want to group the dataframe and then use functions to manipulate the data after it has been grouped. For example, I want to group the data by Date and then iterate through each row in the date groups to pass to a function. The issue is that groupby seems to create a tuple of the key and then a massive string consisting of all of the rows in the data, making iterating through each row impossible.

When you apply groupby on a dataframe, you don't get rows, you get groups of
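The answer is truncated, but the pattern it is leading to is standard: iterating a GroupBy yields (key, sub-DataFrame) pairs, and each sub-DataFrame can then be iterated row-wise or passed to a function whole. A sketch (the 'Date'/'value' columns are illustrative):

import pandas as pd

df = pd.DataFrame({
    'Date': ['2019-01-01', '2019-01-01', '2019-01-02'],
    'value': [10, 20, 30],
})

# Iterating a GroupBy yields (group_key, sub-DataFrame) pairs, not strings.
for date, group in df.groupby('Date'):
    # group is a regular DataFrame holding only this date's rows, so any
    # row-wise or whole-frame function can be applied to it.
    for _, row in group.iterrows():
        print(date, row['value'])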

How to reset cumsum after change in sign of values?

孤街醉人 submitted on 2019-12-02 11:22:38
In [46]: d = np.random.randn(10, 1) * 2
In [47]: df = pd.DataFrame(d.astype(int), columns=['data'])

I am trying to create a cumsum column that resets after a sign change in the data column, like this:

   data  custom_cumsum
0    -2             -2
1    -1             -3
2     1              1
3    -3             -3
4    -1             -4
5     2              2
6     0              2
7     3              5
8    -1             -1
9    -2             -3

I am able to achieve this with df.iterrows(). I am trying to avoid iterrows and do it with vector operations. There are a couple of questions on resetting cumsum when there is a NaN, but I am not able to achieve this cumsum with those solutions.

Create a new key to groupby, then do cumsum within each group. New
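The answer breaks off right after stating the idea. A sketch of one way to build such a key, assuming zeros should continue the previous run (as row 6 of the desired output implies; a leading zero would need extra handling):

import numpy as np
import pandas as pd

df = pd.DataFrame({'data': [-2, -1, 1, -3, -1, 2, 0, 3, -1, -2]})

# Sign of each value; zeros inherit the previous sign so they extend the
# current run instead of starting a new group (matches row 6 above).
sign = np.sign(df['data']).replace(0, np.nan).ffill()

# A new group starts whenever the sign flips; cumsum of the break points
# labels each run with its own key.
key = sign.ne(sign.shift()).cumsum()

# Cumulative sum restarts inside every run, reproducing custom_cumsum.
df['custom_cumsum'] = df.groupby(key)['data'].cumsum()
print(df)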