pandas-groupby

How to groupby multiple columns and aggregate diff on different columns?

╄→гoц情女王★ submitted on 2021-01-28 05:47:24
Question: I am looking for help on how to do this in Python / pandas: I want to take the original data (below) and find the daily difference of multiple columns (cnt_a and cnt_b) by a group with multiple columns (state, county and date). I've tried it different ways, and I can't seem to get past the "check for duplicate" issue: df.cnt_a = df.sort_values(['state','county','date']).groupby['state','county','date','cnt_a'].diff(-1) Tried splitting it out to fix one thing at a time: df1 = df.sort
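One way to get the per-group daily differences, as a minimal sketch: the sample data below is made up to match the question's column names, and the grouping deliberately uses only state and county (putting date or cnt_a into the group keys would leave nothing for diff to work across).

    import pandas as pd

    # Made-up rows matching the columns named in the question.
    df = pd.DataFrame({
        "state":  ["NY", "NY", "NY", "NY"],
        "county": ["Kings", "Kings", "Queens", "Queens"],
        "date":   pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]),
        "cnt_a":  [10, 12, 5, 9],
        "cnt_b":  [100, 90, 50, 55],
    })

    # Sort by date within each group, then take the day-over-day difference of each
    # count column per (state, county). Note groupby(...) is a call, not groupby[...].
    df = df.sort_values(["state", "county", "date"])
    g = df.groupby(["state", "county"])
    df["diff_a"] = g["cnt_a"].diff()
    df["diff_b"] = g["cnt_b"].diff()
    print(df)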

How do I count the number of occurrences per minute in a pandas data frame [duplicate]

假装没事ソ submitted on 2021-01-28 04:16:07
Question: This question already has answers here: How to pivot a dataframe? (2 answers) Closed 1 year ago. I have a pandas data frame like this: timestamp status 2019-01-01 09:00:00 FAILED 2019-01-01 09:00:00 FAILED 2019-01-01 09:00:00 UNKNOWN 2019-01-01 09:00:00 PASSED 2019-01-01 09:00:00 PASSED 2019-01-01 09:01:00 PASSED 2019-01-01 09:01:00 FAILED How can I group the data per minute and count the number of each status per minute to get this data frame: timestamp PASSED FAILED UNKNOWN 2019-01-01 09
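A short sketch of one way to get the per-minute counts (not necessarily the approach from the linked pivot duplicate), using pd.crosstab on the minute-floored timestamp:

    import pandas as pd

    # The sample rows from the question.
    df = pd.DataFrame({
        "timestamp": pd.to_datetime([
            "2019-01-01 09:00:00", "2019-01-01 09:00:00", "2019-01-01 09:00:00",
            "2019-01-01 09:00:00", "2019-01-01 09:00:00", "2019-01-01 09:01:00",
            "2019-01-01 09:01:00",
        ]),
        "status": ["FAILED", "FAILED", "UNKNOWN", "PASSED", "PASSED", "PASSED", "FAILED"],
    })

    # Cross-tabulate the minute-floored timestamp against the status: one row per
    # minute, one column per status, cells holding the counts.
    counts = pd.crosstab(df["timestamp"].dt.floor("min"), df["status"])
    print(counts)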

Pandas: how to collapse a Series' MultiIndex to a DateTimeIndex?

倖福魔咒の submitted on 2021-01-28 03:12:59
Question: As a followup of Pandas groupby: group by semester, I need to collapse a Series' MultiIndex to a DateTimeIndex. I already looked at Collapse Pandas MultiIndex to Single Index, but to no avail; I cannot make it work. Series ser is: dtime dtime 2016 1 78.0 7 79.0 2017 1 73.0 7 79.0 2018 1 79.0 7 71.0 Name: values, dtype: float64 How to collapse dtime to a single DateTimeIndex? dtime 2016-01-01 78.0 2016-07-01 79.0 2017-01-01 73.0 2017-07-01 79.0 2018-01-01 79.0 2018-07-01 71.0 Name: values,
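A minimal sketch of one way to do the collapse, rebuilding the Series from the values shown above (the two index levels are named 'year' and 'month' here instead of the duplicated 'dtime'):

    import pandas as pd

    # Rebuild the example Series with a (year, month) MultiIndex.
    idx = pd.MultiIndex.from_tuples(
        [(2016, 1), (2016, 7), (2017, 1), (2017, 7), (2018, 1), (2018, 7)],
        names=["year", "month"],
    )
    ser = pd.Series([78.0, 79.0, 73.0, 79.0, 79.0, 71.0], index=idx, name="values")

    # Turn each (year, month) pair into a timestamp on the first day of that month.
    ser.index = pd.to_datetime([f"{y}-{m:02d}-01" for y, m in ser.index])
    ser.index.name = "dtime"
    print(ser)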

How to groupby and filter a dataframe based on the sum?

大兔子大兔子 submitted on 2021-01-28 02:45:39
Question: So I have a dataframe, milk_countries_exports, that consists of these columns:
- 'Period': the year and month for a particular row (the dataset is month by month for a year)
- 'Reporter': the country that is doing the exporting
- 'Partner': the countries that are importing from the 'Reporter'
- 'Commodity': one of 2 items, 'Milk and cream, neither concentrated nor sweetened' and 'Milk and cream, concentrated or sweetened'
- 'Commodity Code': the number assigned to the item
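The question is truncated before it states the actual condition, so the sketch below only illustrates the usual pattern for filtering on a per-group sum; the 'Trade Value' column and the 1000 threshold are invented for the example:

    import pandas as pd

    # Hypothetical rows loosely shaped like the columns described above.
    df = pd.DataFrame({
        "Period":      ["2019-01", "2019-01", "2019-02", "2019-02"],
        "Reporter":    ["Germany", "Germany", "France", "France"],
        "Commodity":   ["Milk and cream, concentrated or sweetened"] * 4,
        "Trade Value": [400, 700, 300, 200],
    })

    # transform('sum') broadcasts each group's total back onto its rows, so it can
    # be used as a boolean mask that preserves the original row shape.
    group_totals = df.groupby(["Reporter", "Commodity"])["Trade Value"].transform("sum")
    filtered = df[group_totals > 1000]
    print(filtered)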

Pandas Groupby: How to use two lambda functions?

风流意气都作罢 submitted on 2021-01-27 20:13:53
Question: I can currently do the following in Pandas, but I get a stern finger wagging from FutureWarning: grpd = df.groupby("rank").agg({ "mean": np.mean, "median": np.median, "min": np.min, "max": np.max, "25th percentile": lambda x: np.percentile(x, 25), "75th percentile": lambda x: np.percentile(x, 75) }) The following throws an error because I have two lambda functions: percentile_25 = lambda x: np.percentile(x, 25) percentile_75 = lambda x: np.percentile(x, 75) df = diffs[["User Installs", "rank"]
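A sketch of the named-aggregation form (pandas 0.25+), which sidesteps both the FutureWarning from the dict-renaming syntax and the clash between two anonymous lambdas; the toy data just reuses the column names mentioned in the question:

    import numpy as np
    import pandas as pd

    # Toy data reusing the question's column names.
    df = pd.DataFrame({
        "rank": [1, 1, 2, 2, 2],
        "User Installs": [10, 30, 5, 15, 40],
    })

    # With named aggregation the output names come from the keywords, so two
    # lambdas no longer collide on the generated '<lambda>' name.
    grpd = df.groupby("rank")["User Installs"].agg(
        mean="mean",
        median="median",
        min="min",
        max="max",
        percentile_25=lambda x: np.percentile(x, 25),
        percentile_75=lambda x: np.percentile(x, 75),
    )
    print(grpd)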

Counting the number of consecutive occurrences of numbers in a dataframe

前提是你 submitted on 2021-01-27 19:23:03
Question: I have a dataframe with a dummy column that contains 1s and 0s, and I would like to count, for each row, how many consecutive 1s or 0s have occurred so far, restarting at every switch, counting up for 1s and counting down for 0s. I have an example below: import pandas as pd df = pd.DataFrame({'Dummy': [0, 0, 1, 1, 1, 0, 1, 1, 1, 1], 'Counter': [-1, -2, 1, 2, 3, -1, 1, 2, 3, 4]}) Answer 1: Let's try: blocks = df.Dummy.diff().ne(0).cumsum() counters = df.groupby(blocks).cumcount() + 1 df['Counter'] = np.where
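The answer's final line is cut off at np.where; the completion below is an assumption, but it reproduces the expected 'Counter' column from the example:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Dummy': [0, 0, 1, 1, 1, 0, 1, 1, 1, 1]})

    # Label each run of identical values, number the rows within each run starting
    # at 1, then negate the count for the runs of zeros.
    blocks = df['Dummy'].diff().ne(0).cumsum()
    counters = df.groupby(blocks).cumcount() + 1
    df['Counter'] = np.where(df['Dummy'].eq(1), counters, -counters)
    print(df)  # Counter: [-1, -2, 1, 2, 3, -1, 1, 2, 3, 4]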

How to aggregate only the numerical columns in a mixed-dtype dataframe

爷,独闯天下 submitted on 2021-01-27 18:43:45
Question: I have a mixed pd.DataFrame: import pandas as pd import numpy as np df = pd.DataFrame({ 'A' : 1., 'B' : pd.Timestamp('20130102'), 'C' : pd.Timestamp('20180101'), 'D' : np.random.rand(10), 'F' : 'foo' }) df Out[12]: A B C D F 0 1.0 2013-01-02 2018-01-01 0.592533 foo 1 1.0 2013-01-02 2018-01-01 0.819248 foo 2 1.0 2013-01-02 2018-01-01 0.298035 foo 3 1.0 2013-01-02 2018-01-01 0.330128 foo 4 1.0 2013-01-02 2018-01-01 0.371705 foo 5 1.0 2013-01-02 2018-01-01 0.541246 foo 6 1.0 2013-01-02 2018-01
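The preview does not show which aggregation is wanted, so the sketch below only shows the common pattern: restrict to numeric dtypes with select_dtypes and aggregate those, letting the datetime and string columns drop out. The choice of 'mean' and 'sum' is arbitrary here:

    import numpy as np
    import pandas as pd

    # The mixed-dtype frame from the question: floats, timestamps and strings.
    df = pd.DataFrame({
        'A': 1.0,
        'B': pd.Timestamp('20130102'),
        'C': pd.Timestamp('20180101'),
        'D': np.random.rand(10),
        'F': 'foo',
    })

    # Keep only the numeric columns (A and D), then aggregate them.
    numeric_agg = df.select_dtypes(include='number').agg(['mean', 'sum'])
    print(numeric_agg)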

Pandas: Group two columns based on value in another column

我与影子孤独终老i submitted on 2021-01-27 13:07:41
Question: I'm pretty new to python/pandas, and I have a dataframe that looks something like this: id name color id_1 alex blue id_2 james yellow id_1 sara black id_4 dave pink id_4 lin grey id_2 aly red I want to group by id and get the values in the other two columns as a list: id name color id_1 [alex,sara] [blue,black] id_2 [james,aly] [yellow,red] id_4 [dave,lin] [pink,grey] Is there an easy way to do that? Answer 1: Use groupby and agg with a custom function calling tolist: df = df.groupby('id').agg(lambda x:
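The answer is cut off mid-lambda; x.tolist() is the natural completion, and a runnable version of the whole idea looks like this:

    import pandas as pd

    # The sample rows from the question.
    df = pd.DataFrame({
        'id':    ['id_1', 'id_2', 'id_1', 'id_4', 'id_4', 'id_2'],
        'name':  ['alex', 'james', 'sara', 'dave', 'lin', 'aly'],
        'color': ['blue', 'yellow', 'black', 'pink', 'grey', 'red'],
    })

    # Collect every remaining column into a per-id list.
    out = df.groupby('id').agg(lambda x: x.tolist()).reset_index()
    print(out)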

How to convert a data frame into a matrix based on two columns of the data frame [duplicate]

陌路散爱 submitted on 2021-01-27 12:34:26
Question: This question already has an answer here: Pivot Tables or Group By for Pandas? (1 answer) Closed 1 year ago. I have a data frame that looks like userId movieId rating 0 12882 1 4.0 1 12882 32 3.5 2 12882 47 5.0 3 12882 50 5.0 4 12882 110 4.5 But I want to convert it into a matrix in which the row name is userId, the column name is movieId and the value is the rating. 1 32 47 12882 4.0 3.5 5.0 I have tried to use groupby, but after that, I have no idea how to convert it. test = Ratings[['userId',
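A pivot (the approach named in the linked duplicate) does this in one step; the sketch below rebuilds the sample rows from the question:

    import pandas as pd

    # The sample rows shown in the question, all for a single user.
    df = pd.DataFrame({
        'userId':  [12882, 12882, 12882, 12882, 12882],
        'movieId': [1, 32, 47, 50, 110],
        'rating':  [4.0, 3.5, 5.0, 5.0, 4.5],
    })

    # userId becomes the row index, movieId the columns, rating the cell values;
    # no groupby is needed as long as each (userId, movieId) pair is unique.
    matrix = df.pivot(index='userId', columns='movieId', values='rating')
    print(matrix)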