pandas-groupby

Number of unique pairs within one column - pandas

試著忘記壹切 · submitted on 2019-12-07 18:11:26
Question: I am having a little problem producing statistics for my dataframe in pandas. My dataframe looks like this (index omitted):

    id  type
    1   A
    2   B
    3   A
    1   B
    3   B
    2   C
    4   B
    4   C

Importantly, each id has exactly two type values assigned, as can be seen above. I want to count the occurrences of every type combination (that is, the number of unique ids with a given type combination), so I want to get a dataframe like this:

    type  count
    A, B  2
    A, C  0
    B, C  2

I tried using groupby in many ways, but in vain. […]
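One way to get that result, as a minimal sketch (the frame below re-creates the question's sample data; building the pair label per id is one option among several):

    import pandas as pd
    from itertools import combinations

    df = pd.DataFrame({'id':   [1, 2, 3, 1, 3, 2, 4, 4],
                       'type': ['A', 'B', 'A', 'B', 'B', 'C', 'B', 'C']})

    # Collapse each id to its sorted pair of types, then count each pair.
    pairs = df.groupby('id')['type'].apply(lambda s: ', '.join(sorted(s)))
    counts = pairs.value_counts()

    # Reindex over all possible pairs so missing combinations appear as 0.
    all_pairs = [', '.join(p)
                 for p in combinations(sorted(df['type'].unique()), 2)]
    result = (counts.reindex(all_pairs, fill_value=0)
                    .rename_axis('type').reset_index(name='count'))
    print(result)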

Pandas groupby + resample first is really slow - since version 0.22

夙愿已清 · submitted on 2019-12-07 09:26:36
Question: I have a piece of code that groups a dataframe and runs resample('1D').first() for each group. Since I upgraded to 0.22.0, it runs much slower. Setup code:

    import pandas as pd
    import numpy as np
    import datetime as dt
    import string

    # set up some data
    DATE_U = 50
    STR_LEN = 10
    STR_U = 50
    N = 500
    letters = list(string.ascii_lowercase)

    def get_rand_string():
        return ''.join(np.random.choice(letters, size=STR_LEN))

    dates = np.random.randint(0, 100000000, size=DATE_U)
    strings = [get_rand_string() […]
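The question's setup code is cut off above, so the frame below is a made-up stand-in; but a common workaround for the per-group resample slowdown is to express the same operation as a single groupby with pd.Grouper, as in this sketch:

    import pandas as pd
    import numpy as np

    # Hypothetical stand-in for the question's (truncated) data: a string
    # key plus a datetime index and one value column.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        'key': rng.choice(list('abc'), size=100),
        'date': pd.to_datetime(rng.integers(0, 10, size=100), unit='D',
                               origin='2019-01-01'),
        'value': rng.normal(size=100),
    }).set_index('date').sort_index()

    # The pattern the question reports as slow: one resample per group.
    slow = df.groupby('key').resample('1D').first()

    # A single pass with pd.Grouper often avoids the per-group overhead.
    fast = df.groupby(['key', pd.Grouper(freq='1D')]).first()

Note the two are not byte-identical when groups have gaps: resample materializes empty days inside each group's date range, while pd.Grouper only emits bins that actually contain data.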

How to add a new column based on the above row's value

久未见 · submitted on 2019-12-07 07:03:09
Question: I have a dataframe as below. Initially it has three columns ('date', 'time', 'flag'). I want to add a column based on flag and date: once flag = 1 appears on a given day, the target is 1 for the rest of that day; otherwise the target is 0.

         date       time      flag  target
    0    2017/4/10  10:00:00  0     0
    1    2017/4/10  11:00:00  1     1
    2    2017/4/10  12:00:00  0     1
    3    2017/4/10  13:00:00  0     1
    4    2017/4/10  14:00:00  0     1
    5    2017/4/11  10:00:00  1     1
    6    2017/4/11  11:00:00  0     1
    7    2017/4/11  12:00:00  1     1
    8    2017/4/11  13:00:00  1     1
    9    […]
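A cumulative maximum within each day reproduces the target column shown; this sketch assumes the rows are already sorted by time within each date:

    import pandas as pd

    df = pd.DataFrame({
        'date': ['2017/4/10'] * 5 + ['2017/4/11'] * 4,
        'time': ['10:00:00', '11:00:00', '12:00:00', '13:00:00', '14:00:00',
                 '10:00:00', '11:00:00', '12:00:00', '13:00:00'],
        'flag': [0, 1, 0, 0, 0, 1, 0, 1, 1],
    })

    # Within each day, the running maximum of flag flips to 1 at the first
    # flag == 1 and stays 1 for the rest of that day.
    df['target'] = df.groupby('date')['flag'].cummax()
    print(df)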

Pandas: for all sets of duplicate entries in a particular column, grab some information

江枫思渺然 · submitted on 2019-12-07 05:06:07
Question: I have a large dataframe that looks similar to this:

       ID_Code  Status1  Status2
    0  A        Done     Not
    1  A        Done     Done
    2  B        Not      Not
    3  B        Not      Done
    4  C        Not      Not
    5  C        Not      Not
    6  C        Done     Done

What I want to do is, for each set of duplicate ID codes, find the percentage of Not-Not entries (i.e. [# of Not-Not / # of total entries] * 100). I'm struggling to do so using groupby and can't seem to get the right syntax.

Answer 1: I may have misunderstood the question, but you […]
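The answer above is cut off, but one way to compute the percentage, as a sketch on the sample data:

    import pandas as pd

    df = pd.DataFrame({
        'ID_Code': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
        'Status1': ['Done', 'Done', 'Not', 'Not', 'Not', 'Not', 'Done'],
        'Status2': ['Not', 'Done', 'Not', 'Done', 'Not', 'Not', 'Done'],
    })

    # A row is "Not-Not" when both status columns equal 'Not'; the mean of
    # that boolean per ID_Code is the fraction, and * 100 the percentage.
    not_not = df['Status1'].eq('Not') & df['Status2'].eq('Not')
    pct = not_not.groupby(df['ID_Code']).mean().mul(100).rename('pct_not_not')
    print(pct)  # A: 0.0, B: 50.0, C: 66.67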

Pandas group by weekday (M/T/W/T/F/S/S)

大兔子大兔子 · submitted on 2019-12-07 04:56:38
Question: I have a pandas dataframe with a time series index of the form YYYY-MM-DD ('arrival_date'), and I'd like to group by weekday (Monday to Sunday) in order to calculate the mean, median, std, etc. of the other columns. I should end up with only seven rows; so far I've only found out how to group by week, which aggregates everything weekly.

    # Reading the data
    df_data = pd.read_csv('data.csv', delimiter=',')

    # Providing the correct format for the data
    df_data = pd.to […]
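Grouping by the index's day name gives the seven-row result; this is a sketch on a made-up frame, since the question's data isn't shown:

    import pandas as pd
    import numpy as np

    # Hypothetical frame with a DatetimeIndex named 'arrival_date'.
    idx = pd.date_range('2019-01-01', periods=60, freq='D',
                        name='arrival_date')
    df = pd.DataFrame({'bookings': np.arange(60)}, index=idx)

    # Group by the weekday of the index; day_name() yields Monday..Sunday.
    stats = df.groupby(df.index.day_name()).agg(['mean', 'median', 'std'])

    # Optional: order the seven rows by calendar day, not alphabetically.
    order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
    print(stats.reindex(order))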

How to use pandas Grouper with 7d frequency and fill missing days with 0?

三世轮回 · submitted on 2019-12-06 19:51:30
I have the following sample dataset

    df = pd.DataFrame({
        'names': ['joe', 'joe', 'joe'],
        'dates': [dt.datetime(2019, 6, 1), dt.datetime(2019, 6, 5),
                  dt.datetime(2019, 7, 1)],
        'values': [5, 2, 13]
    })

and I want to group by names and by weeks or 7 days, which I can achieve with

    df_grouped = df.groupby(['names', pd.Grouper(key='dates', freq='7d')]).sum()

                       values
    names  dates
    joe    2019-06-01  7
           2019-06-29  13

But what I would be looking for is something like this, with all the dates made explicit:

                       values
    names  dates
    joe    2019-06-01  7
           2019-06-08  0
           2019-06-15  0
           2019-06-22  0
           2019-06-29  13

And by doing df_grouped.index […]
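One way to materialize the empty weekly bins is to resample within each name instead of using pd.Grouper; a sketch on the question's data:

    import datetime as dt
    import pandas as pd

    df = pd.DataFrame({
        'names': ['joe', 'joe', 'joe'],
        'dates': [dt.datetime(2019, 6, 1), dt.datetime(2019, 6, 5),
                  dt.datetime(2019, 7, 1)],
        'values': [5, 2, 13],
    })

    # resample emits every 7-day bin between a group's first and last
    # date, and sum() returns 0 for the empty bins, unlike pd.Grouper.
    out = (df.set_index('dates')
             .groupby('names')['values']
             .resample('7d')
             .sum())
    print(out)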

Pandas: GroupBy to DataFrame

亡梦爱人 · submitted on 2019-12-06 16:49:13
There is a very popular S.O. question regarding groupby to dataframe, see here. Unfortunately, I do not think that particular use case is the most useful one. Suppose you have what could be a hierarchical dataset in a flattened form, e.g.

       key  val
    0  'a'  2
    1  'a'  1
    2  'b'  3
    3  'b'  4

What I wish to do is convert that dataframe to this structure:

       'a'  'b'
    0  2    3
    1  1    4

I thought this would be as simple as pd.DataFrame(df.groupby('key').groups), but it is not. So how can I make this transformation?

    df.assign(index=df.groupby('key').cumcount()).pivot('index', 'key', 'val')
    Out[369]:
    key    'a'  'b'
    index
    0      2    3
    1      1    4
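The trick is that cumcount numbers the rows within each key, which gives pivot a row position to align on. Note that on pandas 2.0+ the pivot arguments are keyword-only, so the one-liner above needs to be spelled out; a runnable sketch:

    import pandas as pd

    df = pd.DataFrame({'key': ['a', 'a', 'b', 'b'], 'val': [2, 1, 3, 4]})

    # cumcount assigns 0, 1, ... within each key, so equal-ranked values
    # line up on the same row once pivot spreads the keys into columns.
    wide = (df.assign(row=df.groupby('key').cumcount())
              .pivot(index='row', columns='key', values='val'))
    print(wide)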

Group dataframe by multiple columns and append the result to the dataframe

…衆ロ難τιáo~ · submitted on 2019-12-06 15:39:51
This is similar to Attach a calculated column to an existing dataframe; however, that solution doesn't work when grouping by more than one column in pandas v0.14. For example:

    df = pd.DataFrame([
        [1, 1, 1],
        [1, 2, 1],
        [1, 2, 2],
        [1, 3, 1],
        [2, 1, 1]], columns=['id', 'country', 'source'])

The following calculation works:

    df.groupby(['id', 'country'])['source'].apply(lambda x: x.unique().tolist())
    0       [1]
    1    [1, 2]
    2    [1, 2]
    3       [1]
    4       [1]
    Name: source, dtype: object

But assigning the output to a new column results in an error:

    df['source_list'] = df.groupby(['id', 'country'])['source'].apply(lambda […]
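One way to broadcast a per-group result back onto the original rows is to compute it first and then join on the group keys; a sketch on the question's frame:

    import pandas as pd

    df = pd.DataFrame([
        [1, 1, 1],
        [1, 2, 1],
        [1, 2, 2],
        [1, 3, 1],
        [2, 1, 1]], columns=['id', 'country', 'source'])

    # apply yields one list per (id, country) group, indexed by the group
    # keys; join then broadcasts that result back onto the original rows.
    unique_sources = (df.groupby(['id', 'country'])['source']
                        .apply(lambda x: x.unique().tolist())
                        .rename('source_list'))
    df = df.join(unique_sources, on=['id', 'country'])
    print(df)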

Pandas Groupby TimeGrouper and apply

旧巷老猫 · submitted on 2019-12-06 15:25:56
As per this question. This groupby works when applied to my df for a pd.rolling_mean column as follows:

    data['maFast'] = (data['Last'].groupby(pd.TimeGrouper('d'))
                                  .apply(pd.rolling_mean, center=False,
                                         window=10))

How do I apply the same groupby logic to another element of my df which combines pd.rolling_std and pd.rolling_mean:

    data['maSlow_std'] = (pd.rolling_mean(data['Last'], window=60)
                          + 2 * pd.rolling_std(data['Last'], 20,
                                               min_periods=20))

I think you need a lambda function:

    data['maSlow_std'] = (data['Last'].groupby(pd.TimeGrouper('d'))
                                      .apply(lambda x: pd.rolling_mean(x, window=60) + 2 * pd […]
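For reference, pd.TimeGrouper and the pd.rolling_* module functions were removed in later pandas releases; the same per-day logic on a modern version would look roughly like this, with a made-up intraday series standing in for data['Last']:

    import pandas as pd
    import numpy as np

    # Hypothetical intraday price series standing in for data['Last'].
    idx = pd.date_range('2019-01-01', periods=3 * 288, freq='5min')
    last = pd.Series(np.random.default_rng(0).normal(100, 1, len(idx)),
                     index=idx)

    # pd.Grouper replaces pd.TimeGrouper, and Series.rolling replaces the
    # removed pd.rolling_mean / pd.rolling_std functions.
    maSlow_std = last.groupby(pd.Grouper(freq='D')).apply(
        lambda x: x.rolling(window=60).mean()
                  + 2 * x.rolling(window=20, min_periods=20).std())

    # apply adds the day as an outer index level; drop it to realign.
    maSlow_std = maSlow_std.droplevel(0)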

Groupby sum and count on multiple columns in Python

吃可爱长大的小学妹 · submitted on 2019-12-06 13:40:51
Question: I have a pandas dataframe that looks like this:

    ID   country  month   revenue  profit  ebit
    234  USA      201409  10       5       3
    344  USA      201409  9        7       2
    532  UK       201410  20       10      5
    129  Canada   201411  15       10      5

I want to group by country and month, count the IDs per country and month, and sum the revenue, profit, and ebit. The output for the above data would be:

    country  month   revenue  profit  ebit  count
    USA      201409  19       12      5     2
    UK       201410  20       10      5     1
    Canada   201411  15       10      5     1

I have tried different variations of groupby, sum and count […]
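Named aggregation (available since pandas 0.25) covers the mixed sum-and-count in one call; a sketch on the sample data:

    import pandas as pd

    df = pd.DataFrame({
        'ID':      [234, 344, 532, 129],
        'country': ['USA', 'USA', 'UK', 'Canada'],
        'month':   [201409, 201409, 201410, 201411],
        'revenue': [10, 9, 20, 15],
        'profit':  [5, 7, 10, 10],
        'ebit':    [3, 2, 5, 5],
    })

    # Sum the money columns and count rows (IDs) per country/month group.
    out = (df.groupby(['country', 'month'])
             .agg(revenue=('revenue', 'sum'),
                  profit=('profit', 'sum'),
                  ebit=('ebit', 'sum'),
                  count=('ID', 'size'))
             .reset_index())
    print(out)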