pandas-groupby

Deleting rows based on values in other rows

梦想的初衷 submitted on 2019-12-01 07:21:09

Question: I was looking for a way to drop rows from my dataframe based on conditions checked against values in other rows. Here is my dataframe:

product  product_id  account_status
prod-A   100         active
prod-A   100         cancelled
prod-A   300         active
prod-A   400         cancelled

If a row with account_status='active' exists for a product and product_id combination, then retain that row and delete the others. The desired output is:

product  product_id  account_status
prod-A   100         active
prod-A   300         active
prod-A   400         cancelled
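One possible approach (a sketch, not from the original thread): since 'active' sorts alphabetically before 'cancelled', sorting on the status column and then dropping duplicates per (product, product_id) pair keeps the 'active' row whenever one exists, and the remaining row otherwise.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['prod-A'] * 4,
    'product_id': [100, 100, 300, 400],
    'account_status': ['active', 'cancelled', 'active', 'cancelled'],
})

# Sort so 'active' comes first within each group, keep the first row per
# (product, product_id) pair, then restore the original row order.
out = (df.sort_values('account_status')
         .drop_duplicates(subset=['product', 'product_id'], keep='first')
         .sort_index())
```

Note the alphabetical trick only works because 'active' < 'cancelled'; with other status labels you would map them to an explicit priority first.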

Pandas groupby and sum total of group

痞子三分冷 submitted on 2019-12-01 07:00:00

I have a Pandas DataFrame with customer refund reasons. It contains these example data rows:

    case_type   claim_type
1   service     service
2   service     service
3   chargeback  service
4   chargeback  local_charges
5   service     supplier_service
6   chargeback  service
7   chargeback  service
8   chargeback  service
9   chargeback  service
10  chargeback  service
11  service     service_not_used
12  service     service_not_used

I would like to compare the customer's reason with some sort of labelled reason. This is no problem, but I would also like to see the total number of records in a specific group (customer reason). case

Concat python dataframes based on unique rows

笑着哭i submitted on 2019-12-01 06:55:05

Question: My dataframes read like:

df1
user_id  username  firstname  lastname
123      abc       abc        abc
456      def       def        def
789      ghi       ghi        ghi

df2
user_id  username  firstname  lastname
111      xyz       xyz        xyz
456      def       def        def
234      mnp       mnp        mnp

Now I want an output dataframe like:

user_id  username  firstname  lastname
123      abc       abc        abc
456      def       def        def
789      ghi       ghi        ghi
111      xyz       xyz        xyz
234      mnp       mnp        mnp

since user_id 456 is common across both dataframes. I have tried groupby on user_id: groupby(['user_id']). But it looks like groupby needs to be
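Groupby isn't needed here; concatenating and then dropping duplicate user_ids gives the desired result directly (a sketch on a trimmed two-column version of the data):

```python
import pandas as pd

df1 = pd.DataFrame({'user_id': [123, 456, 789],
                    'username': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'user_id': [111, 456, 234],
                    'username': ['xyz', 'def', 'mnp']})

# Stack the frames, then keep only the first occurrence of each user_id
# (so the shared 456 from df1 survives and its df2 copy is dropped).
out = (pd.concat([df1, df2], ignore_index=True)
         .drop_duplicates(subset='user_id', keep='first'))
```

If the two frames could disagree on the non-key columns, `keep='first'` decides which version wins; `keep='last'` would prefer df2's copy instead.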

Sliding window iterator using rolling in pandas

白昼怎懂夜的黑 submitted on 2019-12-01 06:29:30

If it's a single row at a time, I can get an iterator as follows:

import pandas as pd
import numpy as np

a = np.zeros((100, 40))
X = pd.DataFrame(a)
for index, row in X.iterrows():
    print(index)
    print(row)

Now I want each iteration to return a subset X[0:9, :], X[5:14, :], X[10:19, :], etc. How do I achieve this with rolling (pandas.DataFrame.rolling)? I'll experiment with the following dataframe. Setup:

import pandas as pd
import numpy as np
from string import uppercase

def generic_portfolio_df(start, end, freq, num_port, num_sec, seed=314):
    np.random.seed(seed)
    portfolios = pd.Index(['Portfolio {}'

Adding a grouped, aggregate nunique column to pandas dataframe

本小妞迷上赌 submitted on 2019-12-01 05:55:18

I want to add an aggregated, grouped nunique column to my pandas dataframe, but without aggregating the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging it back, etc. My df has track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table (without collapsing the track/type combos in the resulting df): the same number of rows, one more column. Something like this isn't working:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()

nor is

df['n_unique_id'] = df.groupby(['track', 'type'])['id']

Pandas group by time with specified start time with non integer minutes

て烟熏妆下的殇ゞ submitted on 2019-12-01 05:48:45

Question: I have a dataframe with one-hour-long signals. I want to group them into 10-minute buckets. The problem is that the starting time is not precisely a "multiple" of 10 minutes; therefore, instead of obtaining 6 groups, I obtain 7, with the first and the last incomplete. The issue can easily be reproduced with:

import pandas as pd
import numpy as np
import datetime as dt

rng = pd.date_range('1/1/2011 00:05:30', periods=3600, freq='1S')
ts = pd.DataFrame({'a': np.random.randn(len(rng)), 'b': np.random

Grouping DataFrame by start of decade using pandas Grouper

若如初见. submitted on 2019-12-01 03:28:15

Question: I have a dataframe of daily observations from 01-01-1973 to 12-31-2014. I have been using pandas Grouper, and everything has worked fine for each frequency until now: I want to group the observations by decade (70s, 80s, 90s, etc.). I tried

import pandas as pd
df.groupby(pd.Grouper(freq='10Y')).mean()

However, this groups them into 73-83, 83-93, etc.

Answer 1: You can do a little arithmetic on the year to floor it to the nearest decade:

df.groupby(df.index.year // 10 * 10).mean()

Answer 2: pd.cut also works

pandas groupby dropping columns

风格不统一 submitted on 2019-12-01 02:59:46

I'm doing a simple group-by operation, trying to compare group means. As you can see below, I have selected specific columns from a larger dataframe, from which all missing values have been removed. But when I group by, I am losing a couple of columns. I have never encountered this with pandas, and I'm not finding anything similar on Stack Overflow. Does anybody have any insight? I think it is the automatic exclusion of 'nuisance' columns, as described here. Sample:

df = pd.DataFrame({'C': {0: -0.91985400000000006, 1: -0.042379, 2: 1.2476419999999999, 3: -0.00992, 4: 0

Pandas : Sum multiple columns and get results in multiple columns

断了今生、忘了曾经 submitted on 2019-12-01 01:08:31

I have a "sample.txt" like this:

idx  A  B  C  D  cat
J    1  2  3  1  x
K    4  5  6  2  x
L    7  8  9  3  y
M    1  2  3  4  y
N    4  5  6  5  z
O    7  8  9  6  z

With this dataset, I want to get sums by row and by column. Summing rows within each group is not a big deal. I produced a result like this:

### MY CODE ###
import pandas as pd

df = pd.read_csv('sample.txt', sep="\t", index_col='idx')
df.info()
df2 = df.groupby('cat').sum()
print(df2)

The result looks like this:

     A   B   C   D
cat
x    5   7   9   3
y    8  10  12   7
z   11  13  15  11

But I don't know how to write code to get a result like this (simply adding the values in columns A and B, as well as columns C and D):

   AB  CD
J   3   4
K   9   8
L  15

Pandas: plot multiple time series DataFrame into a single plot

浪子不回头ぞ submitted on 2019-11-30 22:44:09

Question: I have the following pandas DataFrame:

     time  Group  blocks
0       1      A       4
1       2      A       7
2       3      A      12
3       4      A      17
4       5      A      21
5       6      A      26
6       7      A      33
7       8      A      39
8       9      A      48
9      10      A      59
...   ...    ...     ...
36     35      A     231
37      1      B       1
38      2      B     1.5
39      3      B       3
40      4      B       5
41      5      B       6
...   ...    ...     ...
911    35      Z     349

This is a dataframe with multiple time-series-like groups, each running from min=1 to max=35. Each Group has a time series like this. I would like to plot each individual time series A through Z against an x-axis of 1 to 35. The y-axis would be the blocks at each