pandas-groupby

Pandas enumerate groups in descending order

蹲街弑〆低调 submitted on 2019-12-05 18:36:33
I have the following column:

       column
    0      10
    1      10
    2       8
    3       8
    4       6
    5       6

My goal is to find the total number of unique values (3 in this case) and create a new column that numbers the rows like this:

       new_column
    0           3
    1           3
    2           2
    3           2
    4           1
    5           1

The numbering starts at the number of unique values (3); the same number is repeated while the current row holds the same value as the previous row in the original column, and the number decreases each time the value changes. All unique values in the original column have the same number of rows (2 rows per unique value in this case). My solution was to group by the original column and build a new list like below:

    i=1 new…
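A minimal sketch of one way to do this, using the column names from the question: pd.factorize() numbers the groups 0, 1, 2, … in order of first appearance, so subtracting those codes from the number of unique values yields the descending enumeration.

    import pandas as pd

    df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})

    # factorize() assigns 0, 1, 2, ... in order of first appearance;
    # subtracting from nunique() turns that into 3, 3, 2, 2, 1, 1.
    codes, _ = pd.factorize(df['column'])
    df['new_column'] = df['column'].nunique() - codes
    print(df)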

Insert rows as a result of a groupby operation into the original dataframe

不羁岁月 submitted on 2019-12-05 18:32:39
For example, I have a pandas dataframe as follows:

    col_1 col_2 col_3 col_4
    a     X     5     1
    a     Y     3     2
    a     Z     6     4
    b     X     7     8
    b     Y     4     3
    b     Z     6     5

For each value in col_1, I want to add up the values in col_3 and col_4 (and many more columns) that correspond to X and Z in col_2, and create a new row with these sums. The output would be as below:

    col_1 col_2 col_3 col_4
    a     X     5     1
    a     Y     3     2
    a     Z     6     4
    a     NEW   11    5
    b     X     7     8
    b     Y     4     3
    b     Z     6     5
    b     NEW   13    13

There could also be more values in col_1 that need the same treatment, so I can't explicitly reference 'a' and 'b'. I attempted to use a combination of groupby('col_1')…
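One hedged sketch, using the names from the question: filter to the X and Z rows, sum them per col_1 group, label the result NEW, then concatenate it back and use a stable sort so each NEW row lands at the end of its group.

    import pandas as pd

    df = pd.DataFrame({'col_1': ['a', 'a', 'a', 'b', 'b', 'b'],
                       'col_2': ['X', 'Y', 'Z', 'X', 'Y', 'Z'],
                       'col_3': [5, 3, 6, 7, 4, 6],
                       'col_4': [1, 2, 4, 8, 3, 5]})

    # Sum the X and Z rows per col_1 group and label them NEW.
    new_rows = (df[df['col_2'].isin(['X', 'Z'])]
                .groupby('col_1', as_index=False)[['col_3', 'col_4']]
                .sum()
                .assign(col_2='NEW'))

    # Append, then sort by col_1 with a stable sort (mergesort) so the
    # appended NEW rows stay last within each group.
    out = (pd.concat([df, new_rows], ignore_index=True)
           .sort_values('col_1', kind='mergesort')
           .reset_index(drop=True))
    print(out)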

Pandas and groupby count the number of matches in two different columns

不打扰是莪最后的温柔 submitted on 2019-12-05 18:19:33
I would like to count the number of matches after a groupby in a pandas dataframe.

    claim event material1 material2
    A     X     M1        M2
    A     X     M2        M3
    A     X     M3        M0
    A     X     M4        M4
    A     Y     M5        M5
    A     Y     M6        M0
    B     Z     M7        M0
    B     Z     M8        M0

First, I group by the pair (claim, event), and for each of these groups I want to count the number of matches between the columns material1 and material2. For the groupby, I have

    grouped = df.groupby(['claim', 'event'])

but then I don't know how to compare the two columns. It should return the following dataframe:

    claim event matches
    A     X     3
    A     Y     1
    B     Z     0

Do you have any idea how to do that? Use isin…
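Picking up that isin hint, a sketch with the question's column names: within each (claim, event) group, count how many material2 values also occur anywhere in that group's material1 column.

    import pandas as pd

    df = pd.DataFrame({
        'claim':     ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
        'event':     ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
        'material1': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8'],
        'material2': ['M2', 'M3', 'M0', 'M4', 'M5', 'M0', 'M0', 'M0']})

    # isin() checks each material2 value against the group's material1
    # values; summing the boolean mask counts the matches per group.
    matches = (df.groupby(['claim', 'event'])
                 .apply(lambda g: g['material2'].isin(g['material1']).sum())
                 .reset_index(name='matches'))
    print(matches)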

How to add a new column based on the values in the rows above

我怕爱的太早我们不能终老 submitted on 2019-12-05 16:07:59
I have a dataframe as below. It originally has three columns ('date', 'time', 'flag'). I want to add a column based on flag and date: once I see flag = 1 on a given day, the target is 1 for the rest of that day; otherwise the target is 0.

        date      time     flag target
    0   2017/4/10 10:00:00 0    0
    1   2017/4/10 11:00:00 1    1
    2   2017/4/10 12:00:00 0    1
    3   2017/4/10 13:00:00 0    1
    4   2017/4/10 14:00:00 0    1
    5   2017/4/11 10:00:00 1    1
    6   2017/4/11 11:00:00 0    1
    7   2017/4/11 12:00:00 1    1
    8   2017/4/11 13:00:00 1    1
    9   2017/4/11 14:00:00 0    1
    10  2017/4/12 10:00:00 0    0
    11  2017/4/12 11:00:00 0    0
    12  2017/4/12 12:00:00 0    0
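A sketch using a grouped cumulative maximum, with the column names from the question: within each date, cummax() keeps the target at 0 until the first flag = 1 and at 1 afterwards.

    import pandas as pd

    df = pd.DataFrame({
        'date': ['2017/4/10'] * 5 + ['2017/4/11'] * 5 + ['2017/4/12'] * 3,
        'time': ['10:00:00', '11:00:00', '12:00:00', '13:00:00', '14:00:00',
                 '10:00:00', '11:00:00', '12:00:00', '13:00:00', '14:00:00',
                 '10:00:00', '11:00:00', '12:00:00'],
        'flag': [0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0]})

    # The running maximum of flag within each day flips to 1 at the
    # first flag = 1 and stays 1 for the rest of that day.
    df['target'] = df.groupby('date')['flag'].cummax()
    print(df)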

Getting max values from pandas multiindex dataframe

99封情书 submitted on 2019-12-05 14:30:57
I'm trying to retrieve only the max values (including the multi-index values) from a pandas dataframe that has multiple indexes. The dataframe I have is generated via a groupby and a column selection ('tOfmAJyI') like this:

    df.groupby('id')['tOfmAJyI'].value_counts()
    Out[4]:
    id  tOfmAJyI
    3   mlNXN       4
        SSvEP       2
        hCIpw       2
    5   SSvEP       2
        hCIpw       1
        mlNXN       1
    11  mlNXN       2
        SSvEP       1
    ...

What I would like to achieve is to get the max values including their corresponding index values. So something like:

    id  tOfmAJyI
    3   mlNXN       4
    5   SSvEP       2
    11  mlNXN       2
    ...

Any ideas how I can achieve this? I was able to get the id and max value but…
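One sketch; the data below is fabricated to reproduce the counts shown above. A grouped value_counts() sorts each id's counts in descending order by default, so taking the first row per id yields the maxima together with their index values.

    import pandas as pd

    # Hypothetical data shaped to match the counts in the question.
    df = pd.DataFrame({
        'id': [3] * 8 + [5] * 4 + [11] * 3,
        'tOfmAJyI': (['mlNXN'] * 4 + ['SSvEP'] * 2 + ['hCIpw'] * 2
                     + ['SSvEP'] * 2 + ['hCIpw', 'mlNXN']
                     + ['mlNXN'] * 2 + ['SSvEP'])})

    counts = df.groupby('id')['tOfmAJyI'].value_counts()

    # Counts are already descending within each id, so the first row of
    # every id group is that group's maximum, MultiIndex included.
    print(counts.groupby(level='id').head(1))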

Pandas group by weekday (M/T/W/T/F/S/S)

与世无争的帅哥 submitted on 2019-12-05 13:17:23
I have a pandas dataframe containing a time series (as the index) of the form YYYY-MM-DD ('arrival_date'), and I'd like to group by weekday (Monday to Sunday) in order to calculate the mean, median, std, etc. of the other columns. I should end up with only seven rows; so far I've only found out how to group by week, which aggregates everything weekly.

    # Reading the data
    df_data = pd.read_csv('data.csv', delimiter=',')

    # Providing the correct format for the data
    df_data = pd.to_datetime(df_data['arrival_date'], format='%Y%m%d')

    # Converting the time series column to index
    df_data…
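A hedged sketch: the file and column names come from the question, and the aggregated columns are assumed to be numeric. Parsing arrival_date as the index and grouping on the index's weekday gives exactly seven groups.

    import pandas as pd

    df = pd.read_csv('data.csv', parse_dates=['arrival_date'],
                     index_col='arrival_date')

    # dayofweek runs 0 (Monday) .. 6 (Sunday), so the result has at most
    # seven rows, one per weekday, in calendar order.
    weekday_stats = df.groupby(df.index.dayofweek).agg(['mean', 'median', 'std'])
    print(weekday_stats)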

Reshape pandas dataframe from rows to columns

杀马特。学长 韩版系。学妹 submitted on 2019-12-05 03:32:48
I'm trying to reshape my data. At first glance it sounds like a transpose, but it's not. I tried melt, stack/unstack, joins, etc.

Use case: I want to have only one row per unique individual, and put all job history in the columns. For clients, it can be easier to read information across a row than down columns. Here's the data:

    import pandas as pd
    import numpy as np

    data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
             'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
             'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016",
                              "1/1/2015", "1/1/2016"]}
    df2 = pd.DataFrame(data1)
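A sketch building on the question's own df2: number each person's history with cumcount(), then unstack those numbers into the columns so each individual occupies a single row.

    import pandas as pd

    data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
             'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
             'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016",
                              "1/1/2015", "1/1/2016"]}
    df2 = pd.DataFrame(data1)

    # Sequence each person's job-history rows, then pivot that sequence
    # number into the columns: 'Job 1', 'Job 2', ... per Name.
    df2['n'] = df2.groupby('Name').cumcount() + 1
    wide = df2.set_index(['Name', 'n']).unstack('n')
    wide.columns = [f'{col} {n}' for col, n in wide.columns]
    print(wide.reset_index())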

Python Pandas: Assign Last Value of DataFrame Group to All Entries of That Group

蹲街弑〆低调 submitted on 2019-12-05 01:07:45
In Python pandas, I have a DataFrame. I group this DataFrame by a column and want to assign the last value of a column within each group to all rows of another column. I know that I am able to select the last row of each group with this command:

    import pandas as pd

    df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})
    print(df)
    print("-")
    result = df.groupby('a').nth(-1)
    print(result)

Result:

       a   b
    0  1  20
    1  1  21
    2  2  30
    3  3  40
    4  3  41
    -
        b
    a
    1  21
    2  30
    3  41

How would it be possible to assign the result of this operation back to the original dataframe, so that I have something like:

       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41
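A sketch using the question's own frame: groupby().transform('last') broadcasts each group's final b value back onto every row of that group.

    import pandas as pd

    df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

    # transform() returns a result aligned with the original index, so
    # every row receives its own group's last value of b.
    df['b_new'] = df.groupby('a')['b'].transform('last')
    print(df)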

Time difference within group by objects in Python Pandas

强颜欢笑 submitted on 2019-12-04 22:53:25
I have a dataframe that looks like this:

    from  to  datetime             other
    -------------------------------------
    11    1   2016-11-06 22:00:00  -
    11    1   2016-11-06 20:00:00  -
    11    1   2016-11-06 15:45:00  -
    11    12  2016-11-06 15:00:00  -
    11    1   2016-11-06 12:00:00  -
    11    18  2016-11-05 10:00:00  -
    11    12  2016-11-05 10:00:00  -
    12    1   2016-10-05 10:00:59  -
    12    3   2016-09-06 10:00:34  -

I want to group by the "from" and then "to" columns, sort "datetime" in descending order within each group, and finally calculate the time difference within these grouped objects between the current time and the next time. For example, in…
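A sketch with the question's data: after a descending sort within each (from, to) group, a grouped diff(-1) subtracts the next (older) timestamp from the current one, giving positive gaps and NaT on each group's last row.

    import pandas as pd

    df = pd.DataFrame({
        'from': [11, 11, 11, 11, 11, 11, 11, 12, 12],
        'to':   [1, 1, 1, 12, 1, 18, 12, 1, 3],
        'datetime': pd.to_datetime([
            '2016-11-06 22:00:00', '2016-11-06 20:00:00',
            '2016-11-06 15:45:00', '2016-11-06 15:00:00',
            '2016-11-06 12:00:00', '2016-11-05 10:00:00',
            '2016-11-05 10:00:00', '2016-10-05 10:00:59',
            '2016-09-06 10:00:34'])})

    # Newest first inside each (from, to) group.
    df = df.sort_values(['from', 'to', 'datetime'],
                        ascending=[True, True, False])

    # Current timestamp minus the next (older) one within the group.
    df['time_diff'] = df.groupby(['from', 'to'])['datetime'].diff(-1)
    print(df)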

Pandas groupby/apply has different behaviour with int and string types

大憨熊 submitted on 2019-12-04 17:04:43
I have the following dataframe

       X    Y
    0  A   10
    1  A    9
    2  A    8
    3  A    5
    4  B  100
    5  B   90
    6  B   80
    7  B   50

and two different functions that are very similar:

    def func1(x):
        if x.iloc[0]['X'] == 'A':
            x['D'] = 1
        else:
            x['D'] = 0
        return x[['X', 'D']]

    def func2(x):
        if x.iloc[0]['X'] == 'A':
            x['D'] = 'u'
        else:
            x['D'] = 'v'
        return x[['X', 'D']]

Now I can groupby/apply these functions:

    df.groupby('X').apply(func1)
    df.groupby('X').apply(func2)

The first line gives me what I want, i.e.

       X  D
    0  A  1
    1  A  1
    2  A  1
    3  A  1
    4  B  0
    5  B  0
    6  B  0
    7  B  0

But the second line returns something quite strange:

       X  D
    0  A  u
    1  A  u
    2  A  u
    3  A  u
    4  A  u
    5  …
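This looks like the well-known pitfall of mutating the group that apply() passes in: apply may invoke the function on the first group an extra time to decide how to combine results, so in-place writes like x['D'] = ... can leak between calls, with dtype-dependent symptoms. A mutation-free sketch that sidesteps it (func2_safe is a hypothetical rewrite, not the question's code):

    import pandas as pd

    df = pd.DataFrame({'X': ['A'] * 4 + ['B'] * 4,
                       'Y': [10, 9, 8, 5, 100, 90, 80, 50]})

    # Build the new column on a copy instead of writing into the group
    # object that apply() hands us.
    def func2_safe(g):
        out = g[['X']].copy()
        out['D'] = 'u' if g['X'].iloc[0] == 'A' else 'v'
        return out

    print(df.groupby('X', group_keys=False).apply(func2_safe))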