pandas

How to sort data frame by column values?

天涯浪子 submitted on 2021-02-07 19:48:15
Question: I am relatively new to Python and pandas data frames, so I may have missed something very easy here. I had a data frame with many rows and columns, and in the end I managed to reduce it to a single row holding the maximum value of each column. I used this code to do that: import pandas as pd d = {'A' : [1.2, 2, 4, 6], 'B' : [2, 8, 10, 12], 'C' : [5, 3, 4, 5], 'D' : [3.5, 9, 1, 11], 'E' : [5, 8, 7.5, 3], 'F' : [8.8, 4, 3, 2]} df = pd.DataFrame(d, index=['a', 'b', 'c', 'd']) print(df) Out: A B C
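The question text is cut off, so the goal here is an assumption: take the maximum of each column and then order those maxima. A minimal sketch using the frame from the question (the names `col_max`, `sorted_max` and `by_a` are illustrative, not from the original post):

```python
import pandas as pd

d = {'A': [1.2, 2, 4, 6], 'B': [2, 8, 10, 12], 'C': [5, 3, 4, 5],
     'D': [3.5, 9, 1, 11], 'E': [5, 8, 7.5, 3], 'F': [8.8, 4, 3, 2]}
df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

# Maximum of each column: a Series indexed by column name.
col_max = df.max()

# Order the maxima from largest to smallest.
sorted_max = col_max.sort_values(ascending=False)
print(sorted_max)

# To sort the rows of the original frame by one column instead:
by_a = df.sort_values(by='A', ascending=False)
print(by_a)
```

`Series.sort_values` orders the per-column maxima, while `DataFrame.sort_values(by=...)` reorders whole rows; which of the two the asker needed cannot be told from the truncated text.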

Pandas: Sort innermost column group-wise based on other multilevel column

↘锁芯ラ submitted on 2021-02-07 19:46:31
Question: Consider the df below: In [3771]: df = pd.DataFrame({'A': ['a'] * 11, 'B': ['b'] * 11, 'C': ['C1', 'C1', 'C2','C1', 'C3', 'C3', 'C2', 'C3', 'C3', 'C2', 'C2'], 'D': ['D1', 'D2', 'D1', 'D3', 'D3', 'D2', 'D4', 'D4', 'D1', 'D2', 'D3'], 'E': [{'value': '4', 'percentage': None}, {'value': 5, 'percentage': None}, {'value': 12, 'percentage': None}, {'value': 5, 'percentage': None}, {'value': '12', 'percentage': None}, {'value': 'N/A', 'percentage': None}, {}, {'value': 19, 'percentage': None}, {'value':

Pandas: Check if column exists in df from a list of columns

ⅰ亾dé卋堺 submitted on 2021-02-07 19:30:18
Question: The goal here is to find the columns that do not exist in df and create them with null values. I have a list of column names like this: column_list = ('column_1', 'column_2', 'column_3') When I try to check whether each column exists, I only ever get True for the columns that exist and never get False for those that are missing. for column in column_list: print(df.columns.isin(column_list).any()) In PySpark, I can achieve this using the following: for column in column_list: if not column in df.columns: df
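One way to approach this in pandas, sketched under the assumption that "null values" means NaN. The sample frame and the name `missing` are hypothetical, not from the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical frame that has only one of the wanted columns.
df = pd.DataFrame({'column_1': [1, 2, 3]})
column_list = ('column_1', 'column_2', 'column_3')

# Test each name on its own; `in df.columns` checks the column labels.
missing = [c for c in column_list if c not in df.columns]

# Create every missing column, filled with NaN.
for c in missing:
    df[c] = np.nan

print(missing)
print(df)
```

The loop in the question prints True on every pass because `df.columns.isin(column_list)` returns one boolean per column of `df`, and `.any()` collapses that array to a single True as soon as any column matches. It never tests the individual `column` from the loop.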

Sort column in pandas, then sort another column while maintaining previous column sorted

六眼飞鱼酱① submitted on 2021-02-07 19:29:48
Question: I have some data on lots of publicly traded stocks. Each data row contains an id, a date, and some other information. Naturally, a stock might appear many times in the dataframe (e.g. Google might have several entries corresponding to the different dates on which the price was updated). I want to be able to sort the ids and then, within each sorted block, sort the dates. NOTE: sorting is done in ascending order for the sake of the example.
    id        date  price
0  123  2015/01/13      x
1  114  2017/02/15      y
2   12  2016
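Sorting by several keys at once is what `DataFrame.sort_values` does when `by` is a list: rows are ordered by the first key, and ties are broken by the next. A minimal sketch with hypothetical rows (the original table is truncated, so none of these values come from the post):

```python
import pandas as pd

# Hypothetical data: an id, a date string, and a numeric price.
df = pd.DataFrame({
    'id': [7, 3, 7, 3, 5],
    'date': ['2017/02/15', '2016/05/01', '2015/01/13', '2015/09/30', '2016/01/01'],
    'price': [10, 20, 30, 40, 50],
})

# Parse the dates so they compare chronologically rather than as text.
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')

# Sort by id first, then by date within each id; both ascending.
result = df.sort_values(by=['id', 'date']).reset_index(drop=True)
print(result)
```

Strings in `YYYY/MM/DD` form happen to sort correctly as text too, but converting with `pd.to_datetime` avoids surprises if the format ever varies.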

Replace pandas zero value with ffill non-zero, if the subsequent value is non-zero

為{幸葍}努か submitted on 2021-02-07 19:29:48
Question: I need to replace "0" row data in pandas with the previous row's non-zero value if, and only if, the value in the row following the "0" is non-zero. I.e. 101 92 78 0 107 0 0 would become: 101 92 78 78 107 0 0 Any ideas how to do this would be much appreciated :-) Thanks!
Answer 1: Using shift, you could do: In [608]: df.loc[(df.val == 0) & (df.val.shift(-1) != 0), 'val'] = df.val.shift(1) In [609]: df Out[609]: val 0 101.0 1 92.0 2 78.0 3 78.0 4 107.0 5 0.0 6 0.0
Answer 2: This answer is similar to
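A runnable variant of the shift-based idea, offered as a sketch rather than the answerer's exact code. It differs in two ways, both of which are assumptions about the intent: it carries forward the last non-zero value (not simply the immediately preceding row), and it treats the final row as having no non-zero successor. The column name `val` follows the answer; the helper names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'val': [101, 92, 78, 0, 107, 0, 0]})

is_zero = df['val'].eq(0)

# True where the following row is non-zero. The last row has no successor,
# so fill_value=0 makes it count as "followed by zero" (an assumption).
next_nonzero = df['val'].shift(-1, fill_value=0).ne(0)

# Last non-zero value seen so far, carried forward over the zeros.
last_nonzero = df['val'].mask(is_zero).ffill()

# Keep val where the condition is False; otherwise take the carried value.
df['val'] = df['val'].where(~(is_zero & next_nonzero), last_nonzero)
print(df['val'].tolist())
```

With plain `shift(-1)` the last row compares NaN against 0, and `NaN != 0` is True, so a trailing zero would also be selected for replacement; `fill_value=0` sidesteps that.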

Pandas Groupby - naming aggregate output column

五迷三道 submitted on 2021-02-07 19:20:38
Question: I have a pandas groupby command which looks like this: df.groupby(['year', 'month'], as_index=False).agg({'users':sum}) Is there a way to name the agg output something other than 'users' during the groupby command? For example, what if I wanted the sum of users to be called total_users? I could rename the column after the groupby is complete, but I wonder if there is another way.
Answer 1: Per the docs: If a dict is passed, the keys will be used to name the columns. Otherwise the function’s name (stored
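In pandas 0.25 and later, named aggregation lets the output column be named inside the `agg` call itself. The sketch below is not taken from the truncated answer above; it uses hypothetical data with the column names from the question:

```python
import pandas as pd

# Hypothetical data with the columns used in the question.
df = pd.DataFrame({
    'year': [2020, 2020, 2020, 2021],
    'month': [1, 1, 2, 1],
    'users': [10, 5, 7, 3],
})

# Named aggregation: output_name=(input_column, aggregation_function).
result = (
    df.groupby(['year', 'month'], as_index=False)
      .agg(total_users=('users', 'sum'))
)
print(result)
```

Each keyword passed to `agg` becomes an output column, so several differently named aggregates of the same input column can be requested in one call.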

Ordering of rows in Pandas to_sql

余生长醉 submitted on 2021-02-07 19:18:41
Question: I have a pandas DataFrame which is ordered.
           a0               b0  c0     d0  370025442  370020440  370020436  \
1  31/08/2014  First Yorkshire  53  05:10          0     0.8333     1.2167
2  31/08/2014  First Yorkshire  53  07:10          0       0.85       1.15
3  31/08/2014  First Yorkshire  53  07:40          0     0.5167     0.7833
4  31/08/2014  First Yorkshire  53  08:10          0        0.7          1
5  31/08/2014  First Yorkshire  53  08:40        NaN        NaN        NaN
6  31/08/2014  First Yorkshire  53  09:00          0        0.5     0.7667
7  31/08/2014  First Yorkshire  53  09:20          0     0.5833          1
8  31/08/2014  First Yorkshire  53  09:40          0        0.4        0.7
9  31/08
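The rest of the question is cut off, so what follows assumes the concern is getting rows back in the same order after `to_sql`. SQL tables carry no guaranteed row order, so a common approach is to store an explicit ordering column and sort on it when reading. A minimal sketch with an in-memory SQLite database; the table name `timetable`, the column `row_order`, and the sample values are all hypothetical:

```python
import sqlite3

import pandas as pd

# Hypothetical ordered frame; the index encodes the intended row order.
df = pd.DataFrame(
    {'d0': ['05:10', '07:10', '07:40'], 'value': [0.8333, 0.85, 0.5167]},
    index=[1, 2, 3],
)

conn = sqlite3.connect(':memory:')

# Write the index as a real column so the order can be recovered later.
df.to_sql('timetable', conn, index=True, index_label='row_order')

# Ask for the order explicitly when reading back.
restored = pd.read_sql_query(
    'SELECT * FROM timetable ORDER BY row_order',
    conn,
    index_col='row_order',
)
conn.close()
print(restored)
```

Without the `ORDER BY` clause the database is free to return rows in any order, even if they often come back in insertion order in practice.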
