pandas

Pivoting a Pandas Dataframe, no numeric types, index is not unique

放荡痞女 submitted on 2021-02-09 06:59:06
Question: I am trying to convert some string data into columns, but I have had a difficult time using past answers because I do not have a unique index or multi-index that I could use. Sample format:

    index  location   field      value
    1      location1  firstName  A
    2      location1  lastName   B
    3      location1  dob        C
    4      location1  email      D
    5      location1  title      E
    6      location1  address1   F
    7      location1  address2   G
    8      location1  address3   H
    9      location1  firstName  I
    10     location1  lastName   J
    11     location1  dob        K
    12     location1  email      L
    13     location1
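One possible approach to the pivot described above, shown as a minimal sketch rather than the accepted answer (the excerpt is cut off before any answer appears). It assumes each repeat of a field within a location starts a new record; the `record` column and the shortened sample data are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Shortened version of the sample from the question (assumption: fields repeat per record)
df = pd.DataFrame({
    "location": ["location1"] * 6,
    "field": ["firstName", "lastName", "dob", "firstName", "lastName", "dob"],
    "value": ["A", "B", "C", "I", "J", "K"],
})

# Number each repeat of a field within a location: 0 for the first record, 1 for the next.
# "record" is a hypothetical helper column that makes (location, record, field) unique.
df["record"] = df.groupby(["location", "field"]).cumcount()

# With a unique key in place, unstack turns the field names into columns.
wide = (
    df.set_index(["location", "record", "field"])["value"]
      .unstack("field")
      .reset_index()
)
wide.columns.name = None
print(wide)
```

Because the values are strings, no aggregation is involved; `unstack` only reshapes, which is why the key has to be unique first.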

Sort by Frequency of Values in a Column - Pandas

对着背影说爱祢 submitted on 2021-02-09 02:58:31
Question: I have a column in a dataframe:

    Fruits
    Apple
    Mango
    Banana
    Apple
    Mango
    Banana
    Apple
    Mango
    Grapes

I want to sort this column by the frequency of the values occurring in it, so the dataframe should now be:

    Fruits
    Apple
    Apple
    Apple
    Banana
    Banana
    Banana
    Mango
    Mango
    Grapes

Thanks!

Answer 1: Create a freq column and then sort by freq and fruit name.

    df.assign(freq=df.apply(lambda x: df.Fruits.value_counts()\
        .to_dict()[x.Fruits], axis=1))\
        .sort_values(by=['freq','Fruits'],ascending=[False,True]).loc[:,[
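The idea in the answer (attach a frequency column, then sort on it) can be sketched without the row-wise `apply`. This is an illustrative variant, not the answer's exact code, which is truncated above. It uses the input column as given in the question, where Apple and Mango both occur three times, so that tie is broken by name.

```python
import pandas as pd

df = pd.DataFrame({"Fruits": ["Apple", "Mango", "Banana", "Apple", "Mango",
                              "Banana", "Apple", "Mango", "Grapes"]})

# Count each value once, then look the count up per row with map
counts = df["Fruits"].value_counts()

result = (
    df.assign(freq=df["Fruits"].map(counts))
      .sort_values(by=["freq", "Fruits"], ascending=[False, True])
      .drop(columns="freq")          # the helper column is only needed for sorting
      .reset_index(drop=True)
)
print(result)
```

Computing `value_counts()` once and mapping it avoids recomputing the counts for every row, which the `apply`-based version does.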

Pandas: Approximate join on one column, exact match on other columns

帅比萌擦擦* submitted on 2021-02-09 02:46:53
Question: I have two pandas dataframes that I want to join/merge exactly on a number of columns (say 3) and approximately, i.e. by nearest neighbour, on one (date) column. I also want to return the difference (in days) between them. Each dataset is about 50,000 rows long. I'm most interested in an inner join, but the "leftovers" are also interesting if they are not too hard to get hold of. Most of the "exact match" observations will exist multiple times in each data frame. I've been trying to use difflib.get_close_matches
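One way to approach this kind of join is `pandas.merge_asof`, which matches exactly on the `by` columns and on the nearest value of the `on` column. The sketch below is an assumption about what would fit the question, not the thread's answer (the excerpt ends before one is shown). The column names `k1`, `k2`, `k3`, `right_date`, `diff_days` and all sample rows are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({
    "k1": ["a", "a", "b", "c"],
    "k2": [1, 1, 2, 3],
    "k3": ["x", "x", "y", "z"],
    "date": pd.to_datetime(["2020-01-10", "2020-03-01", "2020-02-01", "2020-04-01"]),
})
right = pd.DataFrame({
    "k1": ["a", "a", "b"],
    "k2": [1, 1, 2],
    "k3": ["x", "x", "y"],
    "date": pd.to_datetime(["2020-01-08", "2020-02-27", "2020-02-05"]),
})

# merge_asof keeps only the left "date", so copy the right-hand date first
right = right.assign(right_date=right["date"])

# Both frames must be sorted by the "on" column
merged = pd.merge_asof(
    left.sort_values("date"),
    right.sort_values("date"),
    on="date",
    by=["k1", "k2", "k3"],      # exact-match columns
    direction="nearest",        # nearest neighbour on the date
)
merged["diff_days"] = (merged["date"] - merged["right_date"]).dt.days

inner = merged[merged["right_date"].notna()]      # rows that found a match
leftovers = merged[merged["right_date"].isna()]   # left rows with no exact-key match
print(inner)
print(leftovers)
```

Note that `merge_asof` behaves like a left join and pairs each left row with at most one right row, so repeated exact-match keys on the right are not fanned out; unmatched right rows would need a second pass with the frames swapped.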

Remove zero from each column and rearranging it with python pandas/numpy

旧时模样 submitted on 2021-02-08 23:42:34
Question: I am a total novice in Python and am currently stuck on a simple but tricky situation. Is it possible to remove all these zeroes and rearrange the columns, going from this:

    A   B   C   D   E   F
    10  10  5   0   0   0
    0   0   0   13  3   4
    0   13  41  55  0   0
    0   0   31  30  21  0
    11  19  20  0   0   0

to something like this:

    A   B   C
    10  10  5
    13  3   4
    13  41  55
    31  30  21
    11  19  20

Answer 1: Assuming all rows have the same number of zeros:

    a = df.to_numpy()
    a = a[a!=0].reshape(-1,3)
    pd.DataFrame(a, columns=df.columns[:a.shape[1]])

    A B C 0 10 10 5 1
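The answer's approach can be run end to end as follows. This is a sketch built on the sample data from the question; inferring the row width from the first row (`n_keep`) is an added assumption, where the original answer hard-codes 3.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 10, 5, 0, 0, 0],
     [0, 0, 0, 13, 3, 4],
     [0, 13, 41, 55, 0, 0],
     [0, 0, 31, 30, 21, 0],
     [11, 19, 20, 0, 0, 0]],
    columns=list("ABCDEF"),
)

a = df.to_numpy()

# Assumption carried over from the answer: every row has the same number of non-zeros
n_keep = int((a != 0).sum(axis=1)[0])

# Boolean indexing flattens in row-major order, so reshaping restores one row per record
a = a[a != 0].reshape(-1, n_keep)

result = pd.DataFrame(a, columns=df.columns[:a.shape[1]])
print(result)
```

If rows can have different numbers of zeros, the `reshape` either raises an error or silently misaligns rows, so the assumption should be checked on real data.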

pandas: Select dataframe columns based on another dataframe's columns

蓝咒 submitted on 2021-02-08 21:22:53
Question: I'm trying to subset a pandas dataframe based on columns in another, similar dataframe. I can do this easily in R:

    df1 <- data.frame(A=1:5, B=6:10, C=11:15)
    df2 <- data.frame(A=1:5, B=6:10)

    #Select columns in df1 that exist in df2
    df1[df1 %in% df2]
      A  B
    1 1  6
    2 2  7
    3 3  8
    4 4  9
    5 5 10

    #Select columns in df1 that do not exist in df2
    df1[!(df1 %in% df2)]
       C
    1 11
    2 12
    3 13
    4 14
    5 15

How can I do that with the pandas dataframes below?

    df1 = pd.DataFrame({'A': [1,2,3,4,5],'B': [6,7,8,9,10],'C': [11
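A minimal sketch of one way to do this in pandas, using set operations on the column labels. It matches columns by name, whereas the R snippet compares column contents, so treat it as an approximation of the intent. The values of column `C` are taken from the R definition (`C=11:15`), since the pandas definition is cut off in the excerpt.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                    "B": [6, 7, 8, 9, 10],
                    "C": [11, 12, 13, 14, 15]})
df2 = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                    "B": [6, 7, 8, 9, 10]})

# Columns of df1 whose names also appear in df2
in_both = df1[df1.columns.intersection(df2.columns)]

# Columns of df1 whose names do not appear in df2
not_in_df2 = df1[df1.columns.difference(df2.columns)]

print(in_both)
print(not_in_df2)
```

`Index.difference` returns its labels sorted, so the original column order may change when more than one column is left over.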

Parse multiple date formats into a single format

我的未来我决定 submitted on 2021-02-08 19:50:35
Question: I have one column called published (date). As you can see, it has multiple date formats and also nan values. I would like to skip the nan values, convert all the other formats to %Y-%m-%d, and ignore the entries that contain only a year. I tried

    df['publish_time']=pd.to_datetime(df['publish_time'])

and also things like:

    fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y', '%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month','month: %d;Year','%Y','%b %d %Y','%b %Y %d']

but I could not solve it. Any
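One way to handle a column like this is to try each format in turn with `errors="coerce"` and keep the first successful parse per row. The sketch below is an assumption about a workable approach, not the thread's answer (none is shown in the excerpt). The sample values and the shortened `formats` list are hypothetical, and month abbreviations such as `Mar` assume an English locale.

```python
import pandas as pd

# Hypothetical sample: mixed formats, a year-only entry, and a missing value
s = pd.Series(["2020-03-15", "15/03/2020", "Mar 15 2020", "2020", None, "15-Mar-2020"])

formats = ["%Y-%m-%d", "%d/%m/%Y", "%b %d %Y", "%d-%b-%Y"]

# Blank out entries that consist of a four-digit year only, so they are ignored
year_only = s.str.fullmatch(r"\d{4}", na=False)
candidates = s.mask(year_only)

# Start with all-missing timestamps and fill in whatever each format can parse
parsed = pd.Series(pd.NaT, index=s.index, dtype="datetime64[ns]")
for fmt in formats:
    attempt = pd.to_datetime(candidates, format=fmt, errors="coerce")
    parsed = parsed.fillna(attempt)

# Render everything that parsed as %Y-%m-%d; unparsed rows stay missing
result = parsed.dt.strftime("%Y-%m-%d")
print(result)
```

The order of `formats` matters for ambiguous strings: a value such as 01-02-2020 is claimed by whichever matching format comes first in the list.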

Python: How to find the nth weekday of the year?

醉酒当歌 submitted on 2021-02-08 19:38:07
Question: I have seen a lot of similar posts on "nth weekday of the month", but my question pertains to "nth weekday of the year". Background: I have a table that has daily sales data. There are 3 columns: date, day of week (Mon, Tue, Wed etc.) and sales. I would like to match the nth weekday of Year 1 with Year 2 and compare sales that way. Example 1: 01/06/2020 matches with 01/04/2021, both are the 1st Monday of that year. Example 2: 11/02/2019 matches with 10/31/2020, both are the 44th Saturday of that
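The occurrence number of a weekday within its year can be derived from the day of the year: a date is the nth occurrence of its weekday when `(dayofyear - 1) // 7 + 1 == n`. The sketch below checks this against the four dates from the question. The `sales` figures and the helper column names (`year`, `weekday`, `nth`) are made up for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-06", "2021-01-04", "2019-11-02", "2020-10-31"]),
    "sales": [100, 120, 80, 95],   # hypothetical figures
})

sales["year"] = sales["date"].dt.year
sales["weekday"] = sales["date"].dt.day_name()

# Days 1-7 of the year hold the 1st occurrence of every weekday, days 8-14 the 2nd, and so on
sales["nth"] = (sales["date"].dt.dayofyear - 1) // 7 + 1

# Pair up two years on (weekday, nth) to compare sales
y1 = sales[sales["year"] == 2020]
y2 = sales[sales["year"] == 2021]
matched = y1.merge(y2, on=["weekday", "nth"], suffixes=("_y1", "_y2"))
print(sales)
print(matched)
```

Because a year has 365 or 366 days, one or two weekdays occur 53 times, so a 53rd occurrence in one year may have no counterpart in the other.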
