pandas | 易学教程

Pivoting a Pandas Dataframe, no numeric types, index is not unique

Question: I am trying to convert some string data into columns, but have had a difficult time adapting past answers because I do not have a unique index or MultiIndex that I could use. Sample format:

index  location   field      value
1      location1  firstName  A
2      location1  lastName   B
3      location1  dob        C
4      location1  email      D
5      location1  title      E
6      location1  address1   F
7      location1  address2   G
8      location1  address3   H
9      location1  firstName  I
10     location1  lastName   J
11     location1  dob        K
12     location1  email      L
13     location1
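
One common way to get a usable index is to number each repeat of a field within a location (the first firstName, lastName, ... become record 0, the second ones record 1) and then pivot on that number. The sketch below assumes every record lists its fields in the same set, so that the nth firstName and the nth email belong together; pivot_records and the record column are hypothetical names, not part of the question.

```python
import pandas as pd


def pivot_records(df: pd.DataFrame) -> pd.DataFrame:
    """Turn long (location, field, value) rows into one row per record.

    Assumption: no record skips a field that a later record has, so the
    nth occurrence of each field within a location belongs to record n.
    """
    out = df.copy()
    # 0 for the first occurrence of each field in a location, 1 for the second, ...
    out["record"] = out.groupby(["location", "field"]).cumcount()
    # (location, record) is now unique per field, so pivot needs no aggregation
    # and works with string values.
    wide = out.pivot(index=["location", "record"], columns="field", values="value")
    return wide.reset_index().rename_axis(columns=None)
```

If a record can be missing fields in the middle, a marker-based id such as `(df["field"] == "firstName").cumsum()` (assuming every record starts with firstName) is a safer choice for the record column.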

Sort by Frequency of Values in a Column - Pandas

Question: I have a column in a dataframe:

Fruits
Apple
Mango
Banana
Apple
Mango
Banana
Apple
Mango
Grapes

I want to sort this column by the frequency of its values, so the dataframe should now be:

Fruits
Apple
Apple
Apple
Banana
Banana
Banana
Mango
Mango
Grapes

Thanks!

Answer 1: Create a freq column, then sort by freq and fruit name.

df.assign(freq=df.apply(lambda x: df.Fruits.value_counts().to_dict()[x.Fruits], axis=1)) \
  .sort_values(by=['freq', 'Fruits'], ascending=[False, True]).loc[:,[
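
The answer recomputes value_counts() once per row inside apply. A sketch of the same idea (sort by count descending, then by name ascending) that computes the counts once and maps them onto the column with Series.map; the helper name sort_by_frequency is hypothetical.

```python
import pandas as pd


def sort_by_frequency(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Sort rows by how often their value in `col` occurs, most frequent first."""
    # value_counts() is a Series indexed by value; map looks each value up in it.
    counts = df[col].map(df[col].value_counts())
    return (
        df.assign(freq=counts)
        # Ties in frequency are broken alphabetically by the value itself.
        .sort_values(["freq", col], ascending=[False, True])
        .drop(columns="freq")
        .reset_index(drop=True)
    )


df = pd.DataFrame({"Fruits": "Apple Mango Banana Apple Mango Banana Apple Mango Grapes".split()})
print(sort_by_frequency(df, "Fruits"))
```

`df.groupby(col)[col].transform("count")` would produce the same freq column; map over value_counts() is used here because it mirrors the answer's lookup.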

Pandas: Approximate join on one column, exact match on other columns

Question: I have two pandas dataframes that I want to join/merge exactly on a number of columns (say 3) and approximately, i.e., by nearest neighbour, on one (date) column. I also want to return the difference (in days) between them. Each dataset is about 50,000 rows long. I'm most interested in an inner join, but the “leftovers” are also interesting if they are not too hard to get hold of. Most of the “exact match” observations will exist multiple times in each data frame. I've been trying to use difflib.get_close_matches
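
pandas has a built-in for "exact on some columns, nearest on one ordered column": pd.merge_asof with by= for the exact keys and direction="nearest". A minimal sketch, assuming the date column is called date and already has a datetime dtype in both frames; nearest_date_join, date_right and diff_days are hypothetical names.

```python
import pandas as pd


def nearest_date_join(left, right, keys, date_col="date", tolerance=None):
    """Exact match on `keys`, nearest match on `date_col`; adds the gap in days.

    merge_asof produces a *left* join: each left row gets the single nearest
    right row from its key group (right rows may be reused), and left rows
    with no candidate get NaN/NaT in the right-hand columns.
    """
    right_date = f"{date_col}_right"
    # Keep a copy of the right-hand date; the result keeps only the left `on` column.
    right = right.assign(**{right_date: right[date_col]})
    merged = pd.merge_asof(
        left.sort_values(date_col),    # both frames must be sorted on the `on` key
        right.sort_values(date_col),
        on=date_col,
        by=keys,                       # columns that must match exactly
        direction="nearest",           # search both backward and forward in time
        tolerance=tolerance,           # e.g. pd.Timedelta("30D"); None means no limit
    )
    merged["diff_days"] = (merged[date_col] - merged[right_date]).dt.days
    return merged
```

The inner join is then `merged.dropna(subset=["date_right"])` and the leftovers are `merged[merged["date_right"].isna()]`. Because merge_asof is a sorted search rather than a pairwise comparison, it should scale to tens of thousands of rows far better than difflib.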

Remove zero from each column and rearranging it with python pandas/numpy

Question: I am a total novice in Python and am currently stuck on a simple but tricky situation. Is it possible to remove all these zeros and rearrange the columns from this:

A   B   C   D   E   F
10  10  5   0   0   0
0   0   0   13  3   4
0   13  41  55  0   0
0   0   31  30  21  0
11  19  20  0   0   0

to something like this:

A   B   C
10  10  5
13  3   4
13  41  55
31  30  21
11  19  20

Answer 1: Assuming all rows have the same number of zeros:

a = df.to_numpy()
a = a[a!=0].reshape(-1,3)
pd.DataFrame(a, columns=df.columns[:a.shape[1]])

   A   B   C
0  10  10  5
1
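
A self-contained version of Answer 1 on the sample data. It works because boolean indexing flattens in row-major order, so each row's non-zero values stay together. This sketch (drop_zeros is a hypothetical name) derives the row width instead of hard-coding 3 and checks the answer's assumption explicitly, since reshape could otherwise succeed silently with misaligned rows whenever the total count happens to divide evenly.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 10, 5, 0, 0, 0],
     [0, 0, 0, 13, 3, 4],
     [0, 13, 41, 55, 0, 0],
     [0, 0, 31, 30, 21, 0],
     [11, 19, 20, 0, 0, 0]],
    columns=list("ABCDEF"),
)


def drop_zeros(df: pd.DataFrame) -> pd.DataFrame:
    """Shift each row's non-zero values to the left and drop the emptied columns."""
    a = df.to_numpy()
    nonzero_per_row = (a != 0).sum(axis=1)
    # Answer 1's assumption: every row keeps the same number of values.
    if not (nonzero_per_row == nonzero_per_row[0]).all():
        raise ValueError("rows have different numbers of non-zero values")
    # a[a != 0] is a flat array in row-major order, so reshaping restores the rows.
    kept = a[a != 0].reshape(len(df), nonzero_per_row[0])
    return pd.DataFrame(kept, columns=df.columns[: kept.shape[1]])


print(drop_zeros(df))
```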

pandas: Select dataframe columns based on another dataframe's columns

Question: I'm trying to subset a pandas dataframe based on the columns of another, similar dataframe. I can do this easily in R:

df1 <- data.frame(A=1:5, B=6:10, C=11:15)
df2 <- data.frame(A=1:5, B=6:10)

#Select columns in df1 that exist in df2
df1[df1 %in% df2]
  A  B
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10

#Select columns in df1 that do not exist in df2
df1[!(df1 %in% df2)]
   C
1 11
2 12
3 13
4 14
5 15

How can I do that with the pandas dataframes below?

df1 = pd.DataFrame({'A': [1,2,3,4,5],'B': [6,7,8,9,10],'C': [11
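
A minimal pandas sketch using the same data as the R example. It assumes the intent is to match columns by name (it does not compare column contents): Index.isin builds a boolean mask over df1's column labels, and .loc selects with it or with its negation.

```python
import pandas as pd

# Same data as the R data frames in the question.
df1 = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 10], "C": [11, 12, 13, 14, 15]})
df2 = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 10]})

# True for each column label of df1 that also appears among df2's labels.
in_df2 = df1.columns.isin(df2.columns)

both = df1.loc[:, in_df2]        # columns of df1 that exist in df2: A, B
only_df1 = df1.loc[:, ~in_df2]   # columns of df1 that do not exist in df2: C

print(both)
print(only_df1)
```

The mask keeps df1's original column order. `df1.columns.intersection(df2.columns)` and `df1.columns.difference(df2.columns)` are alternatives, but `difference` sorts its result by default.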

Parse multiple date formats into a single format

Question: I have one column called published (date). As you can see, it has multiple date formats and also NaN values. I would like to skip the NaN values, convert all the other formats to %Y-%m-%d, and ignore the entries that contain only a year. I tried

df['publish_time'] = pd.to_datetime(df['publish_time'])

and also things like:

fmt = ['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y', '%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y', 'Year: %d; month', 'month: %d;Year', '%Y', '%b %d %Y', '%b %Y %d']

but I could not solve it. Any
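
One way to use a list of formats like the one above is to try each format in turn with errors="coerce" and keep the first successful parse for every value. The sketch below blanks out year-only strings before parsing so no format can turn "2019" into January 1. FORMATS is a hypothetical subset of the question's list and parse_mixed_dates is a hypothetical name; for ambiguous strings such as 03/04/2020, whichever format appears first wins.

```python
import pandas as pd

# Tried in order; '%Y' (year only) is left out on purpose.
FORMATS = ["%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y", "%d-%b-%Y", "%b %d %Y"]


def parse_mixed_dates(s: pd.Series, formats=FORMATS) -> pd.Series:
    """Parse mixed-format date strings; missing, year-only and unmatched values become NaT."""
    # Treat strings like "2019" as missing so they are never parsed.
    year_only = s.astype(str).str.strip().str.fullmatch(r"\d{4}")
    cleaned = s.where(~year_only)

    result = None
    for fmt in formats:
        # errors="coerce" turns NaN and values that do not match this format into NaT.
        parsed = pd.to_datetime(cleaned, format=fmt, errors="coerce")
        # Keep earlier successes and fill only the slots that are still NaT.
        result = parsed if result is None else result.where(result.notna(), parsed)
    return result


s = pd.Series(["2020-03-15", "15-03-2020", "15/03/2020", None, "2019", "05-Mar-2020"])
print(parse_mixed_dates(s).dt.strftime("%Y-%m-%d"))
```

`.dt.strftime("%Y-%m-%d")` produces the requested string format and leaves the NaT entries as missing values.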

Python: How to find the nth weekday of the year?

Question: I have seen a lot of similar posts on "nth weekday of the month", but my question concerns "nth weekday of the year". Background: I have a table of daily sales data with 3 columns: date, day of week (Mon, Tue, Wed, etc.) and sales. I would like to match the nth weekday of Year 1 with the nth weekday of Year 2 and compare sales that way. Example 1: 01/06/2020 matches 01/04/2021; both are the 1st Monday of that year. Example 2: 11/02/2019 matches 10/31/2020; both are the 44th Saturday of that
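
Days 1 to 7 of any year contain exactly one of each weekday, days 8 to 14 the second of each, and so on, so the ordinal is (day_of_year - 1) // 7 + 1. For 11/02/2019 (day 306) that gives 44, and for 10/31/2020 (day 305 in a leap year) also 44, matching Example 2. The sketch below assumes columns named date and sales with MM/DD/YYYY strings, as in the examples; add_weekday_key and compare_years are hypothetical helpers.

```python
import pandas as pd


def add_weekday_key(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Add 'weekday' (Monday=0 ... Sunday=6) and 'nth' (1 = first such weekday of the year)."""
    out = df.copy()
    d = pd.to_datetime(out[date_col], format="%m/%d/%Y")
    out["weekday"] = d.dt.dayofweek
    # Each block of 7 days from Jan 1 holds one occurrence of every weekday.
    out["nth"] = (d.dt.dayofyear - 1) // 7 + 1
    return out


def compare_years(year1: pd.DataFrame, year2: pd.DataFrame) -> pd.DataFrame:
    """Pair rows of year1 and year2 that fall on the same nth weekday of their year."""
    k1, k2 = add_weekday_key(year1), add_weekday_key(year2)
    return k1.merge(k2, on=["weekday", "nth"], suffixes=("_y1", "_y2"))
```

Matching on (weekday, nth) rather than on the stored day-of-week text keeps the key numeric; the resulting sales_y1 and sales_y2 columns can then be compared directly.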