dataframe | 易学教程

Compare two dataframes in R

Question: I have two dataframes in R and want to compare their rows entry by entry. I want to check whether the first entry, second entry, etc. of a row in the first dataframe are greater than the corresponding entries of a row in the second dataframe. The result should be TRUE if all entries are greater and lie in the interval (0, 2). The data looks like this:

Dataframe 1
Letter 2011 2012 2013
A      2    3    5
B      6    6    6
C      5    4    8

Dataframe 2
Letter 2011 2012 2013
A      1    1    4
C      5    5    5

Result

Rename column with same column name based on values in DataFrame

Question: I have a DataFrame that can contain several columns with the same name. Based on their values, I want to rename those columns so there are no duplicates. I've tried a few things, but every time I iterate over the columns and rename them, the rename is applied by column name, so it hits every column with that name. df.rename(columns={df.columns[i]: 'some_name'}) seems to use the column name as well. Let's say I have a dataframe:

df = pd.DataFrame([["10kg", 4, "4%"]], columns=["A", "B", "A"])

I would like to rename the column(s) named "A" based on
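
A minimal pandas sketch of one way around this: build the new column list by position and assign it to df.columns, because rename() matches labels and therefore changes every duplicate at once. The question is cut off before it states the renaming rule, so unit_suffix below is a purely hypothetical rule (tag each duplicate with the unit found in its first value); dedupe_columns and unit_suffix are illustrative names, not pandas API.

```python
import pandas as pd

def dedupe_columns(df, name_for):
    """Return a copy whose repeated column labels are renamed by name_for(label, column)."""
    dup = df.columns.duplicated(keep=False)  # True for every copy of a repeated label
    new_cols = [
        # iloc selects by position, so each duplicate is handled on its own
        name_for(label, df.iloc[:, i]) if dup[i] else label
        for i, label in enumerate(df.columns)
    ]
    out = df.copy()
    out.columns = new_cols  # assigning a full list renames positionally, unlike rename()
    return out

def unit_suffix(label, column):
    """Hypothetical rule: tag the column with the unit found in its first value."""
    first = str(column.iloc[0])
    if first.endswith("%"):
        return f"{label}_pct"
    if first.endswith("kg"):
        return f"{label}_kg"
    return f"{label}_other"

df = pd.DataFrame([["10kg", 4, "4%"]], columns=["A", "B", "A"])
renamed = dedupe_columns(df, unit_suffix)
```

Columns that are not duplicated (here "B") keep their label; any other rule can be passed in place of unit_suffix.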

Reshape a dataset with Start and End Dates to create a Time Series counting aggregate sum by day/month/quarter

Question: I have a dataset exactly like this:

ProjectID   Start       End         Type
Project 1   01/01/2019  27/04/2019  HR
Project 2   15/01/2019  11/11/2019  Marketing
Project 3   25/02/2019  30/07/2019  Finance
Project 4   22/02/2019  15/04/2019  HR
Project 5   05/03/2019  29/09/2019  HR
Project 6   11/04/2019  01/12/2019  Marketing
Project 7   29/07/2019  23/08/2019  Finance
Project 8   25/08/2019  23/12/2019  Operations
Project 9   31/10/2019  29/11/2019  Operations
Project 10  10/12/2019  25/12/2019  Operations

I want to know over time, how many
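
A minimal pandas sketch, assuming the data is a pandas DataFrame with the columns shown, the dates are day-first (DD/MM/YYYY), and the goal is to count how many projects of each Type are running in each day, month, or quarter. Each project is expanded to one row per calendar day (both ends inclusive), then distinct projects are counted per period. active_projects is an illustrative name.

```python
import pandas as pd

def active_projects(df, freq="M"):
    """Count distinct running projects per period ("D", "M" or "Q") and Type."""
    start = pd.to_datetime(df["Start"], format="%d/%m/%Y")
    end = pd.to_datetime(df["End"], format="%d/%m/%Y")
    # One row per project per calendar day it is running, both ends inclusive
    frames = [
        pd.DataFrame({"ProjectID": pid, "Type": typ,
                      "Date": pd.date_range(s, e, freq="D")})
        for pid, typ, s, e in zip(df["ProjectID"], df["Type"], start, end)
    ]
    long = pd.concat(frames, ignore_index=True)
    period = long["Date"].dt.to_period(freq)
    # A project counts once in every period it touches; missing combinations become 0
    return (long.groupby([period, "Type"])["ProjectID"]
                .nunique()
                .unstack(fill_value=0))
```

Counting distinct projects (rather than summing daily rows) means a project spanning a whole month adds 1 to that month, not 30.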

Convert pandas dataframe column of UTC time string to floats

Question: I have a pandas dataframe with a column of strings holding datetimes in UTC, and I need to convert them to floats. I'm having trouble doing this. Here is a view of my column:

df['time'][0:3]
0    2018-04-18T19:00:00.000000000Z
1    2018-04-18T19:15:00.000000000Z
2    2018-04-18T19:30:00.000000000Z
Name: time, dtype: object

I've been trying this, but it isn't working for me:

import datetime
for i in range(1,len(df)):
    df['time'][i] = datetime.datetime.strptime(df['time'][i], '%Y-%m-%dT%H:%M:%S.%f000Z')
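
A minimal pandas sketch, assuming "floats" means seconds since the Unix epoch (1970-01-01 UTC). It parses the whole column at once instead of looping row by row, which also avoids the loop starting at index 1 and the chained assignment df['time'][i] = .... utc_strings_to_epoch_seconds is an illustrative name.

```python
import pandas as pd

def utc_strings_to_epoch_seconds(times: pd.Series) -> pd.Series:
    """Parse ISO 8601 UTC strings and return seconds since 1970-01-01 as floats."""
    # utc=True yields tz-aware timestamps; the nine-digit fraction parses as nanoseconds
    ts = pd.to_datetime(times, utc=True)
    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    # Dividing a Timedelta series by one second gives float64 seconds
    return (ts - epoch) / pd.Timedelta(seconds=1)

df = pd.DataFrame({"time": [
    "2018-04-18T19:00:00.000000000Z",
    "2018-04-18T19:15:00.000000000Z",
    "2018-04-18T19:30:00.000000000Z",
]})
df["time_float"] = utc_strings_to_epoch_seconds(df["time"])
```

Dividing by a different Timedelta (for example pd.Timedelta(hours=1)) gives the same offsets in other units.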

How to shift entire groups in pandas groupby

Question: Given the following data:

data = {'a': [1,1,1,8,8,3,3,3,3,4,4]}
df = pd.DataFrame(data)

I would now like to shift the whole thing down by n groups, so that their current order is preserved. The desired output for a shift of n=1 would be:

desired_output = {'a': [NaN,NaN,NaN,1,1,8,8,8,8,3,3]}
desired_output_df = pd.DataFrame(desired_output)

A shift of n=2 should give:

desired_output = {'a': [NaN,NaN,NaN,NaN,NaN,1,1,1,1,8,8]}
desired_output_df = pd.DataFrame(desired_output)

I have been
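
A minimal pandas sketch, assuming a "group" is a run of consecutive equal values as in the sample data (grouping by value instead would merge runs if the same value reappeared later). Each row keeps its position, but takes the value of the run n runs earlier, which is exactly the pattern in the desired outputs. shift_groups is an illustrative name.

```python
import numpy as np
import pandas as pd

def shift_groups(s: pd.Series, n: int) -> pd.Series:
    """Shift whole runs of equal values down by n runs, keeping run lengths in place."""
    # A new run starts wherever the value differs from the previous row
    run_id = s.ne(s.shift()).cumsum()
    # One value per run, in order of appearance
    run_values = s.groupby(run_id).first()
    # Shift at run level, then broadcast back onto every row of each run
    return run_id.map(run_values.shift(n))

df = pd.DataFrame({"a": [1, 1, 1, 8, 8, 3, 3, 3, 3, 4, 4]})
shifted_1 = shift_groups(df["a"], 1)
shifted_2 = shift_groups(df["a"], 2)
```

The result is float64 because the leading runs become NaN.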

Add geom_line to stacked barplot in R

Question: I have looked at similar threads but haven't seen anything specific to my situation. I want to add a geom_line to a fill bar chart in ggplot2. I have the values I want to superimpose as a vector. Is there a simple way to do this without merging all the values into the same dataframe? My code, if relevant:

ggplot(df_region, aes(fill=as.factor(Secondary1), y=Total, x=Year)) +
  geom_bar(position="fill", stat="identity") +
  theme(legend.position="bottom") +
  theme(legend.title=element_blank()) +
  labs(y

How to write multiple pandas data frames to single output excel file? [duplicate]

Question: This question already has answers here: Save list of DataFrames to multisheet Excel spreadsheet (3 answers). Closed 3 years ago. I am currently working in Python 3 and have a large number of related pandas dataframes. I need to write them to a single Excel file, with each dataframe on a separate tab. I can write each of them to its own Excel file and then copy and paste them into a single file, but I am looking for a more automated solution. Thanks in advance for the help.
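
A minimal pandas sketch, assuming an Excel engine such as openpyxl is installed and that the dataframes are collected in a dict mapping sheet name to DataFrame (the dict and write_frames_to_excel are illustrative, not part of the question). A single ExcelWriter keeps one workbook open, so each to_excel call adds a new tab.

```python
import pandas as pd

def write_frames_to_excel(frames, path):
    """Write each DataFrame in frames (sheet name -> DataFrame) to its own sheet."""
    # The context manager saves and closes the workbook when the block ends
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in frames.items():
            # Excel limits sheet names to 31 characters, so keep the keys short
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
```

Usage: write_frames_to_excel({"sales": sales_df, "costs": costs_df}, "combined.xlsx"), where sales_df and costs_df stand in for your own dataframes.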

Normalize a column of dataframe using min max normalization based on groupby of another column

Question: The dataframe is as shown:

Name   Job        Salary
john   painter    40000
peter  engineer   50000
sam    plumber    30000
john   doctor     500000
john   driver     20000
sam    carpenter  10000
peter  scientist  100000

How can I group by the Name column and apply min-max normalization to the Salary column within each group? Expected result:

Name   Job        Salary
john   painter    0.041666
peter  engineer   0
sam    plumber    1
john   doctor     1
john   driver     0
sam    carpenter  0
peter  scientist  1

I have tried the following: data = df.groupby('Name').transform(lambda x:
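
A minimal pandas sketch of per-group min-max scaling, (x - min) / (max - min) with min and max taken within each Name. groupby(...).transform broadcasts each group's statistic back onto the original rows, so the result lines up with the input order. minmax_by_group is an illustrative name.

```python
import pandas as pd

def minmax_by_group(df, group_col, value_col):
    """Scale value_col to [0, 1] separately within each group of group_col."""
    grouped = df.groupby(group_col)[value_col]
    # transform returns one value per original row, aligned with df's index
    lo = grouped.transform("min")
    hi = grouped.transform("max")
    # A group whose values are all equal gives 0/0, which pandas turns into NaN
    return (df[value_col] - lo) / (hi - lo)

df = pd.DataFrame({
    "Name": ["john", "peter", "sam", "john", "john", "sam", "peter"],
    "Job": ["painter", "engineer", "plumber", "doctor", "driver", "carpenter", "scientist"],
    "Salary": [40000, 50000, 30000, 500000, 20000, 10000, 100000],
})
df["Salary"] = minmax_by_group(df, "Name", "Salary")
```

For john, for example, the painter's salary becomes (40000 - 20000) / (500000 - 20000) = 1/24 ≈ 0.041666, matching the expected result.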