dataframe

Compare two dataframes in R

不问归期 submitted on 2021-01-28 10:54:29
Question: I have two dataframes in R and want to compare their rows entry by entry. I want to check whether the first entry, second entry, etc. of a row (any row) of the first dataframe is bigger than the corresponding entry of the corresponding row of the second dataframe. It should then give me TRUE if all entries are bigger and lie in the interval (0, 2). It looks like this:

Dataframe 1
Letter  2011  2012  2013
A       2     3     5
B       6     6     6
C       5     4     8

Dataframe 2
Letter  2011  2012  2013
A       1     1     4
C       5     5     5

Result

Rename column with same column name based on values in DataFrame

本秂侑毒 submitted on 2021-01-28 10:53:38
Question: I have a DataFrame that can contain several columns with the same name. Based on their values, I want to rename those columns so that there are no duplicates. I've tried a few things, but every time I iterate over the columns and rename them, the rename is applied by column name, so every column with that name is affected. For example, this also matches on the column name:

df.rename(columns={df.columns[i]: 'some_name'})

Let's say I have this dataframe:

df = pd.DataFrame([["10kg", 4, "4%"]], columns=["A", "B", "A"])

I would like to rename the column(s) named "A" based on
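
A minimal sketch in pandas, assuming the goal is to give each duplicated label a suffix derived from that column's values. The question's actual rule is cut off, so `suffix_for` and its unit-based logic are hypothetical. Because `rename` matches by label, the sketch builds the full list of new names by position and assigns it to `df.columns`.

```python
import pandas as pd

def suffix_for(series: pd.Series) -> str:
    """Hypothetical rule: derive a suffix from the unit of the column's first value."""
    first = str(series.iloc[0])
    if first.endswith("%"):
        return "pct"
    if first.endswith("kg"):
        return "kg"
    return "val"

def dedupe_columns_by_value(df: pd.DataFrame) -> pd.DataFrame:
    """Rename duplicated column labels by position, using each column's values."""
    counts = df.columns.value_counts()
    new_names = []
    for i, name in enumerate(df.columns):
        if counts[name] > 1:
            # iloc selects the column by position, so duplicate labels are not ambiguous
            new_names.append(f"{name}_{suffix_for(df.iloc[:, i])}")
        else:
            new_names.append(name)
    out = df.copy()
    out.columns = new_names  # assign the whole list; rename() would match by label
    return out

df = pd.DataFrame([["10kg", 4, "4%"]], columns=["A", "B", "A"])
print(dedupe_columns_by_value(df).columns.tolist())
# ['A_kg', 'B', 'A_pct']
```

If two duplicated columns produce the same suffix, the names still collide, so a real rule would need a tiebreaker such as the column position.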

Reshape a dataset with Start and End Dates to create a Time Series counting aggregate sum by day/month/quarter

╄→尐↘猪︶ㄣ submitted on 2021-01-28 10:48:29
Question: I have a dataset exactly like this:

ProjectID   Start       End         Type
Project 1   01/01/2019  27/04/2019  HR
Project 2   15/01/2019  11/11/2019  Marketing
Project 3   25/02/2019  30/07/2019  Finance
Project 4   22/02/2019  15/04/2019  HR
Project 5   05/03/2019  29/09/2019  HR
Project 6   11/04/2019  01/12/2019  Marketing
Project 7   29/07/2019  23/08/2019  Finance
Project 8   25/08/2019  23/12/2019  Operations
Project 9   31/10/2019  29/11/2019  Operations
Project 10  10/12/2019  25/12/2019  Operations

I want to know, over time, how many
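
A minimal sketch, assuming "how many" means the number of projects active at a given time, with Start and End both inclusive. It uses a few rows of the table above; the helper name `active_projects` is hypothetical. Each project is expanded into one row per active day with `pd.date_range` and `explode`, and those rows are then counted per day, month, or quarter.

```python
import pandas as pd

# A subset of the question's table; dates are day/month/year
projects = pd.DataFrame({
    "ProjectID": ["Project 1", "Project 2", "Project 4"],
    "Start": ["01/01/2019", "15/01/2019", "22/02/2019"],
    "End": ["27/04/2019", "11/11/2019", "15/04/2019"],
    "Type": ["HR", "Marketing", "HR"],
})
for col in ("Start", "End"):
    projects[col] = pd.to_datetime(projects[col], format="%d/%m/%Y")

# One row per project per active day (date_range includes both endpoints)
projects["Day"] = [
    list(pd.date_range(start, end, freq="D"))
    for start, end in zip(projects["Start"], projects["End"])
]
daily = projects.explode("Day", ignore_index=True)
daily["Day"] = pd.to_datetime(daily["Day"])

# Active projects per day, one column per Type (0 where a Type has none)
active_per_day = daily.groupby(["Day", "Type"]).size().unstack(fill_value=0)

def active_projects(daily: pd.DataFrame, freq: str) -> pd.Series:
    """Distinct projects active at any point in each period ("M" = month, "Q" = quarter)."""
    periods = daily["Day"].dt.to_period(freq)
    return daily.groupby(periods)["ProjectID"].nunique()

per_month = active_projects(daily, "M")
per_quarter = active_projects(daily, "Q")
```

Counting distinct `ProjectID`s per period answers "how many projects ran during this month or quarter"; summing `active_per_day` over a period would instead give project-days.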

Convert pandas dataframe column of UTC time string to floats

瘦欲@ submitted on 2021-01-28 10:38:11
Question: I have a pandas dataframe with a column of strings holding datetimes in UTC format, and I need to convert them to floats. I'm having trouble doing this. Here is a view of my column:

df['time'][0:3]
0    2018-04-18T19:00:00.000000000Z
1    2018-04-18T19:15:00.000000000Z
2    2018-04-18T19:30:00.000000000Z
Name: time, dtype: object

I've been trying this, but it isn't working for me:

import datetime
for i in range(1, len(df)):
    df['time'][i] = datetime.datetime.strptime(df['time'][i], '%Y-%m-%dT%H:%M:%S.%f000Z')
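
A minimal sketch, assuming "float" means seconds since the Unix epoch (the question does not say which float representation is wanted). It parses the whole column at once with `pd.to_datetime(..., utc=True)` instead of looping with `strptime`.

```python
import pandas as pd

df = pd.DataFrame({"time": [
    "2018-04-18T19:00:00.000000000Z",
    "2018-04-18T19:15:00.000000000Z",
    "2018-04-18T19:30:00.000000000Z",
]})

# Parse the whole column at once; utc=True returns tz-aware UTC timestamps
times = pd.to_datetime(df["time"], utc=True)

# Assumption: "float" means seconds since the Unix epoch (1970-01-01 UTC)
epoch = pd.Timestamp("1970-01-01", tz="UTC")
df["time_float"] = (times - epoch) / pd.Timedelta(seconds=1)

print(df["time_float"].tolist())
# [1524078000.0, 1524078900.0, 1524079800.0]
```

The original loop starts at `range(1, len(df))`, so it never touches row 0. It also writes back through `df['time'][i]` (chained assignment, which pandas may warn about or, with Copy-on-Write enabled, not apply to `df`), and `strptime` returns `datetime` objects rather than floats.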

How to shift entire groups in pandas groupby

时光怂恿深爱的人放手 submitted on 2021-01-28 10:32:30
Question: Given the following data:

import numpy as np
import pandas as pd
data = {'a': [1, 1, 1, 8, 8, 3, 3, 3, 3, 4, 4]}
df = pd.DataFrame(data)

I would now like to shift the whole thing down by n groups, so that the current order of the groups is preserved. The desired output for a shift of n=1 would be:

desired_output = {'a': [np.nan, np.nan, np.nan, 1, 1, 8, 8, 8, 8, 3, 3]}
desired_output_df = pd.DataFrame(desired_output)

A shift of n=2 should give:

desired_output = {'a': [np.nan, np.nan, np.nan, np.nan, np.nan, 1, 1, 1, 1, 8, 8]}
desired_output_df = pd.DataFrame(desired_output)

I have been
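
A minimal sketch, assuming a "group" is a run of consecutive equal values, as in the example data. The helper name `shift_groups` is hypothetical. Each run gets an id, one value per run is shifted down by `n` runs, and the shifted values are mapped back onto the rows, so each run keeps its length while taking the value of the run `n` positions earlier.

```python
import pandas as pd

def shift_groups(s: pd.Series, n: int) -> pd.Series:
    """Shift whole runs of consecutive equal values down by n runs."""
    # Label each run of consecutive equal values: 1, 1, 1, 2, 2, 3, ...
    run_id = s.ne(s.shift()).cumsum()
    # One value per run, in order of appearance
    run_values = s.groupby(run_id).first()
    # Move run values down by n runs; the first n runs become NaN
    shifted = run_values.shift(n)
    # Broadcast each run's new value back onto that run's rows
    return run_id.map(shifted)

df = pd.DataFrame({'a': [1, 1, 1, 8, 8, 3, 3, 3, 3, 4, 4]})
df["shifted"] = shift_groups(df["a"], 1)
# shifted: NaN, NaN, NaN, 1, 1, 8, 8, 8, 8, 3, 3
```

If the same value can appear in separate, non-adjacent runs and those should count as one group, the run ids would have to come from something like `groupby('a', sort=False).ngroup()` instead.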

Add geom_line to stacked barplot in R

蓝咒 submitted on 2021-01-28 09:01:11
Question: I have looked at similar threads but haven't seen anything specific to my situation. I want to add a geom_line to a filled (position="fill") stacked bar chart in ggplot2. I have the values I want to superimpose as a vector. Is there a simple way to do this without merging all the values into the same dataframe? My code, if relevant:

ggplot(df_region, aes(fill=as.factor(Secondary1), y=Total, x=Year)) +
  geom_bar(position="fill", stat="identity") +
  theme(legend.position="bottom") +
  theme(legend.title=element_blank()) +
  labs(y

How to write multiple pandas data frames to single output excel file? [duplicate]

a 夏天 submitted on 2021-01-28 08:17:56
Question: This question already has answers here: Save list of DataFrames to multisheet Excel spreadsheet (3 answers). Closed 3 years ago.

I am currently working in Python 3 and have a large number of related pandas dataframes. I need to write them to a single Excel file, with each dataframe on a separate tab. I can write each of them to its own Excel file and then copy and paste them into a single file, but I am looking for a more automated solution. Thanks in advance for the help.
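
A minimal sketch using `pd.ExcelWriter`, which lets several `to_excel` calls write into one workbook, one sheet per DataFrame. The helper name `write_sheets` is illustrative, and writing .xlsx requires an engine such as openpyxl or XlsxWriter to be installed.

```python
import pandas as pd

def write_sheets(frames: dict, path) -> None:
    """Write each DataFrame in `frames` to its own sheet (tab) of one Excel workbook.

    `frames` maps sheet name -> DataFrame; `path` is a file path or a binary buffer.
    """
    # Leaving the with-block closes the writer, which saves the workbook
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in frames.items():
            # Excel limits sheet names to 31 characters
            frame.to_excel(writer, sheet_name=sheet_name[:31], index=False)
```

For example, `write_sheets({"sales": sales_df, "costs": costs_df}, "combined.xlsx")`, where `sales_df` and `costs_df` stand in for your own DataFrames, produces one file with the tabs "sales" and "costs". Sheets appear in the dictionary's insertion order.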

Normalize a column of dataframe using min max normalization based on groupby of another column

非 Y 不嫁゛ submitted on 2021-01-28 08:09:44
Question: The dataframe is as follows:

Name   Job        Salary
john   painter    40000
peter  engineer   50000
sam    plumber    30000
john   doctor     500000
john   driver     20000
sam    carpenter  10000
peter  scientist  100000

How can I group by the column Name and apply min-max normalization to the Salary column within each group?

Expected result:

Name   Job        Salary
john   painter    0.041666
peter  engineer   0
sam    plumber    1
john   doctor     1
john   driver     0
sam    carpenter  0
peter  scientist  1

I have tried the following:

data = df.groupby('Name').transform(lambda x:
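
A minimal sketch using `groupby(...).transform`, which returns each group's statistic aligned to the original rows. Min-max scaling, (x - min) / (max - min) within each Name, then becomes plain column arithmetic. The table is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["john", "peter", "sam", "john", "john", "sam", "peter"],
    "Job": ["painter", "engineer", "plumber", "doctor", "driver", "carpenter", "scientist"],
    "Salary": [40000, 50000, 30000, 500000, 20000, 10000, 100000],
})

grouped = df.groupby("Name")["Salary"]
# transform("min") / transform("max") return one value per original row
group_min = grouped.transform("min")
group_max = grouped.transform("max")
df["Salary"] = (df["Salary"] - group_min) / (group_max - group_min)
```

For john, the minimum is 20000 and the maximum is 500000, so the painter's 40000 maps to 20000 / 480000 = 0.041666..., matching the expected result. A group with a single row, or with identical salaries, divides 0 by 0 and yields NaN, so such groups may need a fallback value.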