pandas

How do I group max and min timestamp on pandas dataframe

浪子不回头ぞ submitted on 2021-02-05 06:42:04
Question: I want to group a dataset and return the maximum and minimum timestamp for each id. Here's my data:

id  timestamp
1   2017-09-17 10:09:01
2   2017-10-02 01:13:15
1   2017-09-17 10:53:07
1   2017-09-17 10:52:18
2   2017-09-12 21:59:40

Here's the output that I want:

id  max                  min
1   2017-09-17 10:53:07  2017-09-17 10:09:01
2   2017-10-02 01:13:15  2017-09-12 21:59:40

Here's what I did. The code does not seem efficient, and I hope there's a better way to do this in pandas: data1 = df.sort_values('timestamp').drop_duplicates(['customer_id']
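One possible approach is a minimal sketch using `groupby` with `agg`. It assumes the grouping column is named `id` as in the sample data (the question's own snippet refers to `customer_id`), and that `timestamp` can be parsed with `pd.to_datetime`.

```python
import pandas as pd

# Sample data copied from the question; the column name "id" is assumed
# from the table shown there.
df = pd.DataFrame({
    "id": [1, 2, 1, 1, 2],
    "timestamp": pd.to_datetime([
        "2017-09-17 10:09:01",
        "2017-10-02 01:13:15",
        "2017-09-17 10:53:07",
        "2017-09-17 10:52:18",
        "2017-09-12 21:59:40",
    ]),
})

# Compute both aggregates in a single pass over each group.
result = df.groupby("id")["timestamp"].agg(["max", "min"]).reset_index()
print(result)
```

This avoids sorting and de-duplicating twice, since both aggregates come from one `groupby` call.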

python/pandas: using regular expressions remove anything in square brackets in string

生来就可爱ヽ(ⅴ<●) submitted on 2021-02-05 06:41:48
Question: I am working from a pandas dataframe and trying to sanitize a column, turning values like $12,342 into 12342 so that the column can be converted to int or float. I found one row containing 736[4], so I also have to remove everything within the square brackets, brackets included. Code so far:

df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace('$','')
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(',','')
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str
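A hedged sketch of one way to do the cleanup with regular expressions. The three sample values are illustrative (two come from the question, the third is made up), and `regex=True` is passed explicitly because the default of that parameter has changed between pandas versions.

```python
import pandas as pd

col = "Average Monthly Wage $"
# Illustrative values; "$1,005" is a hypothetical extra row.
df2 = pd.DataFrame({col: ["$12,342", "736[4]", "$1,005"]})

cleaned = (
    df2[col]
    # Drop any bracketed footnote marker such as "[4]", brackets included.
    .str.replace(r"\[.*?\]", "", regex=True)
    # Drop dollar signs and thousands separators in one pass.
    .str.replace(r"[$,]", "", regex=True)
)

# Convert the cleaned strings to a numeric dtype.
df2[col] = pd.to_numeric(cleaned)
print(df2)
```

Inside a character class (`[$,]`) the dollar sign is a literal character, so it does not need escaping there.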

day of Year values starting from a particular date

删除回忆录丶 submitted on 2021-02-05 06:40:55
Question: I have a dataframe with a date column. The data spans 365 days, starting on 02/11/2017 and ending on 01/11/2018.

Date
02/11/2017
03/11/2017
05/11/2017
.
.
01/11/2018

I want to add an adjacent column called Day_Of_Year as follows:

Date        Day_Of_Year
02/11/2017  1
03/11/2017  2
05/11/2017  4
.
.
01/11/2018  365

I apologize if it's a very basic question, but unfortunately I haven't been able to get started on this. I could use datetime(), but that would return values such as 1 for 1st January, 2 for 2nd
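A minimal sketch of one way to count days from a custom start date. It assumes the dates are day-first (`dd/mm/yyyy`), which is what the 365-day span and the expected values in the question imply.

```python
import pandas as pd

# The four dates that are visible in the question.
df = pd.DataFrame({"Date": ["02/11/2017", "03/11/2017", "05/11/2017", "01/11/2018"]})

# Parse as day/month/year (an assumption based on the stated 365-day span).
dates = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Day 1 is the first date of the period, so subtract the start and add 1.
start = pd.Timestamp("2017-11-02")
df["Day_Of_Year"] = (dates - start).dt.days + 1
print(df)
```

If the start date is not known in advance, `dates.min()` could be used in place of the hard-coded `start`.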

day of Year values starting from a particular date

与世无争的帅哥 submitted on 2021-02-05 06:40:54
Question: I have a dataframe with a date column. The data spans 365 days, starting on 02/11/2017 and ending on 01/11/2018.

Date
02/11/2017
03/11/2017
05/11/2017
.
.
01/11/2018

I want to add an adjacent column called Day_Of_Year as follows:

Date        Day_Of_Year
02/11/2017  1
03/11/2017  2
05/11/2017  4
.
.
01/11/2018  365

I apologize if it's a very basic question, but unfortunately I haven't been able to get started on this. I could use datetime(), but that would return values such as 1 for 1st January, 2 for 2nd

merge pandas dataframes under new index level

馋奶兔 submitted on 2021-02-05 06:40:30
Question: I have two pandas DataFrames, act and exp, that I want to combine into a single dataframe df:

import pandas as pd
from numpy.random import rand
act = pd.DataFrame(rand(3,2), columns=['a', 'b'])
exp = pd.DataFrame(rand(3,2), columns=['a', 'c'])

act  # have
          a         b
0  0.853910  0.405463
1  0.822641  0.255832
2  0.673718  0.313768

exp  # have
          a         c
0  0.464781  0.325553
1  0.565531  0.269678
2  0.363693  0.775927

The dataframe df should contain one more column index level than act and exp, and contain each under its own
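Since the question is cut off, the following is only a sketch of one plausible reading: each frame is placed under its own label in a new top column level. It uses `pd.concat` with `keys`; the labels `"act"` and `"exp"` are an assumption.

```python
import pandas as pd
from numpy.random import rand

act = pd.DataFrame(rand(3, 2), columns=["a", "b"])
exp = pd.DataFrame(rand(3, 2), columns=["a", "c"])

# Concatenate side by side; `keys` adds an outer column level so that
# each frame's columns sit under its own label (labels are assumed).
df = pd.concat([act, exp], axis=1, keys=["act", "exp"])
print(df)
```

With this layout, `df["act"]` returns the original `act` columns and `df["exp"]` the original `exp` columns.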

Pandas condition on DataFrame returns TypeError: '>' not supported between instances of 'str' and 'int'

旧时模样 submitted on 2021-02-05 06:40:24
Question: I'm working on a DataFrame using pandas and I need to add a new column based on some conditions. My DataFrame is:

discount  tax  total  subtotal  productid
3         0    20     13        002
10        3    106    94        003
46.49     6    21     20        004

I need to apply some conditions while adding a new column named Class to the DataFrame. The conditions are as follows: if discount > 20 and total > 100 and tax == 0, then Class should be 1; otherwise it should be 0. Here's what I have tried:

def conditions(s):
    if (s['discount'] > 20) and (s['tax'] == 0)
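The error message suggests that the numeric columns are stored as strings. A hedged sketch under that assumption: convert them with `pd.to_numeric`, then build the new column from a vectorized boolean mask instead of a row-wise function. The last row is hypothetical, added only so that one row meets the condition.

```python
import pandas as pd

# Values are strings on purpose, to mirror the assumed cause of the TypeError.
# The final row (productid "005") is hypothetical and not from the question.
df = pd.DataFrame({
    "discount": ["3", "10", "46.49", "25"],
    "tax": ["0", "3", "6", "0"],
    "total": ["20", "106", "21", "150"],
    "subtotal": ["13", "94", "20", "120"],
    "productid": ["002", "003", "004", "005"],
})

# Convert the columns used in comparisons to numeric dtypes.
num_cols = ["discount", "tax", "total", "subtotal"]
df[num_cols] = df[num_cols].apply(pd.to_numeric)

# Vectorized condition: use & between parenthesized comparisons.
mask = (df["discount"] > 20) & (df["total"] > 100) & (df["tax"] == 0)
df["Class"] = mask.astype(int)
print(df)
```

Note that `and` cannot be used to combine whole columns; the element-wise `&` operator is needed, with each comparison in parentheses.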

merge pandas dataframes under new index level

谁说我不能喝 submitted on 2021-02-05 06:40:08
Question: I have two pandas DataFrames, act and exp, that I want to combine into a single dataframe df:

import pandas as pd
from numpy.random import rand
act = pd.DataFrame(rand(3,2), columns=['a', 'b'])
exp = pd.DataFrame(rand(3,2), columns=['a', 'c'])

act  # have
          a         b
0  0.853910  0.405463
1  0.822641  0.255832
2  0.673718  0.313768

exp  # have
          a         c
0  0.464781  0.325553
1  0.565531  0.269678
2  0.363693  0.775927

The dataframe df should contain one more column index level than act and exp, and contain each under its own

how to get percentage for groupby size

与世无争的帅哥 submitted on 2021-02-05 06:38:45
Question: I am looking for a way to get percentages.

df.groupby(['state', 'approved_or_not']).size()

Output:

school_state  project_is_approved
AK            0                        55
              1                       290
AL            0                       256
              1                      1506
AR            0                       177
              1                       872
AZ            0                       347
              1                      1800

which is good, but what I want is percentages instead of counts:

school_state  project_is_approved
AK            0                      0.16
              1                      0.84
AL            0                      0.14
              1                      0.86

I tried and couldn't figure out a way. I'd appreciate it if someone could help.

Answer 1: Use SeriesGroupBy.value_counts with parameter normalize=True: df.groupby('state')['approved_or
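The answer's suggestion can be sketched as follows. The column names `state` and `approved_or_not` are taken from the question's code; the small data set is hypothetical and chosen so that the fractions are easy to check by hand.

```python
import pandas as pd

# Hypothetical data: AK has one rejected and three approved rows,
# AL has one of each.
df = pd.DataFrame({
    "state": ["AK", "AK", "AK", "AK", "AL", "AL"],
    "approved_or_not": [0, 1, 1, 1, 0, 1],
})

# normalize=True returns the share of each value within its group
# instead of the raw count.
pct = df.groupby("state")["approved_or_not"].value_counts(normalize=True)
print(pct)
```

The result is a Series indexed by `(state, approved_or_not)`, with the fractions in each state summing to 1.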

Removing repetitive/duplicate occurrence in excel using python

此生再无相见时 submitted on 2021-02-05 06:35:08
Question: I am trying to remove the repeated/duplicate names that appear in the NAME column. I just want to keep the first occurrence of each repeated name, using a Python script. This is my input Excel file: And I need output like this:

Answer 1: This isn't removing duplicates per se; you're just replacing duplicate keys in one column with blanks. I would handle this as follows: create a mask that returns a True/False boolean indicating whether the row is equal to the row above. Assuming your dataframe is called
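The mask idea from the answer can be sketched like this. The input spreadsheet is not shown in the text, so the data below and the `VALUE` column are hypothetical; only the `NAME` column comes from the question. In practice the frame would presumably be loaded with `pd.read_excel` and written back with `DataFrame.to_excel`.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet.
df = pd.DataFrame({
    "NAME": ["A", "A", "B", "B", "B", "C"],
    "VALUE": [1, 2, 3, 4, 5, 6],
})

# True where a row's NAME equals the NAME in the row directly above it.
mask = df["NAME"] == df["NAME"].shift()

# Blank out the repeats, keeping the first occurrence in each run.
df.loc[mask, "NAME"] = ""
print(df)
```

This only blanks consecutive repeats. If the same name can reappear later in the sheet and should also be blanked, `df["NAME"].duplicated()` could serve as the mask instead.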

Take long list of items and reshape into dataframe “rows” - pandas python 3

故事扮演 submitted on 2021-02-05 06:35:06
Question: I have a long list of items that I want to put in a data frame at set intervals. I have another list with "column names". E.g.

colnames = ['Title', 'Date', 'Abstract', 'ID', 'Volume']
data = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o]

I want to create a data frame that looks like:

   Title  Date  Abstract  ID  Volume
0  a      b     c         d   e
1  f      g     h         i   j
2  k      l     m         n   o

Thanks for any suggestions!

Answer 1: You need DataFrame constructor with
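The answer is cut off, so the following is not necessarily what it went on to suggest. It is a sketch of one way to do it: split the flat list into chunks of `len(colnames)` and pass the chunks to the `DataFrame` constructor. The items are written as strings here because `a`, `b`, ... are undefined names in the question.

```python
import pandas as pd

colnames = ["Title", "Date", "Abstract", "ID", "Volume"]
# Placeholder items standing in for the question's a..o.
data = list("abcdefghijklmno")

# Slice the flat list into consecutive rows of len(colnames) items each.
n = len(colnames)
rows = [data[i:i + n] for i in range(0, len(data), n)]

df = pd.DataFrame(rows, columns=colnames)
print(df)
```

Chunking in plain Python keeps each item's original type. If `len(data)` is not a multiple of `len(colnames)`, the last row is shorter and pandas pads it with missing values.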