Question
I have a file with two date columns: one includes a timestamp and one does not. I need to read the file, ignore the time portion, and compare the two dates. If the dates match, the row should be written to the output file; all other rows should be dropped. I'm not sure whether I should parse both columns as datetimes on input, normalize the format there, and then simply test for equality, or whether I should be using a timedelta.
I've tried a couple of different ways but haven't had success.
df = pd.read_csv("File.csv", dtype={'DATETIMESTAMP': np.datetime64, 'DATE':np.datetime64})
Gives me: TypeError: the dtype <M8 is not supported for parsing, pass this column using parse_dates instead
I've also tried to just remove the timestamp and then compare, but the strings end up with different date formats and that doesn't work either.
df['RemoveTimestamp'] = df['DATETIMESTAMP'].apply(lambda x: x[:10])
df = df[df['RemoveTimestamp'] == df['DATE']]
Any guidance appreciated.
Here is my sample input CSV file:
"DATE", "DATETIMESTAMP"
"8/6/2014","2014-08-06T10:18:38.000Z"
"1/15/2013","2013-01-15T08:57:38.000Z"
"3/7/2013","2013-03-07T16:57:18.000Z"
"12/4/2012","2012-12-04T10:59:37.000Z"
"5/6/2014","2014-05-06T11:07:46.000Z"
"2/13/2013","2013-02-13T15:51:42.000Z"
Answer 1:
import pandas as pd
import numpy as np
# your data, both columns are read as strings
# ================================================
df = pd.read_csv('sample_data.csv')
df
        DATE             DATETIMESTAMP
0   8/6/2014  2014-08-06T10:18:38.000Z
1  1/15/2013  2013-01-15T08:57:38.000Z
2   3/7/2013  2013-03-07T16:57:18.000Z
3  12/4/2012  2012-12-04T10:59:37.000Z
4   5/6/2014  2014-05-06T11:07:46.000Z
5  2/13/2013  2013-02-13T15:51:42.000Z
# processing
# =================================================
# convert string to datetime
df['DATE'] = pd.to_datetime(df['DATE'])
df['DATETIMESTAMP'] = pd.to_datetime(df['DATETIMESTAMP'])
# cast timestamp to date
df['DATETIMESTAMP'] = df['DATETIMESTAMP'].values.astype('<M8[D]')
df
         DATE  DATETIMESTAMP
0  2014-08-06     2014-08-06
1  2013-01-15     2013-01-15
2  2013-03-07     2013-03-07
3  2012-12-04     2012-12-04
4  2014-05-06     2014-05-06
5  2013-02-13     2013-02-13
# compare
df['DATE'] == df['DATETIMESTAMP']
0 True
1 True
2 True
3 True
4 True
5 True
dtype: bool
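Since the question also asks for the matching rows to end up in an output file, here is a minimal sketch of that last step. It is only one way to do it, under a few assumptions: the function name `rows_with_matching_dates` and the inline `SAMPLE` text are made up for illustration, the second sample row's timestamp has been shifted by a day so that not every row matches, and the trailing `Z` is treated as UTC with no conversion to a local timezone.

```python
import io

import pandas as pd

# Illustrative data (assumption): row 2 is deliberately altered so it does not match.
SAMPLE = '''"DATE","DATETIMESTAMP"
"8/6/2014","2014-08-06T10:18:38.000Z"
"1/15/2013","2013-01-16T08:57:38.000Z"
"3/7/2013","2013-03-07T16:57:18.000Z"
'''


def rows_with_matching_dates(source):
    """Return the rows whose DATE equals the calendar date of DATETIMESTAMP."""
    # Read everything as strings so the original text is preserved in the output.
    df = pd.read_csv(source, skipinitialspace=True, dtype=str)
    date = pd.to_datetime(df["DATE"], format="%m/%d/%Y")
    stamp = pd.to_datetime(df["DATETIMESTAMP"], utc=True)
    # Drop the timezone, then reset the time of day to midnight.
    stamp_day = stamp.dt.tz_localize(None).dt.normalize()
    # Boolean indexing keeps only the rows where the two dates agree.
    return df[date == stamp_day]


matches = rows_with_matching_dates(io.StringIO(SAMPLE))

# Written to an in-memory buffer here; a file path works the same way.
out = io.StringIO()
matches.to_csv(out, index=False)
```

Passing a path such as `"output.csv"` (a hypothetical name) to `to_csv` instead of the buffer writes the filtered rows to disk.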
Answer 2:
How about:
import time

filename = "dates.csv"
with open(filename) as f:
    next(f)  # skip the header row
    for line in f:
        date1, date2 = line.strip().split(',')
        date1 = date1.strip('"')
        # keep only the date part of the ISO timestamp
        date2 = date2.split('T')[0].strip('"')
        # rewrite m/d/Y as Y-m-d so the two strings are comparable
        date1a = time.strftime("%Y-%m-%d", time.strptime(date1, "%m/%d/%Y"))
        if date1a == date2:
            print(line.strip())
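The same idea can also be sketched with the standard library's `csv` and `datetime` modules, which take care of the quoting and let you compare real `date` objects instead of strings. This is only an illustration under stated assumptions: `matching_rows` and `SAMPLE` are hypothetical names, the second sample row has been changed so that it does not match, and the trailing `Z` is matched literally rather than converted to another timezone.

```python
import csv
import io
from datetime import datetime

# Illustrative data (assumption): row 2 is deliberately altered so it does not match.
SAMPLE = '''"DATE", "DATETIMESTAMP"
"8/6/2014","2014-08-06T10:18:38.000Z"
"1/15/2013","2013-01-16T08:57:38.000Z"
"3/7/2013","2013-03-07T16:57:18.000Z"
'''


def matching_rows(reader):
    """Yield rows whose DATE equals the calendar date of DATETIMESTAMP."""
    for row in reader:
        day = datetime.strptime(row["DATE"], "%m/%d/%Y").date()
        # %f accepts the three-digit fraction; the final Z is a literal character.
        stamp = datetime.strptime(row["DATETIMESTAMP"], "%Y-%m-%dT%H:%M:%S.%fZ")
        if day == stamp.date():
            yield row


# skipinitialspace handles the space after the comma in the header line.
reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)

# Written to an in-memory buffer here; an open file object works the same way.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(matching_rows(reader))
```

To write to disk, replace the buffer with a file opened as `open("output.csv", "w", newline="")`, where the file name is again just a placeholder.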
Source: https://stackoverflow.com/questions/31577280/datetime-comparisons-in-python