Question
I am reading a CSV file with pandas:
str,date,float,time,datetime
a,10/11/19,1.1,10:30:00,10/11/19 10:30
b,10/11/19,1.2,10:00:00,10/11/19 10:30
c,10/11/19,1.3,11:10:11,10/11/19 10:30
import pandas as pd
df = pd.read_csv(file)
My requirement is to determine, for each column, whether it holds pure dates, pure times, or full datetimes. For a particular column, my code is:
try:
    dt = pd.to_datetime(df[col])
    dates = [obj.date() for obj in dt]
    times = [obj.time() for obj in dt]
    if dates and (set(times) == set([datetime.time(0, 0)])):
        # It's a pure date field
    elif <something>:
        # It's a pure time field
    else:
        # It's a datetime field
except:
    # It's not a date field
The problem with my code is that when a column contains only times, pd.to_datetime fills in today's date by default, so I cannot tell it apart from a datetime column. Is there an easy solution? Please help me fill in <something> in the code above.
Answer 1:
When pandas parses time-only strings, it fills in today's date by default. One possible solution is therefore to compare the parsed dates (Series.dt.date) with today's date (Timestamp.date) and use Series.all to check that every value in the column matches.
I also use a different test for pure dates: check whether the values stay the same after the time part is removed with Series.dt.floor:
import pandas as pd

df = pd.DataFrame({'a': ['2019-01-01 12:23:10',
                         '2019-01-02 12:23:10'],
                   'b': ['2019-01-01',
                         '2019-01-02'],
                   'c': ['12:23:10',
                         '15:23:10'],
                   'd': ['a', 'b']})
print (df)
                     a           b         c  d
0  2019-01-01 12:23:10  2019-01-01  12:23:10  a
1  2019-01-02 12:23:10  2019-01-02  15:23:10  b
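def check(col):
    try:
        dt = pd.to_datetime(df[col])
        # Pure dates: flooring to the day changes nothing
        if (dt.dt.floor('D') == dt).all():
            return ('Its a pure date field')
        # Pure times: pandas filled in today's date for every value
        elif (dt.dt.date == pd.Timestamp('now').date()).all():
            return ('Its a pure time field')
        else:
            return ('Its a Datetime field')
    except Exception:
        # Parsing failed, so the column is not date-like
        return ('its not a datefield')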
print (check('a'))
print (check('b'))
print (check('c'))
print (check('d'))
Its a Datetime field
Its a pure date field
Its a pure time field
its not a datefield
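The floor comparison used in the date branch can be looked at in isolation. The following is only a small sketch with explicit timestamps (the names stamps, floored and is_midnight are made up for this illustration), so it does not depend on today's date:

```python
import pandas as pd

# Explicit timestamps, so the result does not depend on today's date.
stamps = pd.Series(pd.to_datetime(["2019-01-01 00:00:00",
                                   "2019-01-02 12:23:10"]))

# Flooring to the day resets the time part to 00:00:00.
floored = stamps.dt.floor("D")

# True only where the original value was already at midnight.
is_midnight = floored == stamps
print(is_midnight.tolist())
```

One consequence of checking the date branch first: a time-only column in which every value is 00:00:00 would also compare equal to its floored values and so would be reported as a pure date field.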
Another idea is to first test whether the column is numeric and, if it is, return 'its not a datefield' right away, which prevents numeric values from being cast to datetimes. In addition, a datetime column may legitimately contain only today's dates (column f), so the test for times is changed to Series.str.contains with a pattern that matches HH:MM:SS or H:MM:SS:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['2019-01-01 12:23:10',
                         '2019-01-02'],
                   'b': ['2019-01-01',
                         '2019-01-02'],
                   'c': ['12:23:10',
                         '15:23:10'],
                   'd': ['a', 'b'],
                   'e': [1, 2],
                   'f': ['2019-11-13 12:23:10',
                         '2019-11-13']})
print (df)
                     a           b         c  d  e                    f
0  2019-01-01 12:23:10  2019-01-01  12:23:10  a  1  2019-11-13 12:23:10
1           2019-01-02  2019-01-02  15:23:10  b  2           2019-11-13
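def check(col):
    # Numeric columns are rejected first so they are never cast to datetimes
    if np.issubdtype(df[col].dtype, np.number):
        return ('its not a datefield')
    try:
        dt = pd.to_datetime(df[col])
        # Pure dates: flooring to the day changes nothing
        if (dt.dt.floor('D') == dt).all():
            return ('Its a pure date field')
        # Pure times: every original string looks like H:MM:SS or HH:MM:SS
        elif df[col].str.contains(r"^\d{1,2}:\d{2}:\d{2}$").all():
            return ('Its a pure time field')
        else:
            return ('Its a Datetime field')
    except Exception:
        # Parsing failed, so the column is not date-like
        return ('its not a datefield')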
print (check('a'))
print (check('b'))
print (check('c'))
print (check('d'))
print (check('e'))
print (check('f'))
Its a Datetime field
Its a pure date field
Its a pure time field
its not a datefield
its not a datefield
Its a Datetime field
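For the CSV from the question, the same ideas could be combined into a function that takes a Series instead of reading a global df. This is only a sketch, not part of the original answer: the name classify, the returned labels, the use of errors="coerce" in place of try/except, and the pattern that also accepts times without seconds are all assumptions of this example.

```python
import io

import pandas as pd

# Sample data copied from the question.
CSV_TEXT = """str,date,float,time,datetime
a,10/11/19,1.1,10:30:00,10/11/19 10:30
b,10/11/19,1.2,10:00:00,10/11/19 10:30
c,10/11/19,1.3,11:10:11,10/11/19 10:30
"""

# H:MM or HH:MM with optional :SS (an assumption; the answer requires seconds).
TIME_PATTERN = r"^\d{1,2}:\d{2}(?::\d{2})?$"


def classify(series):
    """Hypothetical helper: label a column as date, time, datetime or neither."""
    # Numeric columns are rejected first so they are never cast to datetimes.
    if pd.api.types.is_numeric_dtype(series):
        return "not a date field"
    text = series.astype(str).str.strip()
    # Time-only strings are detected by pattern before any parsing,
    # so this branch does not depend on today's date.
    if text.str.match(TIME_PATTERN).all():
        return "pure time"
    # Unparseable values become NaT instead of raising.
    parsed = pd.to_datetime(text, errors="coerce")
    if parsed.isna().any():
        return "not a date field"
    # A pure date column has every timestamp at midnight.
    if (parsed.dt.normalize() == parsed).all():
        return "pure date"
    return "datetime"


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = {col: classify(df[col]) for col in df.columns}
print(result)
```

Checking the time pattern before calling pd.to_datetime is a design choice of this sketch: it avoids relying on the today's-date default altogether, at the cost of only recognising times written in the colon-separated form.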
Source: https://stackoverflow.com/questions/58831943/python-pandas-check-that-string-is-only-date-or-only-time-or-datetime