Strange behavior with pandas timestamp to posix conversion

Submitted by 谁说我不能喝 on 2021-02-10 18:41:49

Question


I do the following operations:

  1. Convert string datetimes in a pandas DataFrame to Python datetimes via apply(strptime)
  2. Convert each datetime to a POSIX timestamp via the .timestamp() method
  3. Convert the POSIX timestamp back to a datetime with .fromtimestamp(), which gives a different datetime

The result differs by 3 hours, which is my timezone offset (I'm at UTC+3), so I suppose it is some kind of timezone issue. I also understand that apply implicitly converts the values to pandas.Timestamp, but I don't understand why that makes a difference here.

What is the reason for this strange behavior, and what should I do to avoid it? In my project I need to compare these pandas timestamps with correct POSIX timestamps, and right now the comparison gives wrong results.

Below is a minimal reproducible example:

import datetime
import pandas as pd

df = pd.DataFrame(['2018-03-03 14:30:00'], columns=['c'])
# Parse the strings; pandas stores the result as a datetime64[ns] column
df['c'] = df['c'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
dt = df['c'].iloc[0]
dt
# >> Timestamp('2018-03-03 14:30:00')
datetime.datetime.fromtimestamp(dt.timestamp())
# >> datetime.datetime(2018, 3, 3, 17, 30)   (on a machine at UTC+3)
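The 3-hour shift can be reproduced by comparing the two `.timestamp()` methods directly. This is a minimal sketch, assuming a recent pandas version: a naive pandas `Timestamp` computes its POSIX time as if the wall time were UTC, while the standard-library `datetime.timestamp()` treats a naive datetime as local time, and `datetime.fromtimestamp()` always converts back to local time. The variable names are illustrative only.

```python
import datetime

import pandas as pd

text = '2018-03-03 14:30:00'
fmt = '%Y-%m-%d %H:%M:%S'

# pandas: a naive Timestamp is converted to POSIX time as if it were UTC
pandas_posix = pd.Timestamp(text).timestamp()

# stdlib: a naive datetime is converted to POSIX time as if it were local time
py_dt = datetime.datetime.strptime(text, fmt)
stdlib_posix = py_dt.timestamp()

# The gap equals the machine's UTC offset at that moment (0 on a UTC machine)
local_offset = py_dt.astimezone().utcoffset().total_seconds()

# fromtimestamp() returns local time, so only the stdlib value round-trips exactly
roundtrip_pandas = datetime.datetime.fromtimestamp(pandas_posix)
roundtrip_stdlib = datetime.datetime.fromtimestamp(stdlib_posix)

print(pandas_posix)                                  # 1520087400.0
print(stdlib_posix == pandas_posix - local_offset)   # True
print(roundtrip_stdlib == py_dt)                     # True
```

On a machine at UTC+3, `roundtrip_pandas` comes out as 17:30, which matches the output in the question.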

Answer 1:


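First, I suggest using the np.datetime64 dtype when working with pandas. In this case it makes the round trip simple: .value gives the integer number of nanoseconds since the epoch, and pd.to_datetime turns that integer back into the same Timestamp.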

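import pandas as pd

# .value is the number of nanoseconds since 1970-01-01 00:00 UTC
pd.to_datetime('2018-03-03 14:30:00').value
#1520087400000000000

# An integer passed to to_datetime is read as nanoseconds by default
pd.to_datetime(pd.to_datetime('2018-03-03 14:30:00').value)
#Timestamp('2018-03-03 14:30:00')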
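Along the same lines, the comparison the question needs (pandas timestamps against POSIX seconds) can stay entirely inside pandas. A sketch, assuming the POSIX values are integer seconds and the naive pandas times are meant as UTC: `pd.to_datetime(..., unit='s')` turns seconds into Timestamps, and subtracting the epoch then floor-dividing by one second turns Timestamps into seconds. The column names `posix`, `c_posix` and `posix_dt` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'c': ['2018-03-03 14:30:00'], 'posix': [1520087400]})
df['c'] = pd.to_datetime(df['c'])  # naive datetime64 column, treated as UTC here

# datetime column -> integer POSIX seconds (no local-time conversion involved)
df['c_posix'] = (df['c'] - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')

# POSIX seconds -> naive Timestamps holding UTC wall time
df['posix_dt'] = pd.to_datetime(df['posix'], unit='s')

print(df['c_posix'].tolist())                 # [1520087400]
print((df['c_posix'] == df['posix']).all())   # True
print((df['posix_dt'] == df['c']).all())      # True
```

Because neither direction goes through `fromtimestamp()`, the machine's timezone never enters the calculation.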

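The issue with the other methods is that POSIX time is counted from an origin in UTC, but fromtimestamp returns local time. On top of that, the .timestamp() of a naive pandas Timestamp treats the wall time as UTC, whereas the standard-library datetime.timestamp() treats a naive datetime as local time. If your system's local timezone isn't UTC, the round trip therefore shifts the result by your UTC offset. The following methods remedy this: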

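from datetime import datetime
import pandas as pd
import pytz

dt = pd.Timestamp('2018-03-03 14:30:00')  # naive Timestamp, as in the question

# Problematic: fromtimestamp() returns local time (output below is from a UTC-5 machine)
datetime.fromtimestamp(dt.timestamp())
#datetime.datetime(2018, 3, 3, 9, 30)

# Ask fromtimestamp() for UTC instead of local time
datetime.fromtimestamp(dt.timestamp(), tz=pytz.utc)
#datetime.datetime(2018, 3, 3, 14, 30, tzinfo=<UTC>)

# Rebuild the datetime from its date and time parts, without going through POSIX time
datetime.combine(dt.date(), dt.timetz())
#datetime.datetime(2018, 3, 3, 14, 30)

# Localize first, so the POSIX timestamp is computed from your local wall time
mytz = pytz.timezone('US/Eastern')  # Use your own local timezone
datetime.fromtimestamp(mytz.localize(dt).timestamp())
#datetime.datetime(2018, 3, 3, 14, 30)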
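If adding pytz is undesirable, the UTC round trip can be sketched with the standard library's `datetime.timezone.utc` alone. This is a swapped-in alternative to the pytz calls above, not part of the original answer, and the helper name `posix_to_naive_utc` is hypothetical.

```python
from datetime import datetime, timezone

import pandas as pd


def posix_to_naive_utc(seconds: float) -> datetime:
    """Hypothetical helper: POSIX seconds -> naive datetime holding UTC wall time."""
    # tz=timezone.utc stops fromtimestamp() from converting to local time;
    # dropping tzinfo afterwards makes the result comparable with naive values
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)


dt = pd.Timestamp('2018-03-03 14:30:00')  # naive, as in the question
restored = posix_to_naive_utc(dt.timestamp())

print(restored)        # 2018-03-03 14:30:00
print(restored == dt)  # True
```

The result is the same wall time on any machine, since no local-time conversion happens at any step.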



Answer 2:


An answer with the to_datetime function:

import pandas as pd
df = pd.DataFrame(['2018-03-03 14:30:00'], columns=['c'])
# Attach a timezone (replace 'Europe/Moscow' with your own IANA timezone name)
df['c'] = pd.to_datetime(df['c'], dayfirst=False).dt.tz_localize('Europe/Moscow')

When working with dates, you should always attach a timezone; it makes them much easier to work with afterwards.

This does not explain the difference between a datetime inside pandas and a standalone Python datetime, though.
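A sketch of what attaching a timezone buys, assuming the strings are UTC+3 wall times as in the question. A fixed UTC+3 offset built with `datetime.timezone` stands in for a named zone such as 'Europe/Moscow', so the example needs no timezone database. Once a Timestamp is tz-aware, its `.timestamp()` no longer depends on how naive values are interpreted, and converting back with `pd.to_datetime(..., unit='s', utc=True)` returns the same instant.

```python
import datetime

import pandas as pd

utc_plus_3 = datetime.timezone(datetime.timedelta(hours=3))  # assumed local offset

df = pd.DataFrame(['2018-03-03 14:30:00'], columns=['c'])
df['c'] = pd.to_datetime(df['c']).dt.tz_localize(utc_plus_3)

aware = df['c'].iloc[0]
posix = aware.timestamp()  # 14:30 at UTC+3 is 11:30 UTC

# POSIX seconds -> tz-aware UTC Timestamp -> back to the original offset
back = pd.to_datetime(posix, unit='s', utc=True).tz_convert(utc_plus_3)

print(posix)          # 1520076600.0
print(back == aware)  # True
```

Note that the POSIX value is 3 hours (10800 s) smaller than the 1520087400 obtained when the same wall time is treated as UTC.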



Source: https://stackoverflow.com/questions/57465747/strange-behavior-with-pandas-timestamp-to-posix-conversion
