Pandas convert datetime with a separate time zone column

渐次进展 2020-12-07 02:32

I have a dataframe with a column for the time zone and a column for the datetime. I would like to convert these to UTC first to join with other data, and then I'll have som

2 Answers
  • 2020-12-07 03:13

    Here is a vectorized approach (it loops t.time_zone.nunique() times, once per distinct time zone, instead of once per row):

    In [2]: t
    Out[2]:
                 datetime         time_zone
    0 2016-09-19 01:29:13    America/Bogota
    1 2016-09-19 02:16:04  America/New_York
    2 2016-09-19 01:57:54      Africa/Cairo
    3 2016-09-19 11:00:00    America/Bogota
    4 2016-09-19 12:00:00  America/New_York
    5 2016-09-19 13:00:00      Africa/Cairo
    
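    In [3]: for tz in t.time_zone.unique():
       ...:     mask = (t.time_zone == tz)
       ...:     # localize to the row's zone, convert to UTC, then drop the tz
       ...:     # so the values fit back into the naive datetime column
       ...:     t.loc[mask, 'datetime'] = (t.loc[mask, 'datetime'].dt.tz_localize(tz)
       ...:                                 .dt.tz_convert('UTC').dt.tz_localize(None))
       ...: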
    
    In [4]: t
    Out[4]:
                 datetime         time_zone
    0 2016-09-19 06:29:13    America/Bogota
    1 2016-09-19 06:16:04  America/New_York
    2 2016-09-18 23:57:54      Africa/Cairo
    3 2016-09-19 16:00:00    America/Bogota
    4 2016-09-19 16:00:00  America/New_York
    5 2016-09-19 11:00:00      Africa/Cairo
    

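    UPDATE: the same conversion with groupby/transform (inside the lambda, x.name is the group's time zone; tz_convert(None) converts to UTC and drops the tz, giving the naive UTC values shown below):

    In [12]: df['new'] = df.groupby('time_zone')['datetime'] \
                           .transform(lambda x: x.dt.tz_localize(x.name).dt.tz_convert(None))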
    
    In [13]: df
    Out[13]:
                 datetime         time_zone                 new
    0 2016-09-19 01:29:13    America/Bogota 2016-09-19 06:29:13
    1 2016-09-19 02:16:04  America/New_York 2016-09-19 06:16:04
    2 2016-09-19 01:57:54      Africa/Cairo 2016-09-18 23:57:54
    3 2016-09-19 11:00:00    America/Bogota 2016-09-19 16:00:00
    4 2016-09-19 12:00:00  America/New_York 2016-09-19 16:00:00
    5 2016-09-19 13:00:00      Africa/Cairo 2016-09-19 11:00:00
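
A self-contained sketch of the per-zone idea above, for anyone who wants to run it outside an IPython session. It assumes the same column names as the answer; the helper name `local_to_utc` is hypothetical, and unlike the transcript it returns a tz-aware UTC Series in a new column rather than overwriting the naive one.

```python
import pandas as pd

# Sample rows copied from the answer above.
t = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2016-09-19 01:29:13",
        "2016-09-19 02:16:04",
        "2016-09-19 01:57:54",
    ]),
    "time_zone": ["America/Bogota", "America/New_York", "Africa/Cairo"],
})


def local_to_utc(frame, dt_col="datetime", tz_col="time_zone"):
    """Hypothetical helper: localize each zone's rows, convert them to UTC."""
    parts = []
    # One vectorized pass per distinct time zone, not per row.
    for tz in frame[tz_col].unique():
        mask = frame[tz_col] == tz
        parts.append(frame.loc[mask, dt_col].dt.tz_localize(tz).dt.tz_convert("UTC"))
    # Every piece is already UTC, so concat keeps a single tz-aware dtype;
    # reindex restores the original row order.
    return pd.concat(parts).reindex(frame.index)


t["datetime_utc"] = local_to_utc(t)
print(t)
```

Keeping the result tz-aware makes it harder to mix it up with local times later; call `.dt.tz_localize(None)` on it if the data you join against stores naive UTC.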
    
  • 2020-12-07 03:35

    Your issue is that tz_localize() accepts only a single time zone, not a column of them, so one option is to iterate over the rows of the DataFrame:

    df['datetime_utc'] = [d['datetime'].tz_localize(d['time_zone']).tz_convert('UTC') for i,d in df.iterrows()]
    

    The result is:

                 datetime         time_zone              datetime_utc
    0 2016-09-19 01:29:13    America/Bogota 2016-09-19 06:29:13+00:00
    1 2016-09-19 02:16:04  America/New_York 2016-09-19 06:16:04+00:00
    2 2016-09-19 01:57:54      Africa/Cairo 2016-09-18 23:57:54+00:00
    

    An alternative approach is to group by the timezone and convert all matching rows in one pass:

    df['datetime_utc'] = pd.concat([d['datetime'].dt.tz_localize(tz).dt.tz_convert('UTC') for tz, d in df.groupby('time_zone')])
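
As a rough check that the two approaches in this answer agree, the sketch below runs both on the six sample rows from the first answer and compares the results. The variable names `row_wise` and `grouped` are only for illustration.

```python
import pandas as pd

# Six sample rows taken from the first answer.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2016-09-19 01:29:13", "2016-09-19 02:16:04", "2016-09-19 01:57:54",
        "2016-09-19 11:00:00", "2016-09-19 12:00:00", "2016-09-19 13:00:00",
    ]),
    "time_zone": ["America/Bogota", "America/New_York", "Africa/Cairo"] * 2,
})

# Row by row: each Timestamp is localized with its own zone, then converted.
row_wise = pd.Series(
    [ts.tz_localize(tz).tz_convert("UTC")
     for ts, tz in zip(df["datetime"], df["time_zone"])],
    index=df.index,
)

# Grouped: one vectorized conversion per distinct zone; sort_index puts the
# concatenated pieces back into the original row order.
grouped = pd.concat([
    g["datetime"].dt.tz_localize(tz).dt.tz_convert("UTC")
    for tz, g in df.groupby("time_zone")
]).sort_index()

print((row_wise == grouped).all())
df["datetime_utc"] = grouped
```

The grouped version does the same work with far fewer Python-level calls when there are many rows and few distinct zones, which is why it is usually the one to prefer on large frames.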
    