Pandas: How to create a datetime object from Week and Year?

后端 未结 4 1473
慢半拍i
慢半拍i 2020-12-01 14:22

I have a dataframe that provides two integer columns with the Year and Week of the year:

import pandas as pd
import numpy as np
L1 = [43,44,51,2,5,12]
L2 = [         


        
相关标签:
4条回答
  • 2020-12-01 14:55

    There is something fishy going on with weeks starting from 2019. The ISO-8601 standard assigns the 31st December 2018 to the week 1 of year 2019. The other approaches based on:

    pd.to_datetime(df.Week.astype(str)+
                      df.Year.astype(str).add('-2') ,format='%W%Y-%w')
    

    will give shifted results starting from 2019.

    In order to be compliant with the ISO-8601 standard you would have to do the following:

    import pandas as pd
    import datetime
    
    L1 = [52,53,1,2,5,52]
    L2 = [2018,2018,2019,2019,2019,2019]
    df = pd.DataFrame({"Week":L1,"Year":L2})
    df['ISO'] = df['Year'].astype(str) + '-W' + df['Week'].astype(str) + '-1'
    df['DT'] = df['ISO'].map(lambda x: datetime.datetime.strptime(x, "%G-W%V-%u"))
    print(df)
    

    It prints:

       Week  Year         ISO         DT
    0    52  2018  2018-W52-1 2018-12-24
    1    53  2018  2018-W53-1 2018-12-31
    2     1  2019   2019-W1-1 2018-12-31
    3     2  2019   2019-W2-1 2019-01-07
    4     5  2019   2019-W5-1 2019-01-28
    5    52  2019  2019-W52-1 2019-12-23
    
    

    The week 53 of 2018 is ignored and mapped to the week 1 of 2019.

    Please verify yourself on https://www.epochconverter.com/weeks/2019.

    0 讨论(0)
  • 2020-12-01 15:08

    Like @Gianmario Spacagna mentioned for datetimes higher like 2018 use %V with %G:

    L1 = [43,44,51,2,5,12,52,53,1,2,5,52]
    L2 = [2016,2016,2016,2017,2017,2017,2018,2018,2019,2019,2019,2019]
    df = pd.DataFrame({"Week":L1,"Year":L2})
    
    
    df['new'] = pd.to_datetime(df.Week.astype(str)+
                               df.Year.astype(str).add('-1') ,format='%V%G-%u')
    print (df)
        Week  Year        new
    0     43  2016 2016-10-24
    1     44  2016 2016-10-31
    2     51  2016 2016-12-19
    3      2  2017 2017-01-09
    4      5  2017 2017-01-30
    5     12  2017 2017-03-20
    6     52  2018 2018-12-24
    7     53  2018 2018-12-31
    8      1  2019 2018-12-31
    9      2  2019 2019-01-07
    10     5  2019 2019-01-28
    11    52  2019 2019-12-23
    
    0 讨论(0)
  • 2020-12-01 15:12

    If you want to follow ISO Week Date

    Weeks start with Monday. Each week's year is the Gregorian year in which the Thursday falls. The first week of the year, hence, always contains 4 January. ISO week year numbering therefore slightly deviates from the Gregorian for some days close to 1 January.

    The following sample code, generates a sequence of 60 Dates, starting from 18Dec2016 Sun and adds the appropriate columns.

    It adds:

    • A "Date"
    • Week Day of the "Date"
    • Finds the Week Starting Monday of that "Date"
    • Finds the Year of the Week Starting Monday of that "Date"
    • Adds a Week Number (ISO)
    • Gets the Starting Monday Date, from Year and Week Number

    Sample Code Below:

    # Generate Some Dates
    dft1 = pd.DataFrame(pd.date_range('2016-12-18', freq='D', periods=60))
    dft1.columns = ['e_FullDate']
    dft1['e_FullDateWeekDay'] = dft1.e_FullDate.dt.day_name().str.slice(0,3)
    
    
    #Add a Week Start Date (Monday)
    dft1['e_week_start'] = dft1['e_FullDate'] - pd.to_timedelta(dft1['e_FullDate'].dt.weekday,
                                                          unit='D')
    dft1['e_week_startWeekDay'] = dft1.e_week_start.dt.day_name().str.slice(0,3)
    
    #Add a Week Start Year
    dft1['e_week_start_yr'] = dft1.e_week_start.dt.year
    
    #Add a Week Number of Week Start Monday
    dft1['e_week_no'] = dft1['e_week_start'].dt.week
    
    #Add a Week Start generate from Week Number and Year
    dft1['e_week_start_from_week_no'] = pd.to_datetime(dft1.e_week_no.astype(str)+
                      dft1.e_week_start_yr.astype(str).add('-1') ,format='%W%Y-%w')
    dft1['e_week_start_from_week_noWeekDay'] = dft1.e_week_start_from_week_no.dt.day_name().str.slice(0,3)
    
    
    with pd.option_context('display.max_rows', 999, 'display.max_columns', 0, 'display.max_colwidth', 9999):
        display(dft1)
    

    0 讨论(0)
  • 2020-12-01 15:18

    Try this:

    In [19]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
                 pd.to_timedelta(df.Week.mul(7).astype(str) + ' days')
    Out[19]:
    0   2016-10-28
    1   2016-11-04
    2   2016-12-23
    3   2017-01-15
    4   2017-02-05
    5   2017-03-26
    dtype: datetime64[ns]
    

    Initially I have timestamps in s

    It's much easier to parse it from UNIX epoch timestamp:

    df['Date'] = pd.to_datetime(df['UNIX_Time'], unit='s')
    

    Timing for 10M rows DF:

    Setup:

    In [26]: df = pd.DataFrame(pd.date_range('1970-01-01', freq='1T', periods=10**7), columns=['date'])
    
    In [27]: df.shape
    Out[27]: (10000000, 1)
    
    In [28]: df['unix_ts'] = df['date'].astype(np.int64)//10**9
    
    In [30]: df
    Out[30]:
                           date    unix_ts
    0       1970-01-01 00:00:00          0
    1       1970-01-01 00:01:00         60
    2       1970-01-01 00:02:00        120
    3       1970-01-01 00:03:00        180
    4       1970-01-01 00:04:00        240
    5       1970-01-01 00:05:00        300
    6       1970-01-01 00:06:00        360
    7       1970-01-01 00:07:00        420
    8       1970-01-01 00:08:00        480
    9       1970-01-01 00:09:00        540
    ...                     ...        ...
    9999990 1989-01-05 10:30:00  599999400
    9999991 1989-01-05 10:31:00  599999460
    9999992 1989-01-05 10:32:00  599999520
    9999993 1989-01-05 10:33:00  599999580
    9999994 1989-01-05 10:34:00  599999640
    9999995 1989-01-05 10:35:00  599999700
    9999996 1989-01-05 10:36:00  599999760
    9999997 1989-01-05 10:37:00  599999820
    9999998 1989-01-05 10:38:00  599999880
    9999999 1989-01-05 10:39:00  599999940
    
    [10000000 rows x 2 columns]
    

    Check:

    In [31]: pd.to_datetime(df.unix_ts, unit='s')
    Out[31]:
    0         1970-01-01 00:00:00
    1         1970-01-01 00:01:00
    2         1970-01-01 00:02:00
    3         1970-01-01 00:03:00
    4         1970-01-01 00:04:00
    5         1970-01-01 00:05:00
    6         1970-01-01 00:06:00
    7         1970-01-01 00:07:00
    8         1970-01-01 00:08:00
    9         1970-01-01 00:09:00
                      ...
    9999990   1989-01-05 10:30:00
    9999991   1989-01-05 10:31:00
    9999992   1989-01-05 10:32:00
    9999993   1989-01-05 10:33:00
    9999994   1989-01-05 10:34:00
    9999995   1989-01-05 10:35:00
    9999996   1989-01-05 10:36:00
    9999997   1989-01-05 10:37:00
    9999998   1989-01-05 10:38:00
    9999999   1989-01-05 10:39:00
    Name: unix_ts, Length: 10000000, dtype: datetime64[ns]
    

    Timing:

    In [32]: %timeit pd.to_datetime(df.unix_ts, unit='s')
    10 loops, best of 3: 156 ms per loop
    

    Conclusion: I think 156 milliseconds for converting 10.000.000 rows is not that slow

    0 讨论(0)
提交回复
热议问题