Can I parse dates in different formats?

不知归路 2020-12-11 13:28

A collaborator of mine has inconsistent date formatting in their data.

0   13/11/2016
1   21/01/2017
2   22/01/2017
3   2017-02-02
4   2016-12-11
5   13/11/2         


        
1 answer
  • 2020-12-11 13:48

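    You can use to_datetime. On older pandas versions a plain call handles both patterns, as shown here. In pandas 2.0 and later, a plain call infers one format from the first value and raises on rows that do not match it, so mixed input like this needs format='mixed' or an explicit format per pattern.

    First format (YYYY-MM-DD):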

    print (df)
            dates
    0  13/11/2016
    1  21/01/2017
    2  22/01/2017
    3  2017-02-02
    4  2016-12-11
    5  13/11/2016
    6  2016-12-12
    7  21/01/2017
    8  22/01/2017
    9  2017-02-02
    9  2017-02-25 <- YYYY-MM-DD
    
    dates = pd.to_datetime(df.dates)
    print (dates)
    0   2016-11-13
    1   2017-01-21
    2   2017-01-22
    3   2017-02-02
    4   2016-12-11
    5   2016-11-13
    6   2016-12-12
    7   2017-01-21
    8   2017-01-22
    9   2017-02-02
    9   2017-02-25
    Name: dates, dtype: datetime64[ns]
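
For the first case, a version-independent alternative is to name both patterns explicitly instead of relying on inference. This is only a minimal sketch: the sample values are copied from the frame above, and it assumes every slashed row is DD/MM/YYYY and every dashed row is YYYY-MM-DD.

```python
import pandas as pd

# Sample rows taken from the answer's DataFrame.
df = pd.DataFrame(
    {"dates": ["13/11/2016", "21/01/2017", "2017-02-02", "2016-12-11", "2017-02-25"]}
)

# Parse each pattern separately; rows that do not match become NaT.
slashed = pd.to_datetime(df["dates"], format="%d/%m/%Y", errors="coerce")
dashed = pd.to_datetime(df["dates"], format="%Y-%m-%d", errors="coerce")

# Fill the gaps left by the first parse with the results of the second.
dates = slashed.fillna(dashed)
print(dates)
```

Because each call is given an exact format, the result does not depend on how a particular pandas release guesses formats.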
    

    Second format (YYYY-DD-MM):

    This one is more awkward. Parse the column once per pattern, passing format and errors='coerce' to to_datetime so that non-matching rows become NaT, then merge the two results with combine_first or fillna:

    print (df)
            dates
    0  13/11/2016
    1  21/01/2017
    2  22/01/2017
    3  2017-02-02
    4  2016-12-11
    5  13/11/2016
    6  2016-12-12
    7  21/01/2017
    8  22/01/2017
    9  2017-02-02
    9  2017-25-02 <- YYYY-DD-MM
    
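    # Rows matching DD/MM/YYYY are parsed; all others become NaT
    dates1 = pd.to_datetime(df.dates, format='%d/%m/%Y', errors='coerce')
    # Rows matching YYYY-DD-MM are parsed; all others become NaT
    dates2 = pd.to_datetime(df.dates, format='%Y-%d-%m', errors='coerce')

    # Fill the NaT gaps in dates1 with the values from dates2
    dates = dates1.combine_first(dates2)
    # Equivalent alternative:
    # dates = dates1.fillna(dates2)
    print(dates)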
    0   2016-11-13
    1   2017-01-21
    2   2017-01-22
    3   2017-02-02
    4   2016-11-12
    5   2016-11-13
    6   2016-12-12
    7   2017-01-21
    8   2017-01-22
    9   2017-02-02
    9   2017-02-25
    Name: dates, dtype: datetime64[ns]
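
The same idea extends to any number of patterns, and it is worth checking afterwards which rows matched none of them. The sketch below is one possible way to do that. The helper name parse_with_formats is hypothetical (not part of pandas), and the sample values are taken from the question and the frame above.

```python
from functools import reduce

import pandas as pd


def parse_with_formats(values, formats):
    """Try each format in order; where several match, the earliest one wins."""
    parsed = [pd.to_datetime(values, format=f, errors="coerce") for f in formats]
    # combine_first keeps existing values and fills only the NaT gaps.
    return reduce(lambda acc, nxt: acc.combine_first(nxt), parsed)


# "13/11/2" is the truncated value from the question; it matches neither format.
raw = pd.Series(["13/11/2016", "2017-25-02", "2016-12-11", "13/11/2"], name="dates")
dates = parse_with_formats(raw, ["%d/%m/%Y", "%Y-%d-%m"])

# Rows still NaT after all formats were tried need manual attention.
unparsed = raw[dates.isna()]
print(dates)
print(unparsed)
```

A value such as 2016-12-11 is valid under both YYYY-MM-DD and YYYY-DD-MM, so no parser can decide between them. Which reading is right has to be confirmed with whoever produced the data.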
    