Parse multiple date formats into a single format

我的未来我决定 提交于 2021-02-08 19:50:35

问题


I have one column called published (date). As you can see, it has multiple date formats and also nan values. I would like to skip nan values, convert all the other formats to %Y-%-%d, and ignore the one that has the only year.

I tried df['publish_time']=pd.to_datetime(df['publish_time']) and also things like:

fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y',
              '%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month','month: 
               %d;Year','%Y','%b %d %Y','%b %Y %d']

but I could not solve it. Any suggestions? Thanks!

Here is that column:

published

2014 Jul 22
2003 Aug
2019 Nov 26
2012-12-07
2020 Jan 21
2015-01-01
2010-11-30
2007-05-10
2020
2012-02-29
2016 Apr 19
2006-12-31
2013 Jun 27
2019 Jun 19
2015 Jun 12
2006 Jun-Dec
2006-07-31
nan
2017-04-15
2016 May 22
2020 Feb
2017 May 6
2020 Mar 11
2013-04-30
2020-03-07
nan
2018

回答1:


First was added 2 new formats to fmt list:

fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y',
     '%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month',
     'month: %d;Year','%Y','%b %d %Y','%b %Y %d',
     '%Y %b %d', '%Y %b']

Then in list comprehension convert column to datetimes, parameter errors='coerce' is for non matched values to missing values. Last join together by concat.

Last because possible multiple values per rows because dd/mm/YYYY vs mm/dd/YYYY formats (not sure if month of day) is used back filling with select first column. It means what format is first in list it is selected with high priority.

dfs = [pd.to_datetime(df['publish_time'], format=f, errors='coerce') for f in fmt]
df['publish_time1']= pd.concat(dfs, axis=1).bfill(axis=1).iloc[:, 0]

print (df)
    publish_time publish_time1
0    2014 Jul 22    2014-07-22
1       2003 Aug    2003-08-01
2    2019 Nov 26    2019-11-26
3     2012-12-07    2012-12-07
4    2020 Jan 21    2020-01-21
5     2015-01-01    2015-01-01
6     2010-11-30    2010-11-30
7     2007-05-10    2007-05-10
8           2020    2020-01-01
9     2012-02-29    2012-02-29
10   2016 Apr 19    2016-04-19
11    2006-12-31    2006-12-31
12   2013 Jun 27    2013-06-27
13   2019 Jun 19    2019-06-19
14   2015 Jun 12    2015-06-12
15  2006 Jun-Dec           NaT
16    2006-07-31    2006-07-31
17           NaN           NaT
18    2017-04-15    2017-04-15
19   2016 May 22    2016-05-22
20      2020 Feb    2020-02-01
21    2017 May 6    2017-05-06
22   2020 Mar 11    2020-03-11
23    2013-04-30    2013-04-30
24    2020-03-07    2020-03-07
25           NaN           NaT
26          2018    2018-01-01


来源:https://stackoverflow.com/questions/60831665/parse-multiple-date-formats-into-a-single-format

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!