Using dateutil.parser to parse a date in another language

会有一股神秘感。 Submitted on 2019-12-01 02:20:52

Question


dateutil is a great tool for parsing dates in string format. For example,

from dateutil.parser import parse
parse("Tue, 01 Oct 2013 14:26:00 -0300")

returns

datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))

However,

parse("Ter, 01 Out 2013 14:26:00 -0300") # In Portuguese

yields this error:

ValueError: unknown string format

Does anybody know how to make dateutil aware of the locale?


Answer 1:


As far as I can see, dateutil is not locale aware (yet!).

I can think of three alternative suggestions:

  • The day and month names are hardcoded in dateutil.parser (in the WEEKDAYS and MONTHS attributes of the parserinfo class). You could subclass parserinfo, replace these names with their Portuguese equivalents, and pass an instance to parse() through its parserinfo argument.

  • Modify dateutil to take the day and month names from the user’s locale, so that you could do something like:

    import locale
    locale.setlocale(locale.LC_ALL, "pt_PT")
    
    from dateutil.parser import parse
    parse("Ter, 01 Out 2013 14:26:00 -0300")
    

    I’ve started a fork which gets the names from the calendar module (which is locale-aware) to work on this: https://github.com/alexwlchan/dateutil

    Right now it works for Portuguese (or seems to), but I want to think about it a bit more before I submit a patch to the main branch. In particular, it may misbehave with characters that aren’t used in Western European languages. I haven’t tested this yet. (See https://stackoverflow.com/a/8917539/1558022)

  • If you’re not tied to the dateutil module, you could use datetime.strptime instead, which already uses the current locale for day and month names:

    from datetime import datetime
    import locale
    
    locale.setlocale(locale.LC_ALL, "pt_PT")
    datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                      "%a, %d %b %Y %H:%M:%S %z")
    

    (Note that datetime.strptime does not support the %z directive on Python 2; it works on Python 3.2+.)
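A minimal sketch of the first two suggestions, assuming python-dateutil is installed. PortugueseParserInfo overrides the WEEKDAYS and MONTHS lists of parserinfo with hard-coded Portuguese names (chosen for this sketch, not taken from the fork mentioned above). The hypothetical helper locale_parserinfo() builds the same lists from the calendar module, which follows the current LC_TIME locale, without modifying dateutil itself.

```python
import calendar

from dateutil.parser import parse, parserinfo


class PortugueseParserInfo(parserinfo):
    """parserinfo with hard-coded Portuguese day and month names.

    dateutil lowercases these names and matches them case-insensitively.
    Index 0 must be Monday in WEEKDAYS and January in MONTHS, as in the
    base class.
    """
    WEEKDAYS = [("Seg", "Segunda"), ("Ter", "Terça"), ("Qua", "Quarta"),
                ("Qui", "Quinta"), ("Sex", "Sexta"),
                ("Sáb", "Sab", "Sábado"), ("Dom", "Domingo")]
    MONTHS = [("Jan", "Janeiro"), ("Fev", "Fevereiro"), ("Mar", "Março"),
              ("Abr", "Abril"), ("Mai", "Maio"), ("Jun", "Junho"),
              ("Jul", "Julho"), ("Ago", "Agosto"), ("Set", "Setembro"),
              ("Out", "Outubro"), ("Nov", "Novembro"), ("Dez", "Dezembro")]


def locale_parserinfo():
    """Hypothetical helper: build a parserinfo from the calendar module.

    calendar.day_abbr / day_name / month_abbr / month_name are rendered
    with the LC_TIME locale that is active when this function runs.
    """
    # calendar.day_* starts at Monday, matching parserinfo.WEEKDAYS.
    weekdays = [(calendar.day_abbr[i], calendar.day_name[i])
                for i in range(7)]
    # calendar.month_*[0] is an empty string, so start at January (1).
    months = [(calendar.month_abbr[i], calendar.month_name[i])
              for i in range(1, 13)]

    class LocaleParserInfo(parserinfo):
        WEEKDAYS = weekdays
        MONTHS = months

    return LocaleParserInfo()


dt = parse("Ter, 01 Out 2013 14:26:00 -0300",
           parserinfo=PortugueseParserInfo())
print(dt)  # 2013-10-01 14:26:00-03:00
```

The locale_parserinfo() variant only helps if the target locale (for example pt_PT.UTF-8) is installed and selected with locale.setlocale(locale.LC_TIME, ...) before the helper is called. Some locales’ abbreviations contain trailing periods (French "janv.", for instance), which this sketch does not strip, so such input may still fail to parse.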




Answer 2:


You could use PyICU to parse a localized date/time string in a given format:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from datetime import datetime
import icu  # PyICU

df = icu.SimpleDateFormat(
    'EEE, dd MMM yyyy HH:mm:ss zzz', icu.Locale('pt_BR'))
ts = df.parse(u'Ter, 01 Out 2013 14:26:00 -0300')
print(datetime.utcfromtimestamp(ts))
# -> 2013-10-01 17:26:00 (UTC)

It works on both Python 2 and Python 3, and it does not modify global state (the process locale).

If your actual input string does not contain an explicit UTC offset, you should explicitly specify the timezone that ICU should use; otherwise you may get a wrong result (ICU and datetime may use different timezone definitions).

If you only need to support Python 3 and you don't mind setting the locale then you could use datetime.strptime() as @alexwlchan suggested:

#!/usr/bin/env python3
import locale
from datetime import datetime

locale.setlocale(locale.LC_TIME, "pt_PT.UTF-8")
print(datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                        "%a, %d %b %Y %H:%M:%S %z")) # works on Python 3.2+
# -> 2013-10-01 14:26:00-03:00
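Because setlocale() changes process-wide state, one option when you do take the strptime route is to restore the previous setting afterwards. The sketch below uses a hypothetical temporary_locale() context manager built only on the standard library; it assumes the pt_PT.UTF-8 locale may or may not be installed and reports when it is missing. It limits how long the change lasts but does not make it thread-safe.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_locale(category, name):
    """Hypothetical helper: set one locale category, restore it on exit.

    setlocale() changes process-wide state and is not thread-safe, so this
    only bounds how long the change lasts; other threads can still see it.
    """
    saved = locale.setlocale(category)  # one-argument form only queries
    try:
        yield locale.setlocale(category, name)
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    try:
        with temporary_locale(locale.LC_TIME, "pt_PT.UTF-8"):
            dt = datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                                   "%a, %d %b %Y %H:%M:%S %z")
        print(dt)
    except locale.Error:
        # setlocale() raises locale.Error when the locale is not installed.
        print("The pt_PT.UTF-8 locale is not available on this system.")
```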



Answer 3:


from dateutil.parser import parse
parse("Ter, 01 Out 2013 14:26:00 -0300", fuzzy=True)

Result:

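datetime.datetime(2013, 1, 28, 14, 26, tzinfo=tzoffset(None, -10800))

Note that this result is wrong (January 28 instead of October 1): fuzzy=True skips the unrecognized words "Ter" and "Out" instead of translating them, so the month and day are not recovered.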
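One way to see which parts of the input fuzzy parsing ignored is the fuzzy_with_tokens=True option of dateutil’s parse(), which returns the skipped text along with the datetime. The sketch assumes a fixed default date (chosen here only so the output does not depend on today’s date, since fields missing from the input are taken from default).

```python
from datetime import datetime

from dateutil.parser import parse

# Fields not found in the input (here: the day) are taken from this value.
default = datetime(2000, 1, 15)

dt, skipped = parse("Ter, 01 Out 2013 14:26:00 -0300",
                    fuzzy_with_tokens=True, default=default)
print(dt)       # 2013-01-15 14:26:00-03:00: "01" became the month
print(skipped)  # the unrecognized Portuguese words show up here
```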


Source: https://stackoverflow.com/questions/19927654/using-dateutil-parser-to-parse-a-date-in-another-language
