Using dateutil.parser to parse a date in another language

一整个雨季 2020-12-19 03:55

Dateutil is a great tool for parsing dates in string format. For example:

from dateutil.parser import parse
parse("Tue, 01 Oct 2013 14:26:00 -0300")


        
4 Answers
  • 2020-12-19 04:34
    from dateutil.parser import parse
    parse("Ter, 01 Out 2013 14:26:00 -0300",fuzzy=True)
    

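    Result (note that fuzzy=True only skips the unrecognized tokens "Ter" and "Out", so the date below comes out as 28 January instead of 1 October):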

    datetime.datetime(2013, 1, 28, 14, 26, tzinfo=tzoffset(None, -10800))
    
  • 2020-12-19 04:35

    I think the best solution is to subclass dateutil's parserinfo class and fill it with the name constants from the calendar module. This is a simple solution that I haven't tested much, so use it with caution.

    It localizes dateutil for many languages at once. Create a module localeparserinfo.py:

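    import calendar
    from dateutil import parser

    class LocaleParserInfo(parser.parserinfo):
        # Names are read from the locale active when this module is imported.
        # list() is needed because a bare zip() iterator would be exhausted
        # after the first instance is created (Python 3).
        WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
        MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]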
    

    Now you can pass an instance of your new parserinfo class as the parserinfo argument to dateutil.parser.parse.

    In [1]: import locale;locale.setlocale(locale.LC_ALL, "pt_BR.utf8")
    In [2]: from localeparserinfo import LocaleParserInfo                                   
    
    In [3]: from dateutil.parser import parse                                                
    
    In [4]: parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=LocaleParserInfo())
    Out[4]: datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))
    

    Note that this handles parsing in many different languages, but it is not a complete solution for every possible date and time format. Take a look at dateutil's parser.py, especially the class variables of parserinfo such as HMS, which may also need localizing.

    You can even pass the locale string as an argument to your parserinfo class.
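
As a rough illustration of passing the locale string to the parserinfo class, here is a minimal sketch. The class name `NamedLocaleParserInfo` and its `locale_name` argument are assumptions of this example, not part of dateutil. It switches LC_TIME temporarily, copies the names from calendar, and restores the previous setting; like the class above, it is lightly tested.

```python
import calendar
import locale

from dateutil import parser


class NamedLocaleParserInfo(parser.parserinfo):
    """Hypothetical parserinfo whose names come from a named locale."""

    def __init__(self, locale_name, dayfirst=False, yearfirst=False):
        # Remember the current LC_TIME setting so it can be restored.
        previous = locale.setlocale(locale.LC_TIME)
        try:
            locale.setlocale(locale.LC_TIME, locale_name)
            # calendar resolves its names lazily, so copy them into lists
            # while the requested locale is still active.
            self.WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
            self.MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]
        finally:
            locale.setlocale(locale.LC_TIME, previous)
        # parserinfo.__init__ builds its lookup tables from self.WEEKDAYS
        # and self.MONTHS, so call it only after they are set.
        super().__init__(dayfirst=dayfirst, yearfirst=yearfirst)
```

Usage would look like `parser.parse(text, parserinfo=NamedLocaleParserInfo("pt_BR.utf8"))`, assuming that locale is installed on the system; otherwise `locale.setlocale` raises `locale.Error`.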

  • 2020-12-19 04:40

    You could use PyICU to parse a localized date/time string in a given format:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from datetime import datetime
    import icu  # PyICU
    
    df = icu.SimpleDateFormat(
                   'EEE, dd MMM yyyy HH:mm:ss zzz', icu.Locale('pt_BR'))
    ts = df.parse(u'Ter, 01 Out 2013 14:26:00 -0300')
    print(datetime.utcfromtimestamp(ts))
    # -> 2013-10-01 17:26:00 (UTC)
    

    It works on Python 2/3. It does not modify global state (locale).

    If your actual input time string does not contain an explicit UTC offset, you should specify the timezone for ICU to use; otherwise you can get a wrong result (ICU and datetime may use different timezone definitions).

    If you only need to support Python 3 and you don't mind setting the locale, you could use datetime.strptime(), as @alexwlchan suggested:

    #!/usr/bin/env python3
    import locale
    from datetime import datetime
    
    locale.setlocale(locale.LC_TIME, "pt_PT.UTF-8")
    print(datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                            "%a, %d %b %Y %H:%M:%S %z")) # works on Python 3.2+
    # -> 2013-10-01 14:26:00-03:00
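
If changing the locale for the whole process is a concern, one option is to switch it only for the duration of the call. The sketch below assumes two hypothetical helpers, `temporary_locale` and `parse_localized`, built on the standard library; the names are illustrative only. Note that `locale.setlocale` is not thread-safe on most systems, so this only limits how long the change lasts, not its scope.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_locale(category, name):
    """Switch `category` to locale `name`, restoring the old value on exit."""
    previous = locale.setlocale(category)  # query without changing anything
    locale.setlocale(category, name)
    try:
        yield
    finally:
        locale.setlocale(category, previous)


def parse_localized(text, fmt, locale_name):
    # Only LC_TIME affects the %a and %b names used by strptime.
    with temporary_locale(locale.LC_TIME, locale_name):
        return datetime.strptime(text, fmt)
```

For the example above this would be called as `parse_localized("Ter, 01 Out 2013 14:26:00 -0300", "%a, %d %b %Y %H:%M:%S %z", "pt_PT.UTF-8")`, assuming that locale is installed.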
    
  • 2020-12-19 04:57

    As far as I can see, dateutil is not locale aware (yet!).

    I can think of three alternative suggestions:

    • The day and month names are hardcoded in dateutil.parser (as part of the parserinfo class). You could subclass parserinfo, and replace these names with the appropriate names for Portuguese.

    • Modify dateutil to get day and month names based on the user’s locale, so that you could do something like:

      import locale
      locale.setlocale(locale.LC_ALL, "pt_PT")
      
      from dateutil.parser import parse
      parse("Ter, 01 Out 2013 14:26:00 -0300")
      

      I’ve started a fork which gets the names from the calendar module (which is locale-aware) to work on this: https://github.com/alexwlchan/dateutil

      Right now it works for Portuguese (or seems to), but I want to think about it a bit more before I submit a patch to the main branch. In particular, weirdness may happen if it encounters characters which aren’t used in Western European languages. I haven’t tested this yet. (See https://stackoverflow.com/a/8917539/1558022)

    • If you’re not tied to the dateutil module, you could use datetime instead, which is already locale-aware:

      from datetime import datetime
      import locale
      
      locale.setlocale(locale.LC_ALL, "pt_PT")
      datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                        "%a, %d %b %Y %H:%M:%S %z")
      

      (Note that the %z token is not consistently supported by datetime.strptime: it works on Python 3.2+, but not on Python 2.)
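
The first suggestion, subclassing parserinfo with Portuguese names, could be sketched roughly as follows. The class name `PortugueseParserInfo` and the hand-written name tables are assumptions of this example (abbreviations plus shortened weekday names), so check them against the input you actually expect. Unlike the locale-based options, this does not touch the process locale.

```python
from dateutil import parser


class PortugueseParserInfo(parser.parserinfo):
    # Hand-written tables for illustration; index 0 is Monday / January.
    WEEKDAYS = [
        ("Seg", "Segunda"),
        ("Ter", "Terça"),
        ("Qua", "Quarta"),
        ("Qui", "Quinta"),
        ("Sex", "Sexta"),
        ("Sáb", "Sábado"),
        ("Dom", "Domingo"),
    ]
    MONTHS = [
        ("Jan", "Janeiro"),
        ("Fev", "Fevereiro"),
        ("Mar", "Março"),
        ("Abr", "Abril"),
        ("Mai", "Maio"),
        ("Jun", "Junho"),
        ("Jul", "Julho"),
        ("Ago", "Agosto"),
        ("Set", "Setembro"),
        ("Out", "Outubro"),
        ("Nov", "Novembro"),
        ("Dez", "Dezembro"),
    ]


result = parser.parse("Ter, 01 Out 2013 14:26:00 -0300",
                      parserinfo=PortugueseParserInfo())
print(result)
# -> 2013-10-01 14:26:00-03:00
```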
