Using dateutil.parser to parse a date in another language


Dateutil is a great tool for parsing dates in string format. For example:

from dateutil.parser import parse
parse("Tue, 01 Oct 2013 14:26:00 -0300")


4 Answers

长发绾君心

2020-12-19 04:34

from dateutil.parser import parse
parse("Ter, 01 Out 2013 14:26:00 -0300",fuzzy=True)

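Result (the unrecognized tokens "Ter" and "Out" are skipped, so the month and day below do not match the input's 1 October):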

datetime.datetime(2013, 1, 28, 14, 26, tzinfo=tzoffset(None, -10800))


挽巷

2020-12-19 04:35
I think the best solution is to subclass dateutil's parserinfo and use the constants from the calendar module. This is a simple solution that I haven't tested much, so use it with caution.

It is very simple and will localize dateutil for a lot of languages. Create a module localeparserinfo.py:
```
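import calendar
from dateutil import parser


class LocaleParserInfo(parser.parserinfo):
    # The calendar names follow the current locale and are read once, when this
    # module is imported, so call locale.setlocale() before importing it.
    # list() matters on Python 3: a bare zip() is exhausted after the first instance.
    WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
    MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]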
```
Now you can pass an instance of your new parserinfo class as the parserinfo parameter of dateutil.parser.parse:
```
In [1]: import locale; locale.setlocale(locale.LC_ALL, "pt_BR.utf8")

In [2]: from localeparserinfo import LocaleParserInfo

In [3]: from dateutil.parser import parse

In [4]: parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=LocaleParserInfo())
Out[4]: datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))
```
Note that this handles day and month names for many languages, but it is not a complete solution for every possible date and time format. Take a look at dateutil's parser.py, especially the parserinfo class variables such as HMS, which still hold English words.

You can even pass the locale string as an argument to your parserinfo class.
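
A minimal sketch of that last idea, assuming the locale name is passed to the constructor. The class name `NamedLocaleParserInfo` and its `locale_name` parameter are made up for illustration and are not part of dateutil. It switches `LC_TIME` only while the names are read from `calendar`, then restores the previous setting. The example uses the "C" locale because it is always available; a name such as "pt_BR.utf8" only works if that locale is installed on the system.

```python
import calendar
import locale

from dateutil import parser


class NamedLocaleParserInfo(parser.parserinfo):
    """Hypothetical parserinfo that takes its day/month names from a locale."""

    def __init__(self, locale_name, dayfirst=False, yearfirst=False):
        saved = locale.setlocale(locale.LC_TIME)  # query the current setting
        try:
            locale.setlocale(locale.LC_TIME, locale_name)
            # calendar formats these names lazily, so build the lists
            # while the requested locale is active.
            self.WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
            self.MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]
        finally:
            locale.setlocale(locale.LC_TIME, saved)  # restore the global state
        # The base __init__ builds its lookup tables from WEEKDAYS and MONTHS.
        super().__init__(dayfirst=dayfirst, yearfirst=yearfirst)


# "C" is used here only so the sketch runs anywhere.
info = NamedLocaleParserInfo("C")
result = parser.parse("Tue, 01 Oct 2013 14:26:00 -0300", parserinfo=info)
print(result)
```

Because the names are stored on the instance, two parserinfo objects for different locales can coexist in one process. `locale.setlocale()` is still not thread-safe, so it may be safer to build these objects once at startup.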

名媛妹妹

2020-12-19 04:40

You could use PyICU to parse a localized date/time string in a given format:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from datetime import datetime
import icu  # PyICU

df = icu.SimpleDateFormat(
               'EEE, dd MMM yyyy HH:mm:ss zzz', icu.Locale('pt_BR'))
ts = df.parse(u'Ter, 01 Out 2013 14:26:00 -0300')
print(datetime.utcfromtimestamp(ts))
# -> 2013-10-01 17:26:00 (UTC)

It works on Python 2/3. It does not modify global state (locale).

If your actual input time string does not contain an explicit UTC offset, you should specify the timezone for ICU to use explicitly. Otherwise you can get a wrong result, because ICU and datetime may use different timezone definitions.

If you only need to support Python 3 and you don't mind setting the locale, you could use datetime.strptime() as @alexwlchan suggested:

#!/usr/bin/env python3
import locale
from datetime import datetime

locale.setlocale(locale.LC_TIME, "pt_PT.UTF-8")
print(datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                        "%a, %d %b %Y %H:%M:%S %z")) # works on Python 3.2+
# -> 2013-10-01 14:26:00-03:00
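
If changing the process-wide locale is the concern with the strptime() approach, one option is to switch `LC_TIME` only for the duration of the call and restore it afterwards. The sketch below assumes a small helper, `time_locale`, which is a made-up name and not part of the standard library. It runs with the "C" locale and an English string so that it works on any system; with "pt_PT.UTF-8" installed, the Portuguese string above could be parsed the same way.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def time_locale(name):
    """Hypothetical helper: temporarily switch LC_TIME, then restore it."""
    saved = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        yield locale.setlocale(locale.LC_TIME, name)
    finally:
        locale.setlocale(locale.LC_TIME, saved)


with time_locale("C"):
    parsed = datetime.strptime("Tue, 01 Oct 2013 14:26:00 -0300",
                               "%a, %d %b %Y %H:%M:%S %z")
print(parsed)
# -> 2013-10-01 14:26:00-03:00
```

The locale is still global while the `with` block runs, so this is not safe if other threads format or parse dates at the same time.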


梦如初夏

2020-12-19 04:57
As far as I can see, dateutil is not locale aware (yet!).

I can think of three alternative suggestions:
- The day and month names are hardcoded in dateutil.parser (as part of the parserinfo class). You could subclass parserinfo, and replace these names with the appropriate names for Portuguese.
- Modify dateutil to get day and month names based on the user’s locale, so you could do something like:
```
import locale
locale.setlocale(locale.LC_ALL, "pt_PT")

from dateutil.parser import parse
parse("Ter, 01 Out 2013 14:26:00 -0300")
```
  I’ve started a fork which gets the names from the calendar module (which is locale-aware) to work on this: https://github.com/alexwlchan/dateutil
  
  Right now it works for Portuguese (or seems to), but I want to think about it a bit more before I submit a patch to the main branch. In particular, weirdness may happen if it faces characters which aren’t used in Western European languages. I haven’t tested this yet. (See https://stackoverflow.com/a/8917539/1558022)
- If you’re not tied to the dateutil module, you could use datetime instead, which is already locale-aware:
```
from datetime import datetime
import locale

locale.setlocale(locale.LC_ALL, "pt_PT")
datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                  "%a, %d %b %Y %H:%M:%S %z")
```
  (Note that the %z token is not consistently supported by strptime: it is unavailable on Python 2 and requires Python 3.2+.)
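
The first suggestion in the list above, subclassing parserinfo with hard-coded names, might look roughly like this. The class name `PortugueseParserInfo` and the exact name lists are assumptions made for illustration; only the abbreviations "Ter" and "Out" come from the question's example string, so check the other names against the input you actually receive. This variant does not touch the locale at all.

```python
from dateutil import parser


class PortugueseParserInfo(parser.parserinfo):
    """Hypothetical parserinfo with hard-coded Portuguese names."""

    # dateutil expects Monday first; matching is case-insensitive.
    WEEKDAYS = [
        ("Seg", "Segunda"), ("Ter", "Terça"), ("Qua", "Quarta"),
        ("Qui", "Quinta"), ("Sex", "Sexta"), ("Sáb", "Sábado"),
        ("Dom", "Domingo"),
    ]
    MONTHS = [
        ("Jan", "Janeiro"), ("Fev", "Fevereiro"), ("Mar", "Março"),
        ("Abr", "Abril"), ("Mai", "Maio"), ("Jun", "Junho"),
        ("Jul", "Julho"), ("Ago", "Agosto"), ("Set", "Setembro"),
        ("Out", "Outubro"), ("Nov", "Novembro"), ("Dez", "Dezembro"),
    ]


result = parser.parse("Ter, 01 Out 2013 14:26:00 -0300",
                      parserinfo=PortugueseParserInfo())
print(result)
# -> 2013-10-01 14:26:00-03:00
```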