The dateutil library is a great tool for parsing dates given as strings. For example, this works:
from dateutil.parser import parse
parse("Tue, 01 Oct 2013 14:26:00 -0300")
But the same date written in Portuguese is parsed incorrectly, even with fuzzy=True:
from dateutil.parser import parse
parse("Ter, 01 Out 2013 14:26:00 -0300", fuzzy=True)
Result:
datetime.datetime(2013, 1, 28, 14, 26, tzinfo=tzoffset(None, -10800))
I think the best solution is to subclass dateutil's parserinfo and fill it with the names from the calendar module, which are locale-aware. This is a simple solution and I haven't tested it much, so use it with caution, but it should localize dateutil for many languages. Create a module localeparserinfo.py:
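import calendar
from dateutil import parser

# The names are read when this module is imported, so set the locale first.
class LocaleParserInfo(parser.parserinfo):
    # Monday-first, like parserinfo.WEEKDAYS; list() because zip() is lazy in Python 3.
    WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
    # Index 0 of month_abbr/month_name is an empty string, so skip it.
    MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]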
Now you can pass an instance of your new parserinfo class to dateutil.parser.parse():
In [1]: import locale;locale.setlocale(locale.LC_ALL, "pt_BR.utf8")
In [2]: from localeparserinfo import LocaleParserInfo
In [3]: from dateutil.parser import parse
In [4]: parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=LocaleParserInfo())
Out[4]: datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))
Note that this handles many languages, but it is not a complete solution for every possible date and time format. Take a look at dateutil's parser.py, especially the class variables of parserinfo, such as HMS and the others.
You can even pass the locale string as an argument to your parserinfo class.
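A minimal sketch of that idea, assuming Python 3 and that the requested locale is installed. The class NamedLocaleParserInfo, its locale_name parameter, and the helper _time_locale are hypothetical names, not part of dateutil. The helper switches LC_TIME only while the calendar names are read and then restores the previous setting; since setlocale() is process-wide, this is still not thread-safe.

```python
import calendar
import locale
from contextlib import contextmanager

from dateutil import parser


@contextmanager
def _time_locale(name):
    """Temporarily set LC_TIME to `name`; the previous setting is restored on exit."""
    saved = locale.setlocale(locale.LC_TIME)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_TIME, name)
        yield
    finally:
        locale.setlocale(locale.LC_TIME, saved)


class NamedLocaleParserInfo(parser.parserinfo):
    """Hypothetical parserinfo that reads day/month names from a named locale."""

    def __init__(self, locale_name, dayfirst=False, yearfirst=False):
        with _time_locale(locale_name):
            # Instance attributes shadow the class-level English lists; both
            # calendar sequences are Monday-first, like parserinfo.WEEKDAYS.
            self.WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
            # Index 0 of month_abbr/month_name is an empty string: skip it.
            self.MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]
        # parserinfo.__init__ builds its lookup tables from WEEKDAYS and MONTHS.
        super().__init__(dayfirst=dayfirst, yearfirst=yearfirst)
```

With the pt_BR.utf8 locale installed, parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=NamedLocaleParserInfo("pt_BR.utf8")) should then behave like the session above, without leaving the locale changed afterwards.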
You could use PyICU to parse a localized date/time string in a given format:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from datetime import datetime
import icu # PyICU
df = icu.SimpleDateFormat(
'EEE, dd MMM yyyy HH:mm:ss zzz', icu.Locale('pt_BR'))
ts = df.parse(u'Ter, 01 Out 2013 14:26:00 -0300')
print(datetime.utcfromtimestamp(ts))
# -> 2013-10-01 17:26:00 (UTC)
It works on both Python 2 and 3, and it does not modify global state (the locale).
If your actual input string does not contain an explicit UTC offset, you should tell ICU explicitly which timezone to use; otherwise you may get a wrong result (ICU and datetime may use different timezone definitions).
If you only need to support Python 3 and you don't mind setting the locale, then you could use datetime.strptime() as @alexwlchan suggested:
#!/usr/bin/env python3
import locale
from datetime import datetime
locale.setlocale(locale.LC_TIME, "pt_PT.UTF-8")
print(datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
"%a, %d %b %Y %H:%M:%S %z")) # works on Python 3.2+
# -> 2013-10-01 14:26:00-03:00
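Because setlocale() changes process-wide state, one way to limit the side effect is to switch LC_TIME only around the strptime() call and restore it afterwards. This is a minimal sketch for Python 3.2+ (for %z); setlocale_time() and parse_localized() are hypothetical helper names. setlocale() is still not thread-safe, so this does not make the approach safe in multithreaded code.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def setlocale_time(name):
    """Temporarily set LC_TIME to `name` and restore the previous value on exit."""
    saved = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, name)
        yield
    finally:
        # Runs even if setlocale() or the body raised.
        locale.setlocale(locale.LC_TIME, saved)


def parse_localized(text, fmt, locale_name):
    """Parse `text` with datetime.strptime() under the given LC_TIME locale."""
    with setlocale_time(locale_name):
        return datetime.strptime(text, fmt)
```

For the question's input this would be parse_localized("Ter, 01 Out 2013 14:26:00 -0300", "%a, %d %b %Y %H:%M:%S %z", "pt_PT.UTF-8"), assuming the pt_PT.UTF-8 locale is installed.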
As far as I can see, dateutil is not locale-aware (yet!).
I can think of three alternative suggestions:
1. The day and month names are hardcoded in dateutil.parser (as part of the parserinfo class). You could subclass parserinfo and replace these names with the appropriate Portuguese names.
2. Modify dateutil to get the day and month names from the user’s locale, so that you could do something like:
import locale
locale.setlocale(locale.LC_ALL, "pt_PT")
from dateutil.parser import parse
parse("Ter, 01 Out 2013 14:26:00 -0300")
I’ve started a fork that works on this by getting the names from the calendar module (which is locale-aware): https://github.com/alexwlchan/dateutil
Right now it works for Portuguese (or seems to), but I want to think about it a bit more before I submit a patch to the main branch. In particular, odd things may happen with characters that aren’t used in Western European languages; I haven’t tested this yet. (See https://stackoverflow.com/a/8917539/1558022)
3. If you’re not tied to the dateutil module, you could use datetime.strptime() instead, which is already locale-aware:
from datetime import datetime, date
import locale
locale.setlocale(locale.LC_ALL, "pt_PT")
datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
"%a, %d %b %Y %H:%M:%S %z")
(Note that the %z directive is only supported by datetime.strptime() on Python 3.2 and later; on Python 2, strptime() rejects it.)
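The first suggestion above, subclassing parserinfo with hardcoded Portuguese names, might look roughly like this; it does not touch the locale at all. PortugueseParserInfo is a hypothetical name and the name lists (including the unaccented variants) are my own choice; only the WEEKDAYS and MONTHS class attributes, with their Monday-first and January-first order, come from dateutil's parserinfo.

```python
from dateutil import parser


class PortugueseParserInfo(parser.parserinfo):
    """Hypothetical parserinfo with hardcoded Portuguese day and month names."""

    # Each tuple lists accepted spellings; order must match parserinfo
    # (Monday first for WEEKDAYS, January first for MONTHS).
    WEEKDAYS = [
        ("Seg", "Segunda"),
        ("Ter", "Terça", "Terca"),
        ("Qua", "Quarta"),
        ("Qui", "Quinta"),
        ("Sex", "Sexta"),
        ("Sáb", "Sab", "Sábado", "Sabado"),
        ("Dom", "Domingo"),
    ]
    MONTHS = [
        ("Jan", "Janeiro"),
        ("Fev", "Fevereiro"),
        ("Mar", "Março", "Marco"),
        ("Abr", "Abril"),
        ("Mai", "Maio"),
        ("Jun", "Junho"),
        ("Jul", "Julho"),
        ("Ago", "Agosto"),
        ("Set", "Setembro"),
        ("Out", "Outubro"),
        ("Nov", "Novembro"),
        ("Dez", "Dezembro"),
    ]
```

Usage: parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=PortugueseParserInfo()). parserinfo lowercases both the stored names and the input, so matching is case-insensitive.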