Is there an easy way to parse HTTP date strings in Python? According to the standard, there are several ways to format HTTP date strings; the method should be able to handle all of them.
>>> import email.utils as eut
>>> eut.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, -1)
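If you want a datetime.datetime object, you can do:
import datetime
import email.utils as eut
def my_parsedate(text):
    # parsedate returns a 9-tuple; the first six items are Y, M, D, h, m, s
    return datetime.datetime(*eut.parsedate(text)[:6])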
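The tuple from eut.parsedate drops the zone offset, so my_parsedate returns a naive datetime. A minimal sketch that keeps the offset by going through email.utils.parsedate_tz and email.utils.mktime_tz and returns an aware UTC datetime (parse_http_date_utc is a hypothetical name, not a library function):

```python
import datetime
import email.utils


def parse_http_date_utc(text):
    """Hypothetical helper: parse a date string into an aware UTC datetime.

    Raises ValueError if email.utils cannot parse the string.
    """
    # 10-tuple; the last item is the UTC offset in seconds
    parsed = email.utils.parsedate_tz(text)
    if parsed is None:
        raise ValueError("unparseable date: %r" % (text,))
    # mktime_tz applies the offset and returns a POSIX timestamp
    timestamp = email.utils.mktime_tz(parsed)
    return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)


print(parse_http_date_utc('Wed, 23 Sep 2009 22:15:29 GMT'))    # 2009-09-23 22:15:29+00:00
print(parse_http_date_utc('Wed, 23 Sep 2009 22:15:29 +0200'))  # 2009-09-23 20:15:29+00:00
```

For HTTP the offset is always GMT, but going through the offset keeps the helper correct for other RFC 2822-style dates too.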
In Python 2, the standard library offers several ready-made options:
httplib.HTTPMessage(filehandle).getdate(headername)
httplib.HTTPMessage(filehandle).getdate_tz(headername)
mimetools.Message(filehandle).getdate(headername)
rfc822.parsedate(datestr)
rfc822.parsedate_tz(datestr)
Note: if you only have the date string itself, you can parse it directly:
>>> from rfc822 import parsedate, parsedate_tz
>>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
But let me illustrate with MIME messages:
>>> import mimetools
>>> import StringIO
>>> m = mimetools.Message(
...     StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> m
<mimetools.Message instance at 0x7fc259146710>
>>> m.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
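mimetools only exists in Python 2. A rough Python 3 counterpart, assuming the header block is available as a string, parses it with email.message_from_string under the email.policy.default policy; with that policy the Date header comes back as a structured DateHeader whose datetime attribute is already an aware datetime:

```python
import email
from email import policy

# Sample header block (same date as the example above)
raw = 'Date: Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'

# policy.default makes msg['Date'] a DateHeader instead of a plain str
msg = email.message_from_string(raw, policy=policy.default)
date_header = msg['Date']
print(date_header.datetime)  # 2009-09-23 22:15:29+00:00
```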
Or via HTTP messages (responses):
>>> from httplib import HTTPMessage
>>> from StringIO import StringIO
>>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> # a real response's headers are what urllib2.urlopen(url).info() returns
>>> http_response.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
Checking against a live server:
>>> import urllib2
>>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
(2014, 2, 19, 18, 53, 26, 0, 1, 0)
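A Python 3 sketch of the same idea, assuming the headers object is an http.client.HTTPMessage (what urllib.request.urlopen(url).headers gives you for HTTP) or anything with a .get() method. To stay offline it builds the headers from raw bytes with http.client.parse_headers, a lesser-known helper in that module; response_date is a hypothetical name:

```python
import io
from email.utils import parsedate_to_datetime
from http.client import parse_headers


def response_date(headers):
    """Hypothetical helper: the Date header as an aware datetime, or None if absent."""
    value = headers.get('Date')
    if value is None:
        return None
    return parsedate_to_datetime(value)


# Offline stand-in for a real response's headers
raw = b'Date: Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'
headers = parse_headers(io.BytesIO(raw))
print(response_date(headers))  # 2009-09-23 22:15:29+00:00
```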
There, now we know more about date formats, MIME messages, mimetools and their Pythonic implementation ;-)
Whatever the case, this looks better than using email.utils to parse HTTP headers.
Since Python 3.3 there's email.utils.parsedate_to_datetime, which can parse RFC 5322 timestamps, including IMF-fixdate (the Internet Message Format fixed-length format, a subset of HTTP-date in RFC 7231).
>>> from email.utils import parsedate_to_datetime
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> parsedate_to_datetime(s)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
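RFC 7231 also allows two obsolete HTTP-date forms: the RFC 850 format and the asctime() format. As far as I can tell from CPython's email.utils, parsedate_to_datetime accepts both as well, but an asctime() date carries no zone and comes back naive. A minimal sketch that normalizes all three to aware UTC, relying on HTTP-date always being GMT (parse_http_date is a hypothetical name):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Hypothetical helper: parse any of the three HTTP-date forms into aware UTC."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # asctime() dates have no zone; HTTP-date is always GMT, so assume UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


for s in ('Sun, 06 Nov 1994 08:49:37 GMT',   # IMF-fixdate
          'Sunday, 06-Nov-94 08:49:37 GMT',  # obsolete RFC 850 format
          'Sun Nov  6 08:49:37 1994'):       # obsolete asctime() format
    print(parse_http_date(s))
```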
There's also the undocumented http.cookiejar.http2time, which can achieve the same:
>>> from datetime import datetime, timezone
>>> from http.cookiejar import http2time
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> datetime.fromtimestamp(http2time(s), tz=timezone.utc)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
It was introduced in Python 2.4 as cookielib.http2time for handling the cookie Expires directive, which is expressed in the same format.
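http2time returns None instead of raising when it cannot parse a string, so converting its result straight to a datetime fails with a TypeError on bad input. A small wrapper that passes the None through (http2datetime is my own hypothetical name, not a library function):

```python
from datetime import datetime, timezone
from http.cookiejar import http2time


def http2datetime(value):
    """Hypothetical wrapper: aware UTC datetime, or None if http2time can't parse it."""
    ts = http2time(value)  # seconds since the epoch, or None on failure
    if ts is None:
        return None
    return datetime.fromtimestamp(ts, tz=timezone.utc)


print(http2datetime('Sun, 06 Nov 1994 08:49:37 GMT'))  # 1994-11-06 08:49:37+00:00
print(http2datetime('not a date'))                      # None
```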
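Alternatively, datetime.strptime handles the fixed format, though %a and %b are locale-dependent and the result is a naive datetime:
>>> import datetime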
>>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
datetime.datetime(2009, 9, 23, 22, 15, 29)
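The strptime call above covers only the IMF-fixdate form. A sketch that tries the three formats RFC 7231 allows, in turn, and attaches UTC. It assumes an English (e.g. C) locale, since %a, %A and %b are locale-dependent; the format tuple and the name parse_http_date_strict are my own, not from any library:

```python
from datetime import datetime, timezone

# The three HTTP-date forms from RFC 7231, section 7.1.1.1
HTTP_DATE_FORMATS = (
    '%a, %d %b %Y %H:%M:%S GMT',  # IMF-fixdate: Sun, 06 Nov 1994 08:49:37 GMT
    '%A, %d-%b-%y %H:%M:%S GMT',  # RFC 850:     Sunday, 06-Nov-94 08:49:37 GMT
    '%a %b %d %H:%M:%S %Y',       # asctime():   Sun Nov  6 08:49:37 1994
)


def parse_http_date_strict(value):
    """Hypothetical helper: try each HTTP-date format, return aware UTC or raise ValueError."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # not this format, try the next one
        # HTTP-date is always GMT, so the naive result is really UTC
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError('not an HTTP-date: %r' % (value,))


print(parse_http_date_strict('Sunday, 06-Nov-94 08:49:37 GMT'))  # 1994-11-06 08:49:37+00:00
```

Whitespace in a strptime format matches any run of whitespace, which is why the asctime() form's padded day ("Nov  6") still parses.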