How do I parse an HTTP date-string in Python?

眼角桃花 2020-12-08 02:34

Is there an easy way to parse HTTP date-strings in Python? According to the standard, there are several ways to format HTTP date strings; the method should be able to handle

4 Answers
  • 2020-12-08 02:45
    >>> import email.utils as eut
    >>> eut.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
    (2009, 9, 23, 22, 15, 29, 0, 1, -1)
    

    If you want a datetime.datetime object, you can do:

    import datetime

    def my_parsedate(text):
        return datetime.datetime(*eut.parsedate(text)[:6])
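
    As a small extension of the helper above, here is a sketch that guards against unparseable input and attaches UTC. The name my_parsedate_utc is made up for illustration, and treating the result as UTC is an assumption that fits HTTP headers (always GMT) but not arbitrary e-mail dates.

```python
import datetime
import email.utils


def my_parsedate_utc(text):
    """Return a UTC-aware datetime, or None if `text` cannot be parsed."""
    parsed = email.utils.parsedate(text)
    if parsed is None:
        # parsedate() signals failure by returning None rather than raising.
        return None
    # Assumption: the input is an HTTP date, which is always expressed in GMT.
    return datetime.datetime(*parsed[:6], tzinfo=datetime.timezone.utc)


print(my_parsedate_utc('Wed, 23 Sep 2009 22:15:29 GMT'))
print(my_parsedate_utc('not a date'))
```

    Without the None check, the star-unpacking in the original helper raises a TypeError on bad input.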
    
  • 2020-12-08 02:45
    httplib.HTTPMessage(filehandle).getdate(headername)
    httplib.HTTPMessage(filehandle).getdate_tz(headername)
    mimetools.Message(filehandle).getdate()
    rfc822.parsedate(datestr)
    rfc822.parsedate_tz(datestr)
    
    • If you have a raw data stream, you can build an HTTPMessage or a mimetools.Message from it. That object may also be useful when you query the response for other information.
    • If you are using urllib2, you already have an HTTPMessage object: it is returned by the info() method of the file-like object that urlopen gives you.
    • It can probably parse many date formats.
    • httplib is part of the Python 2 standard library (it was renamed http.client in Python 3, where rfc822 and mimetools no longer exist).

    NOTE:

    • I had a look at the implementation: HTTPMessage inherits from mimetools.Message, which inherits from rfc822.Message. Two module-level functions in rfc822 may be of interest: parsedate and parsedate_tz.
    • parsedate and parsedate_tz from email.utils have a different implementation, although it looks much the same.

    If all you have is the date string itself, you can parse it directly:

    >>> from rfc822 import parsedate, parsedate_tz
    >>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
    (2009, 9, 23, 22, 15, 29, 0, 1, 0)
    

    The same works through MIME messages:

    >>> import mimetools
    >>> import StringIO
    >>> # Bind the message to m, the name used in the calls below.
    >>> m = mimetools.Message(
    ...     StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
    >>> m
    <mimetools.Message instance at 0x7fc259146710>
    >>> m.getdate('Date')
    (2009, 9, 23, 22, 15, 29, 0, 1, 0)
    

    Or via HTTP messages (responses):

    >>> from httplib import HTTPMessage
    >>> from StringIO import StringIO
    >>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
    >>> # http_response can also be obtained via urllib2.urlopen(url).info()
    >>> http_response.getdate('Date')
    (2009, 9, 23, 22, 15, 29, 0, 1, 0)
    

    The same call works on a live response:

    >>> import urllib2
    >>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
    (2014, 2, 19, 18, 53, 26, 0, 1, 0)
    

    There, now we know more about date formats, MIME messages, mimetools and their Pythonic implementation ;-)

    Whatever the case, in Python 2 this looks better to me than using email.utils for parsing HTTP headers.
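
    The modules used in this answer (rfc822, mimetools, httplib, urllib2) exist only in Python 2. What follows is a rough Python 3 counterpart of the MIME-message example, not something from the answer itself: it assumes the header block is available as a string and uses email.message_from_string together with email.utils.parsedate_to_datetime.

```python
import email
from email.utils import parsedate_to_datetime

# Parse a raw header block, much as the mimetools example does.
raw_headers = 'Date: Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'
message = email.message_from_string(raw_headers)

# Header lookup is case-insensitive and returns the raw header value.
date_header = message['Date']
parsed = parsedate_to_datetime(date_header)
print(parsed.isoformat())
```

    With a live response, urllib.request.urlopen(url).headers['Date'] should yield the same kind of string to feed into parsedate_to_datetime.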

  • 2020-12-08 02:48

    Since Python 3.3 there's email.utils.parsedate_to_datetime, which can parse RFC 5322 timestamps. HTTP's preferred format, IMF-fixdate (the fixed-length Internet Message Format date, one of the HTTP-date forms in RFC 7231), is a subset of these.

    >>> from email.utils import parsedate_to_datetime
    >>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
    >>> parsedate_to_datetime(s)
    datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
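
    The question asks about the several HTTP date layouts, so here is a sketch that runs all three RFC 7231 forms through parsedate_to_datetime. The helper name parse_http_date is hypothetical. The documentation only promises RFC 5322 input; that the lenient parser also accepts the two obsolete layouts is an observation about its behaviour, not a guarantee. The asctime form carries no zone, so the sketch assumes GMT, as HTTP specifies.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# The three HTTP-date layouts listed in RFC 7231, all for the same instant.
SAMPLES = [
    'Sun, 06 Nov 1994 08:49:37 GMT',   # IMF-fixdate
    'Sunday, 06-Nov-94 08:49:37 GMT',  # obsolete RFC 850 format
    'Sun Nov  6 08:49:37 1994',        # ANSI C asctime() format
]


def parse_http_date(text):
    """Parse an HTTP-date and return a UTC-aware datetime."""
    dt = parsedate_to_datetime(text)
    if dt.tzinfo is None:
        # No zone in the input (asctime form); assume GMT as HTTP defines it.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


results = [parse_http_date(s) for s in SAMPLES]
for sample, result in zip(SAMPLES, results):
    print(f'{sample!r:36} -> {result.isoformat()}')
```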
    

    There's also the undocumented http.cookiejar.http2time, which can achieve the same:

    >>> from datetime import datetime, timezone
    >>> from http.cookiejar import http2time
    >>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
    >>> # fromtimestamp(..., tz=...) replaces utcfromtimestamp(), deprecated since Python 3.12
    >>> datetime.fromtimestamp(http2time(s), tz=timezone.utc)
    datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
    

    It was introduced in Python 2.4 as cookielib.http2time for dealing with the Cookie Expires directive, which is expressed in the same format.
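
    Because http2time returns None for text it does not recognise, a wrapper is useful in practice. The following is a minimal sketch; cookie_date_to_datetime is a made-up name, and since http2time is undocumented its behaviour could change between Python versions.

```python
from datetime import datetime, timezone
from http.cookiejar import http2time


def cookie_date_to_datetime(text):
    """Return a UTC-aware datetime, or None if http2time rejects `text`."""
    seconds = http2time(text)
    if seconds is None:
        # Unrecognised format: pass the failure on instead of raising.
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(cookie_date_to_datetime('Sun, 06 Nov 1994 08:49:37 GMT'))
print(cookie_date_to_datetime('not a date'))
```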

  • 2020-12-08 03:03
    >>> import datetime
    >>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
    datetime.datetime(2009, 9, 23, 22, 15, 29)
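
    A single strptime pattern covers only one of the HTTP date layouts and yields a naive datetime. A possible extension, sketched here with hypothetical names (HTTP_DATE_FORMATS, strptime_http_date), tries one pattern per layout and attaches UTC. Note that %a, %A and %b are locale-dependent, so this assumes an English or C locale.

```python
from datetime import datetime, timezone

# Candidate layouts: IMF-fixdate, RFC 850, and asctime().
HTTP_DATE_FORMATS = (
    '%a, %d %b %Y %H:%M:%S GMT',
    '%A, %d-%b-%y %H:%M:%S GMT',
    '%a %b %d %H:%M:%S %Y',
)


def strptime_http_date(text):
    """Try each layout in turn; raise ValueError if none matches."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            naive = datetime.strptime(text, fmt)
        except ValueError:
            continue  # Wrong layout, try the next one.
        # HTTP dates are in GMT, so mark the result as UTC.
        return naive.replace(tzinfo=timezone.utc)
    raise ValueError(f'not a recognised HTTP date: {text!r}')


print(strptime_http_date('Wed, 23 Sep 2009 22:15:29 GMT'))
```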
    