How does Python's Requests treat multiple cookies in a header

Submitted by 陌路散爱 on 2021-02-08 04:51:33

Question


I use Python Requests to extract the full headers of responses.

I want to accurately count how many cookies (i.e. name/value pairs) a response contains. There are two issues:

1) If a server responds with multiple Set-Cookie headers, how does Requests represent this? Does it combine the Set-Cookie values into one, or leave them as they are?

Here is my script to print the full headers:

import requests
requests.packages.urllib3.disable_warnings()  # disable certificate warnings

response = requests.get("https://example.com", verify=False, timeout=3)
print(response.headers)
# Value of the Set-Cookie header (None if the response sets no cookies)
set_cookie_header = response.headers.get('Set-Cookie')

But when I look at some Set-Cookie response headers, I find that some name/value pairs are separated by a comma, like this:

dnn_IsMobile=False; path=/; secure; HttpOnly, Analytics_VisitorId=aa; expires=Mon 19-Aug-2019 14:20:02 GMT; path=/; secure; HttpOnly, Analytics=SessionId=vv&ContentItemId=-1; expires=Sat 20-Jul-2019 15:20:02 GMT; path=/; secure

2) Does this mean the server sent multiple Set-Cookie headers and Requests combined them?

If Requests adds the comma between the cookies, does it always separate them with a comma followed by a space, i.e. cookie1=value, cookie2=value, and never with just a comma, as in cookie1=value,cookie2=value?

I need to understand this difference to count the number of cookies received correctly.


Answer 1:


How to count the cookies and fetch them

You can use the higher-level .cookies attribute to get them, instead of parsing .headers.

For example:

>>> url="https://github.com"
>>> r = requests.get(url)
>>> r.cookies
<RequestsCookieJar[Cookie(version=0, name='_octo', value='GH1.1.1081626831.1563694143', port=None, port_specified=False, domain='.github.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1626852543, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False), Cookie(version=0, name='logged_in', value='no', port=None, port_specified=False, domain='.github.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=True, expires=2194846143, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False), Cookie(version=0, name='_gh_sess', value='N0NVdFd3dTMzcm9GSkh1U21ZQkVaYWUvWnBnRmVic0VFWm9kSVZKVVhMV0hVdUw4cDh5cGpmTmIrQ0xJYU9tNHE0ZHQxVkZlUU9JRGJHUkJtc21yVGM0Mk9hQjBUYnhDVXJYSFVWSjNzT2ZpNjdEVzF0emZydkJmQmgvZmVRRFhEaE1CRTlnd0ZPY0RRY0Z4L1ByaFFpbWhVTGtPZTZmUHhONzBxclIrWWZSdFlZK09NN1QzS1dlL3cwWmVSdG5wTHFROTh1Zmh6Y3JkMjFDQmtxb2FHQT09LS1DUEd6UHFtWS9ubTdpOEdwYndzU3l3PT0%3D--2f3ae9c74cba34f2e8de6dfe55c3616e8a35ab20', port=None, port_specified=False, domain='github.com', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=True, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False), Cookie(version=0, name='has_recent_activity', value='1', port=None, port_specified=False, domain='github.com', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=1563697743, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)]>
>>> len(r.cookies)
4
>>> r.cookies.keys()
['_octo', 'logged_in', '_gh_sess', 'has_recent_activity']
>>> for key in r.cookies.iterkeys(): print("{}: {}".format(key, r.cookies[key]))
... 
_octo: GH1.1.1081626831.1563694143
logged_in: no
_gh_sess: N0NVdFd3dTMzcm9GSkh1U21ZQkVaYWUvWnBnRmVic0VFWm9kSVZKVVhMV0hVdUw4cDh5cGpmTmIrQ0xJYU9tNHE0ZHQxVkZlUU9JRGJHUkJtc21yVGM0Mk9hQjBUYnhDVXJYSFVWSjNzT2ZpNjdEVzF0emZydkJmQmgvZmVRRFhEaE1CRTlnd0ZPY0RRY0Z4L1ByaFFpbWhVTGtPZTZmUHhONzBxclIrWWZSdFlZK09NN1QzS1dlL3cwWmVSdG5wTHFROTh1Zmh6Y3JkMjFDQmtxb2FHQT09LS1DUEd6UHFtWS9ubTdpOEdwYndzU3l3PT0%3D--2f3ae9c74cba34f2e8de6dfe55c3616e8a35ab20
has_recent_activity: 1

P.S. Sometimes it's easier to read the source code; I found this by reading cookies.py :)
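
To illustrate why .cookies is not affected by the way the header values are displayed: as far as I can tell from cookies.py, Requests hands the response's individual Set-Cookie header lines to a standard-library cookie jar. Below is a minimal sketch of that idea using only the standard library. FakeResponse is a hypothetical helper written for this example (it is not part of Requests), and the cookie values are taken from the question with the expires attributes left out.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical minimal response: it only provides info(), which CookieJar reads."""

    def __init__(self, set_cookie_values):
        self._headers = email.message.Message()
        for value in set_cookie_values:
            # Assigning to a Message appends a header line; it does not overwrite.
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


# One entry per Set-Cookie header line, as a server would send them.
set_cookie_values = [
    "dnn_IsMobile=False; path=/; secure; HttpOnly",
    "Analytics_VisitorId=aa; path=/; secure; HttpOnly",
    "Analytics=SessionId=vv&ContentItemId=-1; path=/; secure",
]

jar = http.cookiejar.CookieJar()
request = urllib.request.Request("https://example.com/")
jar.extract_cookies(FakeResponse(set_cookie_values), request)

names = sorted(cookie.name for cookie in jar)
print(len(jar), names)
```

Keep in mind that a cookie jar can hold fewer cookies than the server sent, for example when a cookie carries an expires date in the past or is rejected by the jar's policy, so len(r.cookies) counts accepted cookies rather than Set-Cookie headers.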


Edit about the delimiter (whether ", " or ",") in r.headers.get("Set-Cookie"):

  1. Requests uses urllib3 under the hood; r.raw is an instance of urllib3.response.HTTPResponse.
  2. In urllib3, headers are stored in HTTPHeaderDict, defined in _collections.py, which joins multiple values for the same header name with ", ":

    def __getitem__(self, key):
        val = self._container[key.lower()]
        return ", ".join(val[1:])
    
  3. Also, there is an issue about this in urllib3, and a test case for it.
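
The joining behaviour described in the steps above can be sketched with a toy class. JoinedHeaders is a hypothetical, heavily simplified stand-in written for this example; it is not urllib3's HTTPHeaderDict and only mirrors the storage layout implied by the quoted __getitem__ (original name first, then the values).

```python
class JoinedHeaders:
    """Toy multi-valued header mapping (a simplified stand-in, not urllib3's class)."""

    def __init__(self):
        # Maps lowercased name -> [original_name, value1, value2, ...]
        self._container = {}

    def add(self, key, value):
        # Repeated headers are appended instead of overwriting each other.
        entry = self._container.setdefault(key.lower(), [key])
        entry.append(value)

    def __getitem__(self, key):
        # Same join as in the snippet quoted above: comma followed by a space.
        val = self._container[key.lower()]
        return ", ".join(val[1:])

    def getlist(self, key):
        # The individual values, one per header line that was added.
        return list(self._container.get(key.lower(), [key])[1:])


headers = JoinedHeaders()
headers.add("Set-Cookie", "cookie1=value; path=/")
headers.add("Set-Cookie", "cookie2=value; path=/")

print(headers["set-cookie"])
print(len(headers.getlist("Set-Cookie")))
```

The point of the sketch is that the ", " only appears when the values are read back as a single string; the separate values still exist underneath.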

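So, with the urllib3 code quoted above, the combined values are always separated by ", " (a comma followed by a space), never by a bare ",". Be careful when counting cookies by splitting on ", ", though: an expires date such as "Sun, 21 Jul 2019 08:29:03 -0000" contains the same sequence, so a naive split can overcount. Counting with r.cookies avoids this problem.

Does Requests combine multiple Set-Cookie headers into one in headers?

I'm afraid the answer is yes, as you can see by examining r.headers (some irrelevant headers are removed for readability):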

>>> r.headers
{
    'Date': 'Sun, 21 Jul 2019 07:29:03 GMT',
    'Content-Type': 'text/html; charset=utf-8',
    'Transfer-Encoding': 'chunked',
    'Server': 'GitHub.com',
    'Status': '200 OK',
    'Set-Cookie': 'has_recent_activity=1; path=/; expires=Sun, 21 Jul 2019 08:29:03 -0000, _octo=GH1.1.1081626831.1563694143; domain=.github.com; path=/; expires=Wed, 21 Jul 2021 07:29:03 -0000, logged_in=no; domain=.github.com; path=/; expires=Thu, 21 Jul 2039 07:29:03 -0000; secure; HttpOnly, _gh_sess=N0NVdFd3dTMzcm9GSkh1U21ZQkVaYWUvWnBnRmVic0VFWm9kSVZKVVhMV0hVdUw4cDh5cGpmTmIrQ0xJYU9tNHE0ZHQxVkZlUU9JRGJHUkJtc21yVGM0Mk9hQjBUYnhDVXJYSFVWSjNzT2ZpNjdEVzF0emZydkJmQmgvZmVRRFhEaE1CRTlnd0ZPY0RRY0Z4L1ByaFFpbWhVTGtPZTZmUHhONzBxclIrWWZSdFlZK09NN1QzS1dlL3cwWmVSdG5wTHFROTh1Zmh6Y3JkMjFDQmtxb2FHQT09LS1DUEd6UHFtWS9ubTdpOEdwYndzU3l3PT0%3D--2f3ae9c74cba34f2e8de6dfe55c3616e8a35ab20; path=/; secure; HttpOnly',
    'Content-Encoding': 'gzip',
    'X-GitHub-Request-Id': 'A947:3711:E0377A:13B4CEA:5D34143E'
}
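
Note that the combined Set-Cookie value above contains ", " inside every expires date as well as between cookies. If you only have the combined string, here is a rough sketch of a more careful split: it breaks on a comma only when the text after it looks like the start of a new cookie (a name followed by "="). The helper names split_set_cookie and cookie_names are made up for this example, the _gh_sess value is shortened to a placeholder, and the heuristic would still fail if a cookie value itself contained ", name=".

```python
import re

# Split on a comma only when it is followed by "name=". The comma inside
# "expires=Sun, 21 Jul ..." is followed by a day number and a space, so it
# does not match and is left alone.
_COOKIE_BOUNDARY = re.compile(r",\s*(?=[^;,=\s]+=)")


def split_set_cookie(combined):
    """Split a comma-joined Set-Cookie string into one string per cookie."""
    return [part.strip() for part in _COOKIE_BOUNDARY.split(combined) if part.strip()]


def cookie_names(combined):
    """Return the cookie names, i.e. the text before the first '=' of each cookie."""
    return [
        part.split(";", 1)[0].split("=", 1)[0].strip()
        for part in split_set_cookie(combined)
    ]


combined = (
    "has_recent_activity=1; path=/; expires=Sun, 21 Jul 2019 08:29:03 -0000, "
    "_octo=GH1.1.1081626831.1563694143; domain=.github.com; path=/; "
    "expires=Wed, 21 Jul 2021 07:29:03 -0000, "
    "logged_in=no; domain=.github.com; path=/; "
    "expires=Thu, 21 Jul 2039 07:29:03 -0000; secure; HttpOnly, "
    "_gh_sess=abc123; path=/; secure; HttpOnly"  # placeholder value
)

print(len(combined.split(", ")))  # naive split: 7 pieces
print(len(split_set_cookie(combined)))  # careful split: 4 cookies
print(cookie_names(combined))
```

This is only a heuristic for strings that have already been joined; when the response object is available, r.cookies remains the more reliable way to count.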


Source: https://stackoverflow.com/questions/57125660/how-does-pythons-requests-treat-multiple-cookies-in-a-header
