Access session cookie in scrapy spiders

爱一瞬间的悲伤 2020-12-28 17:07

I am trying to access the session cookie within a spider. I first log in to a social network from within a spider:

    def parse(self, response):

        retur         


        
3 answers
  • 2020-12-28 17:38

    Maybe this is overkill, but I don't know how you are going to use those cookies, so it might be useful (an excerpt from real code; adapt it to your case):

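    from scrapy import Request, Spider
    from scrapy.http.cookies import CookieJar
    
    class MySpider(Spider):
        name = 'my_spider'
    
        def parse(self, response):
            # Reuse the jar carried in meta, or start a fresh one
            cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
            # Store the cookies set by this response in the jar
            cookieJar.extract_cookies(response, response.request)
            # nextPageLink is a placeholder for the URL you follow next
            request = Request(nextPageLink, callback=self.parse2,
                              meta={'dont_merge_cookies': True, 'cookie_jar': cookieJar})
            cookieJar.add_cookie_header(request)  # apply Set-Cookie ourselves
            yield request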
    

    CookieJar has some useful methods.

    If you still don't see the cookies - maybe they are not there?


    UPDATE:

    Looking at CookiesMiddleware code:

    class CookiesMiddleware(object):
        def _debug_cookie(self, request, spider):
            if self.debug:
                cl = request.headers.getlist('Cookie')
                if cl:
                    msg = "Sending cookies to: %s" % request + os.linesep
                    msg += os.linesep.join("Cookie: %s" % c for c in cl)
                    log.msg(msg, spider=spider, level=log.DEBUG)
    

    So, try request.headers.getlist('Cookie')

  • 2020-12-28 17:49

    This works for me

    response.request.headers.get('Cookie')
    

    It seems to return all the cookies that were added to the request by the middleware, session cookies or otherwise.
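The value returned by that call is a single raw header (bytes in Python 3), not a mapping. A minimal sketch of turning it into a dictionary with the standard library's `http.cookies.SimpleCookie` follows; the helper name `parse_cookie_header` and the sample header value are made up for illustration.

```python
from http.cookies import SimpleCookie


def parse_cookie_header(raw_header):
    """Turn a raw Cookie header (bytes, str, or None) into a {name: value} dict."""
    if raw_header is None:
        # No Cookie header was sent with the request
        return {}
    if isinstance(raw_header, bytes):
        raw_header = raw_header.decode("utf-8")
    parsed = SimpleCookie()
    parsed.load(raw_header)
    return {name: morsel.value for name, morsel in parsed.items()}


# Hypothetical header value, shaped like what the call above returns
cookies = parse_cookie_header(b"sessionid=abc123; csrftoken=xyz")
print(cookies["sessionid"])
```

Inside a callback this would presumably be called as `parse_cookie_header(response.request.headers.get('Cookie'))`.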

  • 2020-12-28 17:52

    A classic example is having a login server, which provides a new session id after a successful login. This new session id should be used with another request.

    Here is the code picked up from source which seems to work for me.

    print('cookie from login', response.headers.getlist('Set-Cookie')[0].decode().split(";")[0].split("=", 1)[1])
    

    Code:

    # Assumes: from scrapy import Request
    def check_logged(self, response):
        # Header values are bytes in Python 3, so decode before splitting.
        # Take the first Set-Cookie header, keep "name=value", then the value.
        set_cookie = response.headers.getlist('Set-Cookie')[0].decode()
        tmpCookie = set_cookie.split(";")[0].split("=", 1)[1]
        print('cookie from login', tmpCookie)
        cookieHolder = dict(SESSION_ID=tmpCookie)
    
        # print(response.text)
        if "my name" in response.text:
            yield Request(url="<<new url for another server>>",
                          cookies=cookieHolder,
                          callback=self.another_function)  # placeholder: use your own callback
        else:
            print("login failed")
            return
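Splitting the first `Set-Cookie` header by hand assumes the session cookie always comes first. As a hedged alternative, the sketch below collects every cookie from a list of raw `Set-Cookie` values using the standard library's `http.cookies.SimpleCookie`; the function name `session_cookies` and the sample headers are hypothetical.

```python
from http.cookies import SimpleCookie


def session_cookies(set_cookie_headers):
    """Collect {name: value} from a list of raw Set-Cookie header values."""
    found = {}
    for raw in set_cookie_headers:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        parsed = SimpleCookie()
        # Attributes such as Path or HttpOnly attach to the cookie; they are not returned as cookies
        parsed.load(raw)
        for name, morsel in parsed.items():
            found[name] = morsel.value
    return found


# Hypothetical values, shaped like response.headers.getlist('Set-Cookie')
headers = [b"lang=en; Path=/", b"SESSION_ID=abc123; Path=/; HttpOnly"]
print(session_cookies(headers)["SESSION_ID"])
```

Looking the cookie up by name means the order of the headers no longer matters, and the resulting dictionary can be passed as the `cookies` argument of the next `Request`.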
    