Scrape a web page that requires they give you a session cookie first

故里飘歌 2020-12-13 22:35

I'm trying to scrape an Excel file from a government "muster roll" database. However, the URL I have to access this Excel file:

http://nrega.ap.gov.in/Nregs/Front

2 Answers
  • 2020-12-13 23:20

    Using requests this is a trivial task:

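    >>> import requests
    >>> url = 'http://httpbin.org/cookies/set/requests-is/awesome'
    >>> s = requests.Session()  # a Session keeps cookies across redirects and later requests
    >>> r = s.get(url)
    
    >>> print(s.cookies.get_dict())
    {'requests-is': 'awesome'}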
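
    For the scraping case in the question, the usual pattern is to load a page that sets the cookie and then request the file through the same session. A minimal sketch, assuming `requests.Session`; the function name `fetch_with_session_cookie` and its parameters are hypothetical, not something from the question:

```python
import requests


def fetch_with_session_cookie(landing_url, file_url, session=None):
    """Visit landing_url so the server can set its cookie, then fetch file_url.

    Both requests go through the same session, so any cookie set by the
    first response is sent automatically with the second request.
    """
    if session is None:
        session = requests.Session()

    # First request: its only purpose is to receive the session cookie.
    session.get(landing_url, timeout=30).raise_for_status()

    # Second request: the session attaches the stored cookie for us.
    response = session.get(file_url, timeout=30)
    response.raise_for_status()
    return response.content  # raw bytes, suitable for writing to an .xls file
```

    A call would look like `fetch_with_session_cookie("http://example.com/landing", "http://example.com/report.xls")`, where both URLs are placeholders for the real landing page and file link.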
    
  • 2020-12-13 23:29

    Using cookies and urllib2:

    import cookielib
    import urllib2
    
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # use opener to open different URLs; cookies set by one response
    # are stored in cj and sent with later requests
    

    You can use the same opener for several connections:

    data = [opener.open(url).read() for url in urls]
    

    Or install it globally:

    urllib2.install_opener(opener)
    

    In the latter case, the rest of the code looks the same with or without cookie support:

    data = [urllib2.urlopen(url).read() for url in urls]
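
    On Python 3, `cookielib` and `urllib2` were renamed to `http.cookiejar` and `urllib.request`. A minimal sketch of the same opener approach there, run against a throwaway local server so it works offline. The handler `CookieGateHandler`, the paths `/` and `/file`, the cookie `session=abc123`, and the function `fetch_file_after_cookie` are all made up for illustration and stand in for the real site:

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class CookieGateHandler(http.server.BaseHTTPRequestHandler):
    """Toy server (hypothetical): "/" sets a cookie, any other path requires it."""

    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.end_headers()
            self.wfile.write(b"welcome")
        elif "session=abc123" in self.headers.get("Cookie", ""):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"file contents")
        else:
            self.send_response(403)  # no cookie, no file
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_file_after_cookie():
    # Port 0 lets the OS pick a free port for the local demo server.
    server = http.server.HTTPServer(("127.0.0.1", 0), CookieGateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port

    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # bypass any system proxy for localhost
        urllib.request.HTTPCookieProcessor(jar),
    )
    try:
        opener.open(base + "/").read()             # server sets the cookie here
        body = opener.open(base + "/file").read()  # cookie is sent back here
    finally:
        server.shutdown()
        server.server_close()
    return body, [cookie.name for cookie in jar]
```

    Calling `fetch_file_after_cookie()` returns the body of the protected page together with the names of the cookies collected in the jar. Against a real site, only the `jar` and `opener` lines are needed, with the site's own URLs in place of `base`.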
    