Scrape a web page that requires you to get a session cookie first

故里飘歌 2020-12-13 22:35

I'm trying to scrape an Excel file from a government "muster roll" database. However, the URL I have to access this Excel file:

http://nrega.ap.gov.in/Nregs/Front

2 Answers
  •  一个人的身影
    2020-12-13 23:29

    Using cookielib and urllib2 (Python 2):

    import cookielib
    import urllib2
    
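    # the CookieJar stores cookies (such as a session ID) set by responses
    cj = cookielib.CookieJar()
    # HTTPCookieProcessor sends the stored cookies back on later requests
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # use opener to open different urls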
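The code above is Python 2: cookielib and urllib2 do not exist in Python 3, where they became http.cookiejar and urllib.request. A minimal sketch of the same setup in Python 3 might look like this:

```python
# Python 3 equivalent of the cookielib/urllib2 setup.
import http.cookiejar
import urllib.request

# The jar collects every cookie that responses set (e.g. a session ID).
cj = http.cookiejar.CookieJar()

# HTTPCookieProcessor adds the jar's cookies to each request made through
# this opener and stores any new cookies from the responses.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
```

The jar starts empty; it fills up as soon as the opener fetches a page that sends a Set-Cookie header.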
    

    You can reuse the same opener for several requests, so cookies set by one response are sent with the next:

    data = [opener.open(url).read() for url in urls]
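Reusing one opener is what addresses the question: request the page that hands out the session cookie first, then the file URL. The Python 3 sketch below checks this against a throwaway local server. The paths /front and /muster.xls and the cookie SESSIONID=abc123 are hypothetical stand-ins, not the real nrega.ap.gov.in endpoints.

```python
# Demo against a hypothetical local server: /front sets a session cookie,
# and any other path returns 403 unless that cookie is sent back.
import http.cookiejar
import http.server
import threading
import urllib.error
import urllib.request


class SessionHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/front":
            # Front page: hand out the (hypothetical) session cookie.
            body = b"welcome"
            self.send_response(200)
            self.send_header("Set-Cookie", "SESSIONID=abc123; Path=/")
        elif "SESSIONID=abc123" in (self.headers.get("Cookie") or ""):
            # Protected "file": only served when the cookie comes back.
            body = b"excel-bytes"
            self.send_response(200)
        else:
            body = b"no session"
            self.send_response(403)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), SessionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# Order matters: the first URL sets the cookie, the second one needs it.
urls = [base + "/front", base + "/muster.xls"]
data = [opener.open(url).read() for url in urls]

# Without the cookie jar, the protected URL is rejected.
try:
    urllib.request.urlopen(base + "/muster.xls")
    plain_status = 200
except urllib.error.HTTPError as err:
    plain_status = err.code

server.shutdown()
server.server_close()
```

Against a real site, the first URL would be whichever page actually issues the session cookie.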
    

    Or install it globally:

    urllib2.install_opener(opener)
    

    In the latter case, the rest of the code looks the same with or without cookie support:

    data = [urllib2.urlopen(url).read() for url in urls]
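In Python 3 the global variant is urllib.request.install_opener. A small sketch, again against a hypothetical local server that echoes back whatever Cookie header it receives, shows plain urlopen calls sharing the jar once the opener is installed:

```python
# Demo: after install_opener, plain urllib.request.urlopen uses the cookie jar.
import http.cookiejar
import http.server
import threading
import urllib.request


class EchoCookieHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Echo the Cookie header that arrived, then set a session cookie.
        body = (self.headers.get("Cookie") or "").encode()
        self.send_response(200)
        self.send_header("Set-Cookie", "SESSIONID=abc123; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), EchoCookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

cj = http.cookiejar.CookieJar()
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj)))

# The first request receives the cookie; the second one sends it back.
pages = [url, url]
data = [urllib.request.urlopen(page).read() for page in pages]

# Reset the module-level opener so later urlopen calls build a default one.
urllib.request.install_opener(None)
server.shutdown()
server.server_close()
```

Installing the opener globally is convenient, but it affects every urlopen call in the process, so passing an explicit opener around is usually easier to reason about.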
    
