Why does urllib.urlopen.read() not correspond to the page source code?

说谎 2021-01-11 14:02

I'm trying to fetch the following webpage:

import urllib
urllib.urlopen("http://www.gallimard-jeunesse.fr/searchjeunesse/advanced/(order)/author?catalog[0]

5 answers
  •  慢半拍i (original poster)
    2021-01-11 14:42

    You can solve this with Selenium driving Firefox, but that is often impractical because a browser window opens every time you run the code. Another option is a headless browser such as PhantomJS.

    The best approach here is to use the mechanize library. Install it via pip:

    pip install mechanize
    

    Then you can use the following code:

    import mechanize

    # Create a browser object that sends a browser-like User-agent header
    mb = mechanize.Browser()
    mb.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    # Do not fetch or obey robots.txt
    mb.set_handle_robots(False)
    url = "http://www.gallimard-jeunesse.fr/searchjeunesse/advanced/(order)/author?catalog[0]=1&SearchAction=1"
    # Fetch the page and read the raw response body
    response = mb.open(url).read()
    print(response)
    

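    mechanize also provides options for handling redirects, cookies, and meta-refresh delays; see its documentation for details. Note that mechanize does not execute JavaScript, so a page that builds its content with scripts still needs a real or headless browser.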
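
If installing mechanize is not an option, the same idea can be sketched with the standard library alone. This is a minimal sketch, not the answerer's method. It assumes Python 3 (where `urllib.urlopen` has moved to `urllib.request.urlopen`), and it assumes that the mismatch comes from the default User-agent header, which may not be the case for this site. The `USER_AGENT` string and the helper names `build_request` and `fetch` are made up for illustration.

```python
import urllib.request

# Assumption: a generic browser-like User-agent string, chosen for illustration.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a browser-like User-agent header."""
    return urllib.request.Request(url, headers={"User-agent": user_agent})


def fetch(url, timeout=10):
    """Download the page and decode it, falling back to UTF-8 if no charset is declared."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch(url)` with the same `url` as in the mechanize snippet returns the decoded HTML as a string. Unlike the mechanize version, this sketch never consults robots.txt and keeps no cookies between requests, so results may still differ from what a browser shows.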
