Changing user agent on urllib2.urlopen

感动是毒 2020-11-22 13:59

How can I download a webpage with a user agent other than the default one on urllib2.urlopen?

9 Answers
  •  温柔的废话
    2020-11-22 14:46

    urllib.URLopener() has two relevant properties:
    addheaders = [('User-Agent', 'Python-urllib/1.17'), ('Accept', '*/*')] and
    version = 'Python-urllib/1.17'.
    To fool the website, you need to change both of these values to an accepted User-Agent, for example:
    Chrome browser: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.149 Safari/537.36'
    Googlebot: 'Googlebot/2.1'
    Like this:

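    import urllib  # Python 2 only; urllib.URLopener does not exist under this name in Python 3

    page_extractor = urllib.URLopener()
    # Replace the default headers so the request no longer announces Python-urllib
    page_extractor.addheaders = [('User-Agent', 'Googlebot/2.1'), ('Accept', '*/*')]
    page_extractor.version = 'Googlebot/2.1'
    # url and filename are placeholders: the original post left both arguments blank
    page_extractor.retrieve(url, filename)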
    

    Changing just one of the two properties does not work, because the website marks the request as suspicious.
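
Since the question asks about urllib2.urlopen specifically, the same header override can be sketched there too. The block below is only an illustration, written for Python 3, where urllib2 was merged into urllib.request; under Python 2 the same names (Request, build_opener) live in urllib2. The helper names make_request and make_opener are hypothetical and not part of any library, and the block performs no download on its own.

```python
# Python 3 sketch; in Python 2 import Request and build_opener from urllib2 instead.
from urllib.request import Request, build_opener

# User-Agent string taken from the Chrome example in the answer above.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/33.0.1750.149 Safari/537.36"
)


def make_request(url, user_agent=USER_AGENT):
    """Hypothetical helper: build a Request carrying a custom User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def make_opener(user_agent=USER_AGENT):
    """Hypothetical helper: build an opener that sends the custom
    User-Agent with every request made through it."""
    opener = build_opener()
    # Overwrite the default ('User-agent', 'Python-urllib/x.y') entry.
    opener.addheaders = [("User-Agent", user_agent), ("Accept", "*/*")]
    return opener
```

To actually fetch a page, pass the result of make_request(url) to urlopen, or call make_opener().open(url); in both cases url is whatever address you want to download.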
