Changing User Agent in Python 3 for urllib.request.urlopen

你的背包 2020-11-28 07:39

I want to open a URL using urllib.request.urlopen('someurl'):

    import urllib.request

    with urllib.request.urlopen('someurl') as url:
        b = url.read()
4 Answers
  •  夕颜 (original poster)
    2020-11-28 08:17

    The host site is rejecting the request because of the OWASP ModSecurity Core Rules for Apache mod-security. Rule 900002 has a list of "bad" user agents, and one of them is "python-urllib2". That's why requests with the default user agent fail.
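One way around the rejection is to send a different User-Agent header by passing a Request object to urlopen instead of a bare URL string. The following is a minimal sketch, not a definitive fix: the user agent string "my-client/1.0" and the helper names build_request and fetch are assumptions made up for illustration, and whether a given string gets past the server's rules depends on that server's configuration.

```python
import urllib.request

# Hypothetical user agent string; substitute one that identifies your client.
USER_AGENT = "my-client/1.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT):
    """Open the URL with the custom User-Agent and return the body as bytes."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read()
```

Calling fetch('someurl') with a real URL in place of the placeholder would then replace the urlopen call from the question.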

    Unfortunately, Python's urllib.robotparser module

    https://docs.python.org/3.5/library/urllib.robotparser.html?highlight=robotparser#module-urllib.robotparser

    fetches "robots.txt" with the default Python user agent, and there's no parameter to change that. If its attempt to read "robots.txt" is refused (not just URL not found), it then treats all URLs from that site as disallowed.
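A possible workaround is to download robots.txt yourself with a custom User-Agent and hand the text to RobotFileParser.parse(), which accepts an iterable of lines, instead of letting the parser do the fetch. This is only a sketch under assumptions: the user agent string and the helper names parser_from_text and load_robots are hypothetical, and HTTP errors from the download are left for the caller to handle.

```python
import urllib.request
import urllib.robotparser

# Hypothetical user agent string; substitute one that identifies your client.
USER_AGENT = "my-client/1.0"


def parser_from_text(robots_txt):
    """Build a RobotFileParser from robots.txt text fetched separately."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def load_robots(robots_url, user_agent=USER_AGENT):
    """Fetch robots.txt with a custom User-Agent and return a parser for it."""
    request = urllib.request.Request(robots_url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        text = response.read().decode("utf-8", errors="replace")
    return parser_from_text(text)
```

The returned parser can then be queried as usual, for example with parser.can_fetch(USER_AGENT, url).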
