HTTPError: HTTP Error 403: Forbidden

Anonymous (unverified), submitted 2019-12-03 01:59:02

Question:

I'm making a Python script for personal use, but it's not working for Wikipedia...

This works:

    import urllib2, sys
    from bs4 import BeautifulSoup

    site = "http://youtube.com"
    page = urllib2.urlopen(site)
    soup = BeautifulSoup(page)
    print soup

This does not work:

    import urllib2, sys
    from bs4 import BeautifulSoup

    site = "http://en.wikipedia.org/wiki/StackOverflow"
    page = urllib2.urlopen(site)
    soup = BeautifulSoup(page)
    print soup

This is the error:

    Traceback (most recent call last):
      File "C:\Python27\wiki.py", line 5, in <module>
        page = urllib2.urlopen(site)
      File "C:\Python27\lib\urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "C:\Python27\lib\urllib2.py", line 406, in open
        response = meth(req, response)
      File "C:\Python27\lib\urllib2.py", line 519, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python27\lib\urllib2.py", line 444, in error
        return self._call_chain(*args)
      File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
        result = func(*args)
      File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    HTTPError: HTTP Error 403: Forbidden

Answer 1:

Modify the current code to send a User-Agent header:

Python 2.X

    import urllib2, sys
    from BeautifulSoup import BeautifulSoup

    site = "http://en.wikipedia.org/wiki/StackOverflow"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    print soup

Python 3.X

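    from bs4 import BeautifulSoup
    from urllib.request import Request, urlopen

    site = "http://en.wikipedia.org/wiki/StackOverflow"
    # Send a browser-like User-Agent instead of urllib's default.
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = Request(site, headers=hdr)
    page = urlopen(req)
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup(page, 'html.parser')
    print(soup)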

Python 3.X with Selenium (also executes JavaScript)

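    from selenium import webdriver as driver

    # Note: webdriver.PhantomJS was deprecated and is absent from Selenium 4;
    # a headless Chrome or Firefox driver is the usual replacement.
    browser = driver.PhantomJS()
    # get() returns None, so its result is not assigned.
    browser.get("http://en.wikipedia.org/wiki/StackOverflow")
    assert "Stack Overflow - Wikipedia" in browser.title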

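The modified version works because Wikipedia inspects the User-Agent header. By default urllib2 identifies itself as "Python-urllib/x.y", which Wikipedia rejects with 403 Forbidden; sending the User-Agent of a popular browser gets the request through.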
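To see this for yourself, here is a minimal Python 3 sketch that prints the User-Agent urllib sends by default and builds a request with an explicit one. The helper names (default_user_agent, build_request, fetch) and the BROWSER_LIKE_UA constant are made up for illustration and are not part of urllib. The fetch helper also shows one possible way to catch the 403 instead of letting the traceback propagate.

```python
from urllib.error import HTTPError
from urllib.request import Request, build_opener, urlopen

# Hypothetical constant for illustration; same value as in the answer above.
BROWSER_LIKE_UA = "Mozilla/5.0"


def default_user_agent():
    """Return the User-Agent that urllib sends when none is set."""
    # A fresh opener carries the library default in its addheaders list.
    return dict(build_opener().addheaders)["User-agent"]


def build_request(url, user_agent=BROWSER_LIKE_UA):
    """Create a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_LIKE_UA, timeout=10):
    """Return (status, body); on an HTTP error return (code, None)."""
    try:
        with urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status, resp.read()
    except HTTPError as err:
        # err.code is the numeric status, e.g. 403 for Forbidden.
        return err.code, None


# Prints the library default, a string beginning with "Python-urllib/".
print(default_user_agent())
```

Calling fetch("http://en.wikipedia.org/wiki/StackOverflow") would then return the status code and the page bytes, or the error code and None if the server refuses the request. Whether a given User-Agent is accepted is up to the server, so treat the value above as an example rather than a guarantee.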


