python - HTTP Error 503 Service Unavailable

Question

I am trying to scrape data from Google and LinkedIn. At some point it gave me this error:

*** httperror_seek_wrapper: HTTP Error 503: Service Unavailable

Can someone advise how I can solve this?

Answer 1:

Google is simply detecting your query as automated. You would need a captcha solver to get unlimited results. The following link might be helpful.

https://support.google.com/websearch/answer/86640?hl=en

Bypassing Captcha using an OCR Engine:

http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html

Simple Approach:

An even simpler approach is to call sleep() between requests and to vary the queries at random. This makes it less likely that Google will flag the traffic as automated, but the scraper becomes far slower ...
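The sleep() idea can be sketched roughly as follows. This is only an illustration, not a guaranteed way to avoid detection: fetch_with_delays, its fetch callback, and the 2 to 6 second range are hypothetical names and values chosen for the example.

```python
import random
import time


def fetch_with_delays(urls, fetch, min_delay=2.0, max_delay=6.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls.

    fetch is any callable that takes a URL and returns a result; it is a
    hypothetical placeholder for whatever actually performs the request.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # A random pause keeps the requests from being evenly spaced.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

Passing the sleep function as a parameter is just a convenience so the pauses can be swapped out when trying the function without waiting.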

Error Handling:

To simply suppress the error message, wrap the request in try and except.
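A minimal sketch of the try/except idea, assuming Python 3 and the standard urllib (the httperror_seek_wrapper in the question comes from a different library, so the exception class may need adjusting). Instead of silently discarding the error, this version waits and retries on a 503; open_with_retry and its retries and backoff parameters are hypothetical names for the example.

```python
import time
import urllib.error
import urllib.request


def open_with_retry(url, opener=urllib.request.urlopen, retries=3,
                    backoff=5.0, sleep=time.sleep):
    """Open url, retrying with a growing pause when the server answers 503."""
    for attempt in range(retries + 1):
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a 503, or a 503 on the last attempt.
            if err.code != 503 or attempt == retries:
                raise
            # Wait backoff, 2 * backoff, 4 * backoff, ... seconds before retrying.
            sleep(backoff * (2 ** attempt))
```

Catching only HTTPError and re-raising other status codes avoids hiding unrelated failures such as a 404.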

Answer 2:

I encountered the same situation and tried calling sleep() before every request to spread the requests out a little. It seemed to work at first but soon failed, even with a delay of 2 seconds. What finally solved it was using:

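import contextlib
import urllib  # Python 2; in Python 3 use urllib.request.urlopen
with contextlib.closing(urllib.urlopen(urlToOpen)) as x:
    pass  # do stuff with x; the connection is closed when the block exits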

I did this because I thought that opening too many requests left connections open and that they had to be closed. In any case, it worked quite consistently with a delay as low as 0.5 s.
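For readers on Python 3, the same idea might look like the sketch below. It assumes urllib.request from the standard library; read_url and the 10 second timeout are hypothetical choices for the example. In Python 3 the object returned by urlopen can also be used directly in a with statement, so contextlib.closing is kept here only to mirror the approach above.

```python
import contextlib
import urllib.request


def read_url(url, timeout=10):
    """Fetch url and return the body as bytes, always closing the response."""
    with contextlib.closing(urllib.request.urlopen(url, timeout=timeout)) as response:
        # The response is closed when this block exits, even if read() raises.
        return response.read()
```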

Source: https://stackoverflow.com/questions/25344610/python-http-error-503-service-unavailable

Tags

error-handling

web-scraping
