Question
I have searched thoroughly for an answer to this problem but find myself at a loss. Here are some of the places I have looked:
- How to fix httplib.BadStatusLine exception?
- Python httplib2 Handling Exceptions
- python http status code
My issue is the following. I have created a spider and want to crawl two different URLs. When I crawl each URL on its own, everything works fine. However, when I try to crawl both, I get the following error: httplib.BadStatusLine: ''
I have followed some advice that I read (see the links mentioned above). Printing response.status works for each request, but response.url is never printed and the error is thrown. (I only print both values to try to identify the source of the error.)
I hope that this is clear.
I am using Scrapy and Selenium:
    from scrapy import Spider
    from selenium import webdriver


    class PeoplePage(Spider):
        name = "peopleProfile"
        allowed_domains = ["blah.com"]
        handle_httpstatus_list = [200, 404]
        start_urls = [
            "url1",
            "url2"
        ]

        def __init__(self):
            self.driver = webdriver.Firefox()

        def parse(self, response):
            print response.status
            print '???????????????????????????????????'
            if response.status == 200:
                self.driver.implicitly_wait(5)
                self.driver.get(response.url)
                print response.url
                print '!!!!!!!!!!!!!!!!!!!!'
                # DO STUFF
            self.driver.close()
Answer 1:
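According to the Python documentation, httplib.BadStatusLine is raised if a server responds with an HTTP status code that the library does not understand. The empty string in your message ('') typically means the status line itself was empty, that is, the other end closed the connection without sending any response.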
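To see where an empty status line comes from, here is a minimal sketch using only the standard library (written for Python 3, where httplib is named http.client). The local server and the helper names serve_once_and_hang_up and request_from_silent_server are made up for this illustration. The server reads the request and hangs up without replying, which I assume is similar to what the client sees when the browser behind the driver is already gone.

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(listener):
    """Accept one connection, read the request headers, close without replying."""
    conn, _ = listener.accept()
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    conn.close()  # no status line is ever sent


def request_from_silent_server():
    """Return the exception raised when the server sends no status line."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(target=serve_once_and_hang_up, args=(listener,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        worker.join()
        listener.close()
    return None


exc = request_from_silent_server()
print(type(exc).__name__, exc)
```

On Python 3.5 and later the exception is http.client.RemoteDisconnected, a subclass of BadStatusLine, so the except clause in the sketch still catches it.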
You can catch this exception and ignore it. Also, you should not close your driver if you are going to load more than one URL with it.
Try this:
    import httplib  # at the top of the module (Python 2)

    def parse(self, response):
        try:
            print response.status
            print '???????????????????????????????????'
            if response.status == 200:
                self.driver.implicitly_wait(5)
                self.driver.get(response.url)
                print response.url
                print '!!!!!!!!!!!!!!!!!!!!'
                # DO STUFF
        except httplib.BadStatusLine:
            # Ignore the error; note that the driver is no longer closed here.
            pass
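If the driver is never closed inside parse, it still has to be shut down somewhere. Scrapy calls a spider's closed(reason) method once when the spider finishes, so that is one possible place. Below is a rough sketch of that lifecycle in Python 3. FakeDriver and PeoplePageSketch are hypothetical stand-ins so the example runs without Scrapy or Selenium installed; they are not the real classes, and the fake driver only mimics the failure described in the question.

```python
import http.client


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver; not the real API."""

    def __init__(self):
        self.is_open = True
        self.visited = []

    def get(self, url):
        # Mimic the failure from the question: a closed browser cannot answer.
        if not self.is_open:
            raise http.client.BadStatusLine("")
        self.visited.append(url)

    def quit(self):
        self.is_open = False


class PeoplePageSketch:
    """Plain class mirroring the spider's shape (not a real scrapy.Spider)."""

    def __init__(self, driver):
        self.driver = driver

    def parse(self, url, status):
        if status == 200:
            self.driver.get(url)
            # DO STUFF with the loaded page here; do not close the driver.

    def closed(self, reason):
        # In Scrapy, closed(reason) is called once, when the spider finishes.
        self.driver.quit()


spider = PeoplePageSketch(FakeDriver())
for url in ["url1", "url2"]:
    spider.parse(url, 200)
spider.closed("finished")
```

With this shape both URLs are loaded by the same driver, and the driver is shut down exactly once after the last one.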
Answer 2:
I made a decorator to do what the top answer does, so as to make the code easily reusable. Here it is:
    import http.client  # a bare `import http` is not guaranteed to load the `client` submodule


    def pass_bad_status_line_exc(wrapped_function):
        """
        Decorator that silently ignores `http.client.BadStatusLine`.
        """
        def _wrapper(*args, **kwargs):
            try:
                result = wrapped_function(*args, **kwargs)
            except http.client.BadStatusLine:
                # Swallow the error; the wrapped call then returns None.
                return
            return result
        return _wrapper
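For anyone who wants to see such a decorator in use, here is a small variant with a usage example (Python 3). It is only a sketch and not the code above: the name log_bad_status_line, the logging call, and the use of functools.wraps are my own additions, and fetch and add are hypothetical functions that exist purely for the demonstration.

```python
import functools
import http.client
import logging

logger = logging.getLogger(__name__)


def log_bad_status_line(wrapped_function):
    """Log `http.client.BadStatusLine` instead of raising it, then return None."""

    @functools.wraps(wrapped_function)  # keep the wrapped function's name and docstring
    def _wrapper(*args, **kwargs):
        try:
            return wrapped_function(*args, **kwargs)
        except http.client.BadStatusLine as exc:
            logger.warning("%s ignored BadStatusLine: %s",
                           wrapped_function.__name__, exc)
            return None

    return _wrapper


@log_bad_status_line
def fetch(url):
    """Hypothetical function that always fails, for demonstration only."""
    raise http.client.BadStatusLine("")


@log_bad_status_line
def add(a, b):
    """Hypothetical function that succeeds, to show results pass through."""
    return a + b


failed_result = fetch("url1")
ok_result = add(1, 2)
```

Logging the exception rather than dropping it silently makes it easier to notice when the error starts happening on every request, which usually points at a closed driver or connection rather than a one-off glitch.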
Answer 3:
I hit this error because I created a selenium.webdriver instance (named driver), called driver.quit() on it, and then tried to call driver.get(url) on the driver that had already quit. The solution is to not call driver.quit() until you are finished with the driver.
Answer 4:
I'm not sure how much this will help, but in my case I was trying to issue a POST request on a connection I had already used for an earlier request, and I kept getting the same error: httplib.BadStatusLine: ''. What I needed was a new HTTP connection for that request. I believe the documentation covers this; I had just overlooked it.
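A minimal sketch of the one-connection-per-request approach, in Python 3 (http.client is the renamed httplib). The local echo server, EchoHandler, and post_with_fresh_connection are hypothetical names added only so the example can run on its own; the real target server would of course be different.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Tiny local test server: replies to a POST with the body it received."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def post_with_fresh_connection(host, port, payload):
    """Open a new connection, send one POST, read the reply, and close."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", "/", body=payload)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

results = [post_with_fresh_connection("127.0.0.1", port, data)
           for data in (b"first", b"second")]

server.shutdown()
server.server_close()
```

Each call builds its own HTTPConnection and closes it afterwards, so no request depends on the state left behind by an earlier one.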
Source: https://stackoverflow.com/questions/27619258/httplib-badstatusline