Question
I want to ask a question.
When I use a CSS selector, extract() returns a list.
If the selector matches nothing, the list is empty, so indexing it raises an error in the terminal (like below), and the spider writes no item to my JSON file:
    item['intro'] = intro[0]
exceptions.IndexError: list index out of range
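A minimal plain-Python illustration of why this happens (no Scrapy needed): extract() yields a list, the list is empty when the selector matches nothing, and indexing an empty list raises IndexError.

```python
# Stand-in for sel.css("...").extract() when the selector matches nothing:
intro = []

try:
    value = intro[0]  # indexing an empty list
except IndexError as exc:
    print(exc)  # -> list index out of range
```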
So I use try/except to check whether the list has a value:
sel = Selector(response)
sites = sel.css("div.con ul > li")
for site in sites:
    item = Shopping_appleItem()
    links = site.css("a::attr(href)").extract()
    title = site.css("a::text").extract()
    date = site.css("time::text").extract()
    try:
        item['link'] = urlparse.urljoin(response.url, links[0])
    except IndexError:
        print "link not found"
    try:
        item['title'] = title[0]
    except IndexError:
        print "title not found"
    try:
        item['date'] = date[0]
    except IndexError:
        print "date not found"
I feel I'm using a lot of try/except blocks, and I don't know whether this is a good way.
Please guide me a bit. Thank you.
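(For context: since the only failure mode here is an empty list, the repetition can also be avoided without try/except at all. A tiny helper — `first()` below is hypothetical, not part of Scrapy — covers all three fields.)

```python
def first(values, default=None):
    """Return the first element of a list, or `default` if the list is empty."""
    return values[0] if values else default

# The three try/except blocks would then collapse to something like:
#   item['link'] = urlparse.urljoin(response.url, first(links, ''))
#   item['title'] = first(title, '')
#   item['date'] = first(date, '')
print(first(['a', 'b']))   # -> a
print(repr(first([], ''))) # -> ''
```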
Answer 1:
You can use a separate function for the extraction of data. For example, for text nodes, sample code is here:
def extract_text(node):
    if not node:
        return ''
    _text = './/text()'
    extracted_list = [x.strip() for x in node.xpath(_text).extract() if len(x.strip()) > 0]
    if not extracted_list:
        return ''
    return ' '.join(extracted_list)
and you can call this method like this:

    self.extract_text(sel.css("your_path"))
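To see the helper's behaviour without a running spider, here is a self-contained sketch that swaps the real Scrapy selector for a tiny stand-in object (`FakeNode` is hypothetical and only mimics the `xpath()`/`extract()` calls the helper uses):

```python
class FakeNode(object):
    """Stand-in for a Scrapy selector: xpath() returns an object whose
    extract() yields a list of text fragments (possibly empty)."""
    def __init__(self, fragments):
        self._fragments = fragments

    def xpath(self, query):
        return self

    def extract(self):
        return list(self._fragments)


def extract_text(node):
    if not node:
        return ''
    _text = './/text()'
    extracted_list = [x.strip() for x in node.xpath(_text).extract() if len(x.strip()) > 0]
    if not extracted_list:
        return ''
    return ' '.join(extracted_list)


print(extract_text(FakeNode(['  Hello ', '\n', 'world  '])))  # -> Hello world
print(repr(extract_text(FakeNode([]))))                       # -> ''
```

On newer Scrapy versions (1.0+), `SelectorList.extract_first(default='')` serves the same purpose for single values without a custom helper.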
Source: https://stackoverflow.com/questions/25521303/scrapy-another-method-to-avoid-a-lot-of-try-except