scrapy : another method to avoid a lot of try except

Submitted by 自古美人都是妖i on 2019-12-25 03:54:20

Question


I want to ask a question.
When I use a CSS selector, extract() returns a list.
So if the selector matches nothing, the spider raises an error in the terminal (like below) and no item ends up in my JSON file:

item['intro'] = intro[0]
exceptions.IndexError: list index out of range

So I use try/except to check whether the list has a value:

    sel = Selector(response)
    sites = sel.css("div.con ul > li")
    for site in sites:
        item = Shopping_appleItem()
        links = site.css("a::attr(href)").extract()
        title = site.css("a::text").extract()
        date = site.css("time::text").extract()

        try:
            item['link'] = urlparse.urljoin(response.url, links[0])
        except IndexError:
            print "link not found"
        try:
            item['title'] = title[0]
        except IndexError:
            print "title not found"
        try:
            item['date'] = date[0]
        except IndexError:
            print "date not found"

I feel I use a lot of try/except blocks, and I don't know whether this is a good approach.
Please give me some guidance. Thank you.
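The three try/except blocks above all implement the same "first element or fallback" logic, so they can be factored into one small helper. A minimal sketch in plain Python (the `first_or` name and the sample values are hypothetical; the empty list stands in for what `.extract()` returns when a selector matches nothing):

```python
def first_or(seq, default=None):
    """Return seq[0] if seq is non-empty, otherwise default."""
    return seq[0] if seq else default

# The loop body then needs no try/except at all:
links = []                       # what .extract() returns on no match
title = ["Apple iPhone 6 Plus"]  # a normal one-element result

item = {}
item['link'] = first_or(links, 'link not found')
item['title'] = first_or(title, 'title not found')
print(item)  # {'link': 'link not found', 'title': 'Apple iPhone 6 Plus'}
```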


Answer 1:


You can use a separate function for the extraction of data. For example, for text nodes, sample code is here:

    def extract_text(node):
        if not node:
            return ''
        _text = './/text()'
        extracted_list = [x.strip() for x in node.xpath(_text).extract() if len(x.strip()) > 0]
        if not extracted_list:
            return ''
        return ' '.join(extracted_list)

and you can call this function like this (drop the `self.` prefix if you define it at module level rather than as a spider method):

    extract_text(sel.css("your_path"))
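The normalization that helper performs (strip each fragment, drop the all-whitespace ones, join with single spaces) can be exercised without Scrapy. A plain-Python sketch, where the fragment list stands in for what `node.xpath('.//text()').extract()` would return and `join_text` is a hypothetical name:

```python
def join_text(fragments):
    # Strip each fragment, drop the all-whitespace ones, and join the rest
    # with single spaces, mirroring the extract_text() helper above.
    cleaned = [x.strip() for x in fragments if x.strip()]
    return ' '.join(cleaned)

print(join_text(['  Apple ', '\n  ', ' iPhone 6 ']))  # Apple iPhone 6
print(repr(join_text([])))                            # ''
```

Note that newer Scrapy versions also ship this fallback behavior built in: `sel.css("your_path").extract_first(default='')` (aliased as `.get(default='')`) returns the first match or the default instead of raising, which removes the need for try/except in the common single-value case.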


Source: https://stackoverflow.com/questions/25521303/scrapy-another-method-to-avoid-a-lot-of-try-except
