Scrapy store returned items in variables to use in main script

筅森魡賤 提交于 2019-12-06 13:38:14

using global as you know is not a good style,especially while you need to extend your demand.

My suggestion is to store the title into file or list and use it in your main process,or if you want to handle the title in other script,then just open file and read title in your script

(Note: please ignore the indentation issue)

spider.py

import scrapy
from scrapy.crawler import CrawlerProcess

namefile = 'namefile.txt'
current_title_session = []#title stored in current session
file_append = open(namefile,'a',encoding = 'utf-8')

try:
    title_in_file = open(namefile,'r').readlines()
except:
    title_in_file = open(namefile,'w')

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/'
    ]

    custom_settings = {
        'LOG_ENABLED': 'False',
    }

    def parse(self, response):
        title = response.css('title::text').extract_first()
        if title +'\n' not in title_in_file  and title not in current_title_session:
             file_append.write(title+'\n')
             current_title_session.append(title)
if __name__=='__main__':
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })

    process.crawl(QuotesSpider)
    process.start() # the script will block here until the crawling is finished

making a variable global should work for what you need, but as you mentioned it isn't of good style.

I would actually recommend using a different service for communication between processes, something like Redis, so you won't be having conflicts between your spider and any other process.

It is very simple to setup and use, the documentation has a very simple example.

Instantiate the redis connection inside the spider and again on the main process (think about them as separate processes). The spider sets the variables and the main process reads (or gets) the information.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!