Django custom management command running Scrapy: How to include Scrapy's options?

时光取名叫无心 2021-02-08 03:41

I want to be able to run the Scrapy web crawling framework from within Django. Scrapy itself only provides a command line tool scrapy to execute its commands, i.e.

2 Answers
  •  北恋 (original poster)
    2021-02-08 04:10

    Okay, I have found a solution to my problem. It's a bit ugly but it works. Since the Django project's manage.py command does not accept Scrapy's command line options, I split the options string into two arguments which are accepted by manage.py. After successful parsing, I rejoin the two arguments and pass them to Scrapy.

    That is, instead of writing

    python manage.py scrapy crawl domain.com -o scraped_data.json -t json
    

    I put a space between each dash and its option letter, like this:

    python manage.py scrapy crawl domain.com - o scraped_data.json - t json
    

    My handle function looks like this:

    def handle(self, *args, **options):
        # Assumes self._argv was saved beforehand (e.g. in run_from_argv).
        raw = self._argv[1:]
        arguments = []
        i = 0
        while i < len(raw):
            # Rejoin a lone '-' or '--' with the token after it ("- o" -> "-o").
            if raw[i] in ('-', '--') and i + 1 < len(raw):
                arguments.append(raw[i] + raw[i + 1])
                i += 2
            else:
                arguments.append(raw[i])
                i += 1

        from scrapy.cmdline import execute
        execute(arguments)
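
The rejoining step can be tried on its own, without Django or Scrapy installed. The following is only a sketch: the helper name `rejoin_split_options` is made up for illustration and does not appear in the original answer.

```python
def rejoin_split_options(argv):
    """Merge a lone '-' or '--' with the token that follows it."""
    merged = []
    i = 0
    while i < len(argv):
        token = argv[i]
        # Only merge when there is a following token to attach the dash to.
        if token in ("-", "--") and i + 1 < len(argv):
            merged.append(token + argv[i + 1])
            i += 2
        else:
            merged.append(token)
            i += 1
    return merged


if __name__ == "__main__":
    split = ["scrapy", "crawl", "domain.com",
             "-", "o", "scraped_data.json", "-", "t", "json"]
    print(rejoin_split_options(split))
```

Building a new list avoids modifying the argument list while looping over it, and a trailing dash with nothing after it is passed through unchanged rather than raising an error.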
    

    Meanwhile, Mikhail Korobov has provided the optimal solution. See here:

    # -*- coding: utf-8 -*- 
    # myapp/management/commands/scrapy.py 
    
    from __future__ import absolute_import
    from django.core.management.base import BaseCommand
    
    class Command(BaseCommand):
    
        def run_from_argv(self, argv):
            self._argv = argv
            self.execute()
    
        def handle(self, *args, **options):
            from scrapy.cmdline import execute
            execute(self._argv[1:])
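
To see what the `[1:]` slice hands to Scrapy, here is a small sketch that needs neither Django nor Scrapy. It assumes, as far as I understand Scrapy's `execute`, that the first list element is treated as the program name and the rest as the subcommand and its options. The function name `scrapy_argv` is hypothetical.

```python
def scrapy_argv(manage_argv):
    """Drop the leading 'manage.py' entry so the list starts with 'scrapy'."""
    return list(manage_argv[1:])


if __name__ == "__main__":
    # What Django would receive for:
    #   python manage.py scrapy crawl domain.com -o scraped_data.json -t json
    argv = ["manage.py", "scrapy", "crawl", "domain.com",
            "-o", "scraped_data.json", "-t", "json"]
    print(scrapy_argv(argv))
```

Because `run_from_argv` is overridden, Django never parses the options itself, so `-o` and `-t` reach Scrapy untouched and the space-splitting workaround is no longer needed.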
    
