Question
I am using Scrapy 0.20 with Python 2.7.
I used to pass this on the command line:
-s JOBDIR=crawls/somespider-1
to handle duplicated items. Please note that I have already made the changes in the settings.
I don't want to pass that on the command line.
Is there any way I can set it in code, inside my spider?
Thanks
Answer 1:
It's easy. Raise the DropItem exception in pipelines.py to drop an item. You can also write a custom command so that the parameter is set inside the program instead of being typed on the command line.
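As a rough illustration of the DropItem suggestion, here is a minimal sketch of a pipeline that drops items it has already seen during the current run. The class name DuplicatesPipeline and the 'id' field are assumptions made for this example, not something from the question, and the fallback DropItem class is only there so the sketch can be tried without Scrapy installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Stand-in so the sketch can be run without Scrapy installed.
    class DropItem(Exception):
        pass


class DuplicatesPipeline(object):
    """Drop any item whose 'id' was already seen in this run."""

    def __init__(self):
        self.seen_ids = set()

    def process_item(self, item, spider):
        item_id = item['id']  # 'id' is an assumed field name
        if item_id in self.seen_ids:
            # Raising DropItem tells Scrapy to discard this item.
            raise DropItem("Duplicate item found: %r" % (item_id,))
        self.seen_ids.add(item_id)
        return item
```

A pipeline like this only takes effect once it is listed in the ITEM_PIPELINES setting, and it only remembers items within a single run.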
Here is an example of a custom command in Scrapy.
With a custom command (say: scrapy mycommand) you can run the crawl with -s JOBDIR=crawls/somespider-1 without typing it each time.
Example:
Create a directory named commands next to your scrapy.cfg file.
Inside that directory, create a file mycommand.py:
from scrapy.command import ScrapyCommand
from scrapy.cmdline import execute


class Command(ScrapyCommand):
    requires_project = True

    def short_desc(self):
        return "This is your custom command"

    def run(self, args, opts):
        # Build the command you want to run; in my case "scrapy crawl spider".
        args.append('scrapy')
        args.append('crawl')
        args.append('spider')
        # Add whatever else your syntax needs, for example:
        # args.extend(['-s', 'JOBDIR=crawls/somespider-1'])
        execute(args)  # pass a list, with each part of the command as one element
Now go to the command line and type scrapy mycommand. Then your magic is ready :-)
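To make the JOBDIR part explicit, the argument list could be built by a small helper like the one below. This is only a sketch: build_crawl_argv is a hypothetical name introduced here, and the spider name and directory are taken from the question.

```python
def build_crawl_argv(spider_name, jobdir=None):
    """Return the argv list for `scrapy crawl <spider> [-s JOBDIR=<dir>]`."""
    argv = ['scrapy', 'crawl', spider_name]
    if jobdir:
        # -s NAME=VALUE overrides a single Scrapy setting for this run.
        argv.extend(['-s', 'JOBDIR=%s' % jobdir])
    return argv
```

Inside run(), the list returned by build_crawl_argv('somespider', 'crawls/somespider-1') could then be passed to execute(). Since -s only overrides a setting, defining JOBDIR in settings.py should have the same effect without any custom command.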
Source: https://stackoverflow.com/questions/22131980/python-scrapy-how-to-code-the-parameter-instead-of-using-cmd-use-custom-code-in