I want to execute my scrapy crawler from a cron job.
I created a bash file getdata.sh in the directory where my scrapy project is located with its spiders:
#!/bin/bash
cd /
Adding the following lines via crontab -e runs my scrapy crawl at 5 AM every day. This is a slightly modified version of crocs' answer.
PATH=/usr/bin
* 5 * * * cd project_folder/project_name/ && scrapy crawl spider_name
Without setting $PATH, cron would give me the error "command not found: scrapy". I guess this is because /usr/bin is where the executables for programs are stored in Ubuntu.
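The effect is easy to reproduce: the shell only finds a command by bare name if its directory appears in PATH. A small sketch (the /tmp/pathdemo directory and mycmd script are made up for the demonstration, standing in for scrapy's install directory):

```shell
#!/bin/sh
# Create a tiny executable outside the standard PATH directories.
mkdir -p /tmp/pathdemo
printf '#!/bin/sh\necho found\n' > /tmp/pathdemo/mycmd
chmod +x /tmp/pathdemo/mycmd

PATH=/usr/bin:/bin                 # a minimal PATH, like cron's default
command -v mycmd || echo "command not found: mycmd"

PATH=$PATH:/tmp/pathdemo           # append the directory, as the answers do for scrapy
command -v mycmd                   # now resolves to /tmp/pathdemo/mycmd
```

The first lookup fails exactly like cron's "command not found: scrapy"; after appending the directory to PATH, the same name resolves.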
Note that the complete path for my scrapy project is /home/user/project_folder/project_name. I ran the env command in cron and noticed that the working directory is /home/user. Hence I omitted /home/user in my crontab entry above.
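Cron jobs run with an almost empty environment, which is why the working directory and PATH surprise people. One way to approximate what a cron job sees without waiting for cron, using env -i to clear the environment (the variables listed are the ones cron typically provides):

```shell
#!/bin/sh
# env -i clears everything, leaving only what is explicitly set; cron
# typically provides HOME, LOGNAME, SHELL=/bin/sh and a minimal PATH.
cron_path=$(env -i HOME="$HOME" PATH=/usr/bin:/bin /bin/sh -c 'echo "$PATH"')
echo "a cron job would see PATH=$cron_path"
```

Alternatively, a temporary crontab entry like `* * * * * env > /tmp/cron-env.txt` dumps the real environment cron uses.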
The cron log can be helpful while debugging:
grep CRON /var/log/syslog
Does your shell script have execute permission? E.g. can you run
/myfolder/crawlers/getdata.sh
without the sh prefix? If you can, then you can drop the sh from the line in cron.
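A quick way to check and fix this, sketched with a stand-in script (/tmp/getdata-demo.sh plays the role of the real getdata.sh):

```shell
#!/bin/sh
# A script must have the execute bit set to be run by path alone.
cat > /tmp/getdata-demo.sh <<'EOF'
#!/bin/bash
echo "crawler would run here"
EOF
chmod +x /tmp/getdata-demo.sh   # without this, running it directly fails with "Permission denied"
/tmp/getdata-demo.sh
```

`ls -l getdata.sh` shows whether the x bit is already set before you resort to chmod.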
I solved this problem by including PATH in the bash file:
#!/bin/bash
cd /myfolder/crawlers/
PATH=$PATH:/usr/local/bin
export PATH
scrapy crawl my_spider_name
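With PATH exported inside the script, the crontab entry only needs to call the script itself. A sketch, assuming the getdata.sh location from the question and the 5 AM schedule from the earlier answer:

```shell
# crontab -e entry: run the wrapper script daily at 5 AM
0 5 * * * /myfolder/crawlers/getdata.sh
```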
In my case scrapy is in .local/bin/scrapy. Give the proper path to scrapy and the spider name, and it works perfectly:
0 0 * * * cd /home/user/scraper/Folder_of_scriper/ && /home/user/.local/bin/scrapy crawl "name" >> /home/user/scrapy.log 2>&1
The >> /home/user/scrapy.log 2>&1 part saves the output and errors to scrapy.log, so you can check whether the program worked.
Check where scrapy is installed using the which scrapy command.
In my case, scrapy is installed in /usr/local/bin.
Open the crontab for editing using crontab -e and add:
PATH=$PATH:/usr/local/bin
export PATH
*/5 * * * * cd /myfolder/path && scrapy crawl spider_name
It should work. Scrapy runs every 5 minutes.
For anyone who used pip3 (or similar) to install scrapy, here is a simple inline solution:
*/10 * * * * cd ~/project/path && ~/.local/bin/scrapy crawl something >> ~/crawl.log 2>&1
Replace:
*/10 * * * * with your cron pattern
~/project/path with the path to your scrapy project (where your scrapy.cfg is)
something with the spider name (use scrapy list in your project to find out)
~/crawl.log with your log file location (in case you want logging)