Scrapy crawler in Cron job

Backend · unresolved · 7 answers · 1631 views

遥遥无期 2020-12-13 15:11

I want to execute my Scrapy crawler from a cron job.

I created a bash file getdata.sh in the folder where the Scrapy project and its spiders are located:

#!/bin/bash
cd /         


        
7 Answers
  • 2020-12-13 15:28

    Adding the following lines in crontab -e runs my scrapy crawl at 5AM every day. This is a slightly modified version of crocs' answer

    PATH=/usr/bin
    0 5 * * * cd project_folder/project_name/ && scrapy crawl spider_name
    

    Without setting $PATH, cron gave me the error "command not found: scrapy". I guess this is because /usr/bin is where Ubuntu stores the executables for installed programs.

    Note that the complete path for my scrapy project is /home/user/project_folder/project_name. I ran the env command in cron and noticed that the working directory is /home/user. Hence I skipped /home/user in my crontab above.

    The cron log can be helpful while debugging

    grep CRON /var/log/syslog
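
    Since cron runs with a minimal environment, a quick way to reproduce the env check mentioned above is a temporary crontab entry that dumps cron's environment to a file (the output path is just an example):

        * * * * * env > /tmp/cron-env.txt

    Remove the entry again once you have inspected /tmp/cron-env.txt.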
    
  • 2020-12-13 15:31

    Does your shell script have execute permission?

    For example, can you run

      /myfolder/crawlers/getdata.sh

    without the sh?

    If so, you can drop the sh from the line in cron.
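
    If it is not executable yet, a single chmod fixes that. A minimal self-contained sketch using a temporary stand-in script (in practice you would run chmod on /myfolder/crawlers/getdata.sh directly):

    ```shell
    # Stand-in for the real script; in practice: chmod +x /myfolder/crawlers/getdata.sh
    script=$(mktemp /tmp/getdata.XXXXXX)
    printf '#!/bin/bash\necho "crawler would run here"\n' > "$script"

    chmod +x "$script"   # grant execute permission
    "$script"            # now runs directly, no leading "sh" needed
    rm -f "$script"
    ```
    
    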

  • 2020-12-13 15:32

    I solved this problem by including PATH in the bash file:

    #!/bin/bash
    
    cd /myfolder/crawlers/
    PATH=$PATH:/usr/local/bin
    export PATH
    scrapy crawl my_spider_name
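
    With PATH exported inside the script itself, the crontab entry can stay minimal; the schedule and log path below are hypothetical examples:

        0 6 * * * /myfolder/crawlers/getdata.sh >> /tmp/getdata.log 2>&1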
    
  • 2020-12-13 15:32

    In my case scrapy is installed in .local/bin/scrapy. Give the proper path to the scrapy executable and the spider name, and it works perfectly:

    0 0 * * * cd /home/user/scraper/Folder_of_scriper/ && /home/user/.local/bin/scrapy crawl "name" >> /home/user/scrapy.log 2>&1

    The >> /home/user/scrapy.log 2>&1 part saves the output and errors to scrapy.log, so you can check whether the program worked.

    Thank you.
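
    The >> file 2>&1 redirection is what routes both normal output and errors into the same log. A self-contained sketch of that mechanism:

    ```shell
    # stdout and stderr are both appended to the same log file, as in the crontab line
    log=$(mktemp)
    { echo "crawl output"; echo "crawl error" >&2; } >> "$log" 2>&1
    cat "$log"   # both lines appear in the log
    rm -f "$log"
    ```
    
    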

  • 2020-12-13 15:41

    Check where scrapy is installed using the which scrapy command. In my case, scrapy is installed in /usr/local/bin.

    Open crontab for editing using crontab -e, and add:

    PATH=$PATH:/usr/local/bin
    export PATH
    */5 * * * * cd /myfolder/path && scrapy crawl spider_name

    It should work. Scrapy runs every 5 minutes.
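
    To find the directory cron's PATH needs, combine which with dirname; since scrapy may not be on PATH in every environment, a fallback to the path from this answer is shown:

    ```shell
    # Locate the scrapy executable; fall back to this answer's path if it is absent
    scrapy_bin=$(command -v scrapy) || scrapy_bin=/usr/local/bin/scrapy
    echo "Add $(dirname "$scrapy_bin") to PATH in your crontab"
    ```
    
    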

  • 2020-12-13 15:50

    For anyone who used pip3 (or similar) to install scrapy, here is a simple inline solution:

    */10 * * * * cd ~/project/path && ~/.local/bin/scrapy crawl something >> ~/crawl.log 2>&1
    

    Replace:

    */10 * * * * with your cron pattern

    ~/project/path with the path to your scrapy project (where your scrapy.cfg is)

    something with the spider name (use scrapy list in your project to find out)

    ~/crawl.log with your log file position (in case you want to have logging)
