How to use PyCharm to debug Scrapy projects

Backend · Unresolved · 10 answers · 1400 views
长发绾君心 2020-12-02 04:00

I am working with Scrapy 0.20 and Python 2.7. I found that PyCharm has a good Python debugger, and I want to use it to test my Scrapy spiders. Does anyone know how to do that?

10 answers
  • 2020-12-02 04:17

    To add a bit to the accepted answer, after almost an hour I found I had to select the correct Run Configuration from the dropdown list (near the center of the icon toolbar), then click the Debug button in order to get it to work. Hope this helps!

  • 2020-12-02 04:19

Extending @Rodrigo's version of the answer: I added this script, and now I can set the spider name from the run configuration instead of changing it in the string.

    import sys
    from scrapy import cmdline

    # The spider name is taken from the first script parameter
    # of the PyCharm run configuration.
    cmdline.execute(f"scrapy crawl {sys.argv[1]}".split())
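To make the one-liner above easier to follow, here is the argv-building step in isolation; `build_crawl_command` is a hypothetical helper name of my own, not part of Scrapy:

```python
import sys


def build_crawl_command(spider_name):
    # Assemble the argv list that scrapy.cmdline.execute() expects,
    # e.g. "stack" -> ["scrapy", "crawl", "stack"].
    return f"scrapy crawl {spider_name}".split()


if __name__ == "__main__" and len(sys.argv) > 1:
    # The spider name comes from the first PyCharm script parameter.
    print(build_crawl_command(sys.argv[1]))
```

With this in place, each spider gets its own run configuration that differs only in the Parameters field.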
    
  • 2020-12-02 04:24

    IntelliJ IDEA also works.

    Create main.py:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import sys
    from scrapy import cmdline

    def main(name):
        if name:
            cmdline.execute(name.split())

    if __name__ == '__main__':
        print('[*] beginning main thread')
        name = "scrapy crawl stack"
        # name = "scrapy crawl spa"
        main(name)
        print('[*] main thread exited')
    


  • 2020-12-02 04:24

    According to the documentation at https://doc.scrapy.org/en/latest/topics/practices.html:

    import scrapy
    from scrapy.crawler import CrawlerProcess
    
    class MySpider(scrapy.Spider):
        # Your spider definition
        ...
    
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    
    process.crawl(MySpider)
    process.start() # the script will block here until the crawling is finished
    
  • 2020-12-02 04:28

    The scrapy command is a Python script, which means you can start it from inside PyCharm.

    When you examine the scrapy binary (which scrapy) you will notice that it is actually a Python script:

    #!/usr/bin/python
    
    from scrapy.cmdline import execute
    execute()
    

    This means that a command like scrapy crawl IcecatCrawler can also be executed like this: python /Library/Python/2.7/site-packages/scrapy/cmdline.py crawl IcecatCrawler

    Try to find the scrapy.cmdline package. In my case the location was here: /Library/Python/2.7/site-packages/scrapy/cmdline.py
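Rather than hunting for the path by hand, you can ask Python where the module lives; `module_path` below is a hypothetical helper of my own, not part of Scrapy:

```python
import importlib.util


def module_path(dotted_name):
    # Return the source file a module would be loaded from,
    # or None if no such module is installed.
    spec = importlib.util.find_spec(dotted_name)
    return spec.origin if spec else None


if __name__ == "__main__":
    try:
        # Prints the path to paste into the PyCharm "Script" field.
        print(module_path("scrapy.cmdline"))
    except ModuleNotFoundError:
        print("scrapy is not installed in this interpreter")
```

Run it with the same interpreter your PyCharm project uses, so the path matches the environment the debugger will launch.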

    Create a run/debug configuration inside PyCharm with that script as the Script path. Fill the Script parameters field with the scrapy subcommand and the spider name; in this case, crawl IcecatCrawler.

    Like this: PyCharm Run/Debug Configuration

    Put your breakpoints anywhere in your crawling code and it should work™.

  • 2020-12-02 04:32

    I am also using PyCharm, but I am not using its built-in debugging features.

    For debugging I am using ipdb. I set up a keyboard shortcut to insert import ipdb; ipdb.set_trace() on any line I want the break point to happen.

    Then I can type n to execute the next statement, s to step into a function, type any object name to see its value, alter execution environment, type c to continue execution...

    This is very flexible, and it works in environments other than PyCharm where you don't control the execution environment.

    Just type in your virtual environment pip install ipdb and place import ipdb; ipdb.set_trace() on a line where you want the execution to pause.
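As a sketch of where such a call might sit, here is a tiny conditional-breakpoint helper; `debug_if` is my own illustration, not part of ipdb or Scrapy:

```python
import pdb


def debug_if(condition):
    # Drop into the debugger only when the condition holds -- useful in a
    # parse() callback that runs hundreds of times but misbehaves on one URL.
    if condition:
        pdb.set_trace()


# Inside a spider callback you might write, for example:
#     def parse(self, response):
#         debug_if("icecat" in response.url)
#         ...
```

When the condition is false the helper returns immediately, so the crawl proceeds at full speed until the interesting request comes through.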

    UPDATE

    You can also pip install pdbpp and use the standard import pdb; pdb.set_trace() instead of ipdb. PDB++ is nicer in my opinion.
