Scrapy Unit Testing

逝去的感伤 2020-11-30 18:18

I'd like to implement some unit tests in a Scrapy project (screen scraper/web crawler). Since a project is run through the "scrapy crawl" command, I can run it through something

10 answers
  •  旧巷少年郎
    2020-11-30 18:45

    The way I've done it is to create fake responses; this way you can test the parse function offline, while still exercising real HTML.

    A problem with this approach is that your local HTML file may not reflect the latest state of the site. If the live HTML changes, you may have a serious bug while your test cases still pass, so this may not be the best way to test on its own.

    My current workflow is: whenever there is an error, I send an email to the admin with the URL. Then, for that specific error, I create an HTML file with the content that caused it and write a unit test for it.

    This is the code I use to create sample Scrapy HTTP responses for testing from a local HTML file:

    # scrapyproject/tests/responses/__init__.py
    
    import os
    
    from scrapy.http import HtmlResponse, Request
    
    def fake_response_from_file(file_name, url=None):
        """
        Create a fake Scrapy HTTP response from an HTML file.
        @param file_name: The filename relative to the responses directory,
                          but absolute paths are also accepted.
        @param url: The URL of the response.
        returns: A Scrapy HTTP response which can be used for unit testing.
        """
        if not url:
            url = 'http://www.example.com'
    
        request = Request(url=url)
        if not os.path.isabs(file_name):
            responses_dir = os.path.dirname(os.path.realpath(__file__))
            file_path = os.path.join(responses_dir, file_name)
        else:
            file_path = file_name
    
        # Read as bytes: Scrapy response bodies are bytes, and the
        # response decodes them with the given encoding.
        with open(file_path, 'rb') as f:
            file_content = f.read()
    
        # Use HtmlResponse (not the base Response) so that .css()/.xpath()
        # selectors work in tests; the encoding is passed at construction
        # time because it is read-only afterwards.
        return HtmlResponse(url=url,
                            request=request,
                            body=file_content,
                            encoding='utf-8')
    

    The sample HTML file is located at scrapyproject/tests/responses/osdir/sample.html

    Then the test case could look as follows. The test case location is scrapyproject/tests/test_osdir.py

    import unittest
    
    from scrapyproject.spiders import osdir_spider
    from scrapyproject.tests.responses import fake_response_from_file
    
    class OsdirSpiderTest(unittest.TestCase):
    
        def setUp(self):
            self.spider = osdir_spider.DirectorySpider()
    
        def _test_item_results(self, results, expected_length):
            count = 0
            for item in results:
                count += 1
                self.assertIsNotNone(item['content'])
                self.assertIsNotNone(item['title'])
            self.assertEqual(count, expected_length)
    
        def test_parse(self):
            results = self.spider.parse(fake_response_from_file('osdir/sample.html'))
            self._test_item_results(results, 10)
    

    That's basically how I test my parsing methods, but it's not only for parsing methods. If it gets more complex, I suggest looking at Mox.
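    Mox is an older mocking library; the standard library's unittest.mock covers the same ground today. A minimal sketch of faking a response-like object, where extract_titles is a hypothetical helper under test, not code from the answer:

```python
from unittest.mock import MagicMock

def extract_titles(response):
    # Hypothetical helper: pull heading text out of a response-like object.
    return [t.strip() for t in response.css("h2::text").getall()]

# Stand-in for a Scrapy response: only the call chain the helper
# actually touches needs to exist on the mock.
fake = MagicMock()
fake.css.return_value.getall.return_value = ["  First ", "Second"]

print(extract_titles(fake))
fake.css.assert_called_once_with("h2::text")
```

    This avoids loading any HTML at all, which is useful when the logic under test is about post-processing rather than selector correctness.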
