I need to create a user-configurable web spider/crawler, and I'm thinking about using Scrapy. But I can't hard-code the domains and allowed URL regexes -- these will instead be configurable.
It is actually quite easy to configure Scrapy for this:
For the first URLs to visit, you can pass them as spider arguments with -a (every -a key=value pair is set as an attribute on the spider instance), and override the start_requests method to control how the spider starts.
You don't need to set the allowed_domains class variable on the spider. If you leave it out, the spider will allow every domain.
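
If you also want the allowed domains to be user-configurable rather than simply unrestricted, one approach (a sketch; the argument name allowed and the comma separator are my own conventions, not anything Scrapy-specific) is to fill allowed_domains from a spider argument in __init__:

    from scrapy import Spider

    class ConfigurableSpider(Spider):
        name = "configurable"

        def __init__(self, allowed=None, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # -a allowed="example.com,example.org" restricts the crawl;
            # omitting it keeps every domain allowed
            if allowed:
                self.allowed_domains = allowed.split(",")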
It should end up looking something like:
    from scrapy import Spider, Request

    class MySpider(Spider):
        name = "myspider"

        def start_requests(self):
            # start_url is set from the -a command-line argument
            yield Request(self.start_url, callback=self.parse)

        def parse(self, response):
            ...
and you should call it with:
scrapy crawl myspider -a start_url="http://example.com"
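
If users need to supply several seed URLs in one run, a minimal sketch, assuming a single comma-separated argument (the name urls and the separator are my own choices, nothing Scrapy-specific):

    from scrapy import Spider, Request

    class MultiStartSpider(Spider):
        name = "multistart"

        def start_requests(self):
            # "urls" arrives as one string from the -a urls=... argument
            for url in self.urls.split(","):
                yield Request(url, callback=self.parse)

        def parse(self, response):
            ...

which you would invoke as, for example:

    scrapy crawl multistart -a urls="http://example.com,http://example.org"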