Using Scrapy to parse sitemaps

Posted by 半腔热情 on 2019-12-06 14:43:49

Question


I want to use Scrapy to crawl the links listed in a sitemap. I don't know much about Scrapy, so I would be interested in any links, information, or documentation you could provide.

Thanks


Answer 1:


A new generic spider, SitemapSpider, has just been added to Scrapy trunk for this purpose. It will be available in the next release (Scrapy 0.14).

  • Code here: http://snippets.scrapy.org/snippets/20/
  • Documentation here: http://readthedocs.org/docs/scrapy/en/latest/topics/spiders.html#sitemapspider



Answer 2:


All of the documentation is at http://doc.scrapy.org/. The tutorials can also be found at scrapy.org.

As for your question, see this SO question: how to parse a sitemap.xml file using scrapy's XmlFeedSpider?
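For context on what any sitemap parser has to handle, a sitemap.xml file is plain XML whose elements live in the sitemaps.org namespace, and forgetting that namespace is a common reason XPath-style queries return nothing. The sketch below uses only the Python standard library rather than Scrapy; the sample document and the `extract_locs` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol; "sm" is an arbitrary prefix.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, standing in for a downloaded sitemap.xml.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>
"""


def extract_locs(sitemap_xml):
    """Return the text of every <loc> under <url> in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    # Element names must be namespace-qualified, hence the "sm:" prefix.
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    for url in extract_locs(SAMPLE_SITEMAP):
        print(url)
```

The returned URLs could then be fed to a spider as start URLs, although a dedicated sitemap spider avoids that manual step.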



Source: https://stackoverflow.com/questions/6335906/using-scrapy-to-parse-sitemaps
