Basic Usage of the pyspider Framework

This article is only a brief introduction to using pyspider; for more detailed usage, consult the pyspider documentation.

1. Installation: pip install pyspider
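As a quick sanity check after the install, one could confirm from Python that the package is discoverable by the import system. This is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of pyspider.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Prints True only if pip placed pyspider on this interpreter's path.
    print("pyspider installed:", is_installed("pyspider"))
```

A False result usually means pip installed the package for a different Python interpreter than the one running the check.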

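Verify the installation: run pyspider all. If the installation succeeded, the console prints the startup output of each component.

The command above starts all of pyspider's components. The last line of the output indicates that the webui is running on port 5000. Opening local port 5000 (http://localhost:5000) in a browser then shows the pyspider webui.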
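If the browser shows nothing, it can help to check whether anything is listening on port 5000 at all. The following is a small sketch under that assumption; the function name webui_is_up is hypothetical, and it only tests that a TCP connection succeeds, not that the listener is actually pyspider.

```python
import socket


def webui_is_up(host="localhost", port=5000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("webui reachable:", webui_is_up())
```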

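2. Click Create to create a new project. The name can be anything, and the start URL should be the link of the page you want to crawl. Once that is done, you are taken to the project editing page, which contains the following script: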

from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://www.baidu.com', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
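The selector a[href^="http"] in index_page matches anchor tags whose href attribute begins with "http". To illustrate what that filter keeps, here is a rough standard-library sketch that applies the same prefix test to raw HTML. The class HttpLinkCollector, the function http_links, and the sample markup are all hypothetical; this is not how pyspider evaluates selectors, and any link resolution pyspider performs before the selector runs is not modeled.

```python
from html.parser import HTMLParser


class HttpLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose href starts with "http"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Same prefix test as the CSS attribute selector [href^="http"].
        if href and href.startswith("http"):
            self.links.append(href)


def http_links(html):
    """Return the matching hrefs in document order."""
    collector = HttpLinkCollector()
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    sample = (
        '<a href="http://a.example/">A</a>'
        '<a href="/local">B</a>'
        '<a href="https://b.example/x">C</a>'
        '<a>no href</a>'
    )
    print(http_links(sample))
```

In this sample the relative link and the anchor without an href are skipped, while both the http and https links are kept, since "https" also starts with "http".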

3. Analyzing the code logic:
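In the script from step 2, on_start is the entry point and schedules the start URL with index_page as its callback; index_page schedules every matching link with detail_page as its callback; and detail_page returns a dictionary that becomes the result for that page. The decorators tune scheduling: @every(minutes=24 * 60) reruns on_start once a day, @config(age=10 * 24 * 60 * 60) treats a fetched page as still valid for ten days, and @config(priority=2) raises the priority of the detail-page tasks. The callback chain can be mimicked with a toy in-memory queue. This is only an illustrative sketch: ToyHandler, FAKE_PAGES, and the example URLs are hypothetical, nothing is fetched from the network, and pyspider's real scheduler (timing, age, priority, deduplication) is not modeled.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (title, outgoing links).
FAKE_PAGES = {
    "http://start.example/": ("Start", ["http://start.example/a",
                                        "http://start.example/b"]),
    "http://start.example/a": ("Page A", []),
    "http://start.example/b": ("Page B", []),
}


class ToyHandler:
    """Mimics the on_start -> index_page -> detail_page callback chain."""

    def __init__(self):
        self.queue = deque()  # pending (url, callback) tasks, first in first out
        self.results = []     # dictionaries returned by detail_page

    def crawl(self, url, callback):
        # Stand-in for self.crawl: only records the task for later.
        self.queue.append((url, callback))

    def on_start(self):
        self.crawl("http://start.example/", callback=self.index_page)

    def index_page(self, url, page):
        _title, links = page
        for link in links:
            self.crawl(link, callback=self.detail_page)

    def detail_page(self, url, page):
        title, _links = page
        return {"url": url, "title": title}

    def run(self):
        self.on_start()
        while self.queue:
            url, callback = self.queue.popleft()
            result = callback(url, FAKE_PAGES[url])
            if result is not None:  # only detail_page returns a result
                self.results.append(result)
        return self.results


if __name__ == "__main__":
    for item in ToyHandler().run():
        print(item)
```

Each callback receives a page only after an earlier callback has scheduled it, which is the same division of labour the generated Handler class relies on.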
