Scrapy -- Tutorial
1. Installation

    # Dependencies are resolved automatically.
    $ pip install scrapy

Overview of the libraries Scrapy depends on:

    lxml : an XML and HTML parser.
    parsel : an HTML/XML data extraction library built on top of lxml.
    w3lib : a multi-purpose helper for dealing with URLs and web page encodings.
    twisted : an asynchronous networking framework.
    cryptography and pyOpenSSL : to deal with various network-level security needs.

2. Tutorial

2.1. Creating a project

    $ scrapy startproject tutorial
    $ tree tutorial/
    scrapy.cfg            # deploy configuration file
    tutorial/             # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        pipelines.py      # project pipelines file
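As a quick sanity check after `scrapy startproject tutorial`, the layout can be verified from Python. This is only a minimal sketch: the helper name `missing_project_files` and the constant `EXPECTED_FILES` are made up for illustration and are not part of Scrapy. It checks only the files named in the listing above, which may not be exhaustive, and it assumes the script is run from the directory where the project was created.

```python
from pathlib import Path

# Files taken from the listing above (assumption: the listing may not be
# exhaustive, so a real project can contain more files than these).
EXPECTED_FILES = [
    "scrapy.cfg",
    "tutorial/__init__.py",
    "tutorial/items.py",
    "tutorial/pipelines.py",
]


def missing_project_files(root):
    """Return the expected files that do not exist under `root`.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "tutorial" is the outer directory created by `scrapy startproject tutorial`.
    missing = missing_project_files("tutorial")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("project layout looks complete")
```

An empty result means every listed file is present. Anything returned suggests the project was created somewhere else or the script was run from a different directory.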