Python Web Scraping Study Notes (BeautifulSoup4: Traversing the Tag Tree Upward, Downward, and Sideways)
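As a preview of the three traversal directions named in the title, here is a minimal sketch. It assumes bs4 is installed; the HTML snippet and the variable names are made up for illustration and are not from the notes.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, made up for illustration.
html = "<html><body><p id='first'>one</p><p id='second'>two</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find("p", id="first")

# Upward: .parent is the tag that encloses this one.
parent_name = first.parent.name

# Downward: .children iterates over the direct child nodes of a tag.
child_names = [child.name for child in soup.body.children]

# Sideways: .next_sibling / .previous_sibling are the neighbouring nodes
# under the same parent (None when there is no neighbour on that side).
next_id = first.next_sibling["id"]
prev_of_first = first.previous_sibling

print(parent_name, child_names, next_id, prev_of_first)
```

The snippet contains no whitespace between tags on purpose: with real pages, siblings and children often include text nodes (such as newlines), so it is worth checking the node type before treating a neighbour as a tag.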
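BeautifulSoup4: the beautifulsoup library parses, traverses, and maintains the "tag tree" of a document. Installation works the same way as for the requests library (pip install beautifulsoup4).

Usage:

    from bs4 import BeautifulSoup
    soup = BeautifulSoup('<p>data</p>', 'html.parser')

Test:

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text  # HTML source of the page, as a string
    soup = BeautifulSoup(demo, "html.parser")  # parse demo as HTML
    soup2 = BeautifulSoup(open("D://demo.html"), "html.parser")  # parse a local file instead (open() reads the file, it does not write to it)
    print(soup.prettify())  # format the Beautiful Soup document tree and output it as Unicode, with each XML/HTML tag on its own line

Basic parsers:
bs4's HTML parser: BeautifulSoup(mk, 'html.parser') (requires bs4)
lxml's HTML parser: BeautifulSoup(mk, 'lxml') (requires lxml)
lxml's XML parser: BeautifulSoup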