import requests r=requests.get('http://python123.io/ws/demo.html') demo=r.text 

HTML可以看做一棵标签树
!
| 属性 | 说明 |
|---|---|
| .contents | 将该标签所有的儿子节点存入列表 |
| .children | 子节点的迭代类型,和contents类似,用于遍历儿子节点 |
| .descendants | 子孙节点的迭代类型,包含所有的子孙跌点,用于循环遍历 |
import requests from bs4 import BeautifulSoup r=requests.get('http://python123.io/ws/demo.html') demo=r.text soup=BeautifulSoup(demo,'html.parser') print(soup.contents)# 获取整个标签树的儿子节点 print(soup.body.content)#返回标签树的body标签下的节点 print(soup.head)#返回head标签 print(len(soup.body.content))#输出body标签儿子节点的个数 print(soup.body.content[1])#获取body下第一个子标签 遍历子孙节点
import requests from bs4 import BeautifulSoup r=requests.get('http://python123.io/ws/demo.html') demo=r.text soup=BeautifulSoup(demo,'html.parser') for child in soup.body.children:#遍历儿子节点 print(child) for child in soup.body.descendants:#遍历子孙节点 print(child) | 属性 | 说明 |
|---|---|
| .parent | 节点的父亲标签 |
| .parents | 节点的先辈标签的迭代类型,用于循环遍历先辈节点 |
import requests from bs4 import BeautifulSoup r=requests.get('http://python123.io/ws/demo.html') demo=r.text soup=BeautifulSoup(demo,'html.parser') print(soup.title.parent) print(soup.title.parent) print(soup.parent) import requests from bs4 import BeautifulSoup r=requests.get('http://python123.io/ws/demo.html') demo=r.text soup=BeautifulSoup(demo,'html.parser') for parent in soup.a.parents:#遍历先辈的信息 if parent is None: print(parent) else: print(parent.name) | 属性 | 说明 |
|---|---|
| .next_sibling | 返回HTML文本顺序的下一个平行标签 |
| .previous_sibling | 返回HTML文本顺序的上一个平行标签 |
| .next_siblings | 迭代类型,返回HTML文本顺序后续所有的平行标签 |
| .pervious_siblings | 迭代类型,返回HTML文本顺序前面所有的平行标签 |
注意
- 标签树的平行遍历是有条件的
- 平行遍历发生在同一个父亲节点的各节点之间
- 标签中的内容也构成了节点

import requests from bs4 import BeautifulSoup r=requests.get('http://python123.io/ws/demo.html') demo=r.text soup=BeautifulSoup(demo,'html.parser') print(soup.a.next_sibling)#a标签的下一个标签 print(soup.a.next_sibling.next_sibling)#a标签的下一个标签的下一个标签 print(soup.a.previous_sibling)#a标签的前一个标签 print(soup.a.previous_sibling.previous_sibling)#a标签的前一个标签的前一个标签 平行遍历
import requests from bs4 import BeautifulSoup r=requests.get('http://python123.io/ws/demo.html') demo=r.text soup=BeautifulSoup(demo,'html.parser') for sibling in soup.a.next_siblings:#遍历后续节点 print(sibling) for sibling in soup.a.previous_sibling:#遍历之前的节点 print(sibling) 来源:博客园
作者:梦小冷
链接:https://www.cnblogs.com/mengxiaoleng/p/11585754.html