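CSS Selectors
Pass a CSS selector string directly to select() to get every matching element. select() always returns a list of matches, even when only one element matches.
Sample code: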
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
1. Basic syntax
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.select('.panel .panel-heading'))  # select by class
print(soup.select('ul li'))  # select by tag name
print(soup.select('#list-2 .element'))  # select by id
print(type(soup.select('ul')[0]))  # each item in the result is a Tag
Output:
[<div class="panel-heading">
<h4>Hello</h4>
</div>]
[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>, <li class="element">Foo</li>, <li class="element">Bar</li>]
[<li class="element">Foo</li>, <li class="element">Bar</li>]
<class 'bs4.element.Tag'>
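Since select() always returns a list, it can help to see how it compares with select_one(), which returns only the first match (or None when nothing matches). The following is a minimal sketch on a trimmed copy of the sample HTML. It uses the built-in 'html.parser' instead of 'lxml' only to avoid the extra dependency, and '#no-such-id' is a made-up id used to show the no-match case.

```python
from bs4 import BeautifulSoup

html = '''
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
'''

# Assumption of this sketch: the built-in parser is enough for this markup.
soup = BeautifulSoup(html, 'html.parser')

first_li = soup.select_one('#list-1 .element')  # first match only
small_lists = soup.select('ul.list-small')      # tag name and class combined
missing = soup.select_one('#no-such-id')        # hypothetical id, matches nothing

print(first_li.get_text())               # Foo
print([ul['id'] for ul in small_lists])  # ['list-2']
print(missing)                           # None
```

Checking for None before using the result of select_one() avoids an AttributeError when the selector matches nothing.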
2. Nested selection
# Print the li elements of each ul separately
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul.select('li'))
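Calling select() on a Tag searches only inside that tag, which is what makes the nested loop useful. The sketch below illustrates the scoping on a trimmed copy of the sample HTML by counting matches; 'html.parser' is used here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

html = '''
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')

# Searching from the document root finds every li.
all_items = soup.select('li')

# Searching from each ul finds only that ul's own li elements.
counts_per_list = [len(ul.select('li')) for ul in soup.select('ul')]

print(len(all_items))    # 5
print(counts_per_list)   # [3, 2]
```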
3. Getting attributes
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul['id'])  # both forms return the tag's attribute (id or any other)
    print(ul.attrs['id'])
Output:
list-1
list-1
list-2
list-2
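Both ul['id'] and ul.attrs['id'] raise KeyError when the attribute is absent. A minimal sketch of the safer Tag.get() form follows, again on a trimmed copy of the sample HTML with 'html.parser'; 'data-role' is a made-up attribute name that does not appear in the sample. It also shows that class is returned as a list, because an element can carry several classes.

```python
from bs4 import BeautifulSoup

html = '''
<ul class="list" id="list-1"></ul>
<ul class="list list-small" id="list-2"></ul>
'''

soup = BeautifulSoup(html, 'html.parser')
second_ul = soup.select('ul')[1]

ul_id = second_ul.get('id')                    # same value as second_ul['id']
absent = second_ul.get('data-role')            # hypothetical attribute: None, no KeyError
fallback = second_ul.get('data-role', 'n/a')   # explicit default instead of None
classes = second_ul['class']                   # multi-valued attribute: a list

print(ul_id)      # list-2
print(absent)     # None
print(fallback)   # n/a
print(classes)    # ['list', 'list-small']
```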
4. Getting text content
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for li in soup.select('li'):
    print(li.get_text())  # print the text inside each li
Output:
Foo
Bar
Jay
Foo
Bar
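The sample li elements contain clean text, but real pages often have surrounding whitespace and nested tags. The sketch below shows how get_text() behaves in that case, using its separator argument and strip flag. The markup here is made up for illustration and is not part of the sample HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with extra whitespace and a nested tag.
snippet = '<li class="element"> Foo <b>Bar</b> </li>'
li = BeautifulSoup(snippet, 'html.parser').select_one('li')

raw = li.get_text()                    # text exactly as it appears
stripped = li.get_text(strip=True)     # each piece stripped, joined with nothing
spaced = li.get_text(' ', strip=True)  # each piece stripped, joined with a space

print(repr(raw))       # ' Foo Bar '
print(repr(stripped))  # 'FooBar'
print(repr(spaced))    # 'Foo Bar'
```

Passing a separator together with strip=True is usually the most readable choice when an element mixes plain text and child tags.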
Source: https://www.cnblogs.com/yangshuai2020/p/12335309.html