Python, Beautiful Soup: get all class names

Posted by 梦想的初衷 on 2019-12-24 00:19:24

Question


Given some HTML, for example:

<div class="class1">
    <span class="class2">some text</span>
    <span class="class3">some text</span>
    <span class="class4">some text</span>
</div>

How can I retrieve all the class names, i.e. ['class1','class2','class3','class4']?

I tried:

soup.find_all(class_=True)

But that returns the whole tags, so I would then need to run a regex over each tag's string to extract the class names.


Answer 1:


You can treat each Tag instance that is found like a dictionary when retrieving its attributes. Note that the value of the class attribute is a list, because class is a special "multi-valued" attribute:

classes = []
for element in soup.find_all(class_=True):
    classes.extend(element["class"])

Or:

classes = [value 
           for element in soup.find_all(class_=True) 
           for value in element["class"]]

Demo:

In [1]: from bs4 import BeautifulSoup

In [2]: data = """
   ...: <div class="class1">
   ...:     <span class="class2">some text</span>
   ...:     <span class="class3">some text</span>
   ...:     <span class="class4">some text</span>
   ...: </div>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: classes = [value
   ...:            for element in soup.find_all(class_=True)
   ...:            for value in element["class"]]

In [5]: print(classes)
['class1', 'class2', 'class3', 'class4']
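
Because each element["class"] is a list, a tag that carries several classes contributes several names, and the same name can show up more than once across tags. The sketch below shows one way to collapse the result into unique names while keeping first-seen order. The markup and the variable names (html, all_classes, unique_classes) are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the div carries two classes, and "class2" repeats.
html = """
<div class="class1 wrapper">
    <span class="class2">some text</span>
    <span class="class2">some text</span>
    <span class="class3">some text</span>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Flatten the per-tag class lists into one list, in document order.
all_classes = [value
               for element in soup.find_all(class_=True)
               for value in element["class"]]

# dict.fromkeys drops duplicates while preserving first-seen order.
unique_classes = list(dict.fromkeys(all_classes))

print(all_classes)
print(unique_classes)
```

If order does not matter, set(all_classes) would also remove the duplicates.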


Source: https://stackoverflow.com/questions/43751699/python-beautiful-soup-get-all-class-name
