How to find tags with only certain attributes - BeautifulSoup

被撕碎了的回忆 2020-11-28 05:23

How would I, using BeautifulSoup, search for tags containing ONLY the attributes I search for?

For example, I want to find all <td valign="top"> tags.

6 answers
  • 2020-11-28 05:36

    Combining Chris Redford's and Amr's answers, you can also use the select method to search for tags that have a given attribute with any value:

    from bs4 import BeautifulSoup as Soup
    html = '<td valign="top">.....</td>\
        <td width="580" valign="top">.......</td>\
        <td>.....</td>'
    soup = Soup(html, 'lxml')
    results = soup.select('td[valign]')
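    A quick way to see what the attribute-presence selector matches is to run it on the same sample markup. This minimal sketch assumes only bs4 is installed, so it swaps in the built-in "html.parser" for lxml, and it adds a td:not([valign]) selector (supported by Soup Sieve, the CSS engine behind select) to show the complement:

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">.....</td>'
    '<td width="580" valign="top">.......</td>'
    '<td>.....</td>'
)
# html.parser is used here instead of lxml so no extra parser is required
soup = BeautifulSoup(html, "html.parser")

# [valign] matches any <td> that has a valign attribute, whatever its value
with_valign = soup.select("td[valign]")
# :not([valign]) selects the complement: <td> tags with no valign at all
without_valign = soup.select("td:not([valign])")

print(with_valign)     # both <td> tags that carry valign
print(without_valign)  # only the bare <td>
```

    Note that td[valign] also matches the cell with width="580", so it finds tags that have the attribute, not tags that have only that attribute.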
    
  • 2020-11-28 05:37

    As explained in the BeautifulSoup documentation, you can use this:

    soup = BeautifulSoup(html)
    results = soup.findAll("td", {"valign" : "top"})
    

    EDIT:

    To return only the tags whose sole attribute is valign="top", check the length of the tag's attrs property:

    from bs4 import BeautifulSoup

    html = '<td valign="top">.....</td>\
            <td width="580" valign="top">.......</td>\
            <td>.....</td>'

    soup = BeautifulSoup(html, 'html.parser')
    results = soup.findAll("td", {"valign" : "top"})

    # keep only the tags whose single attribute is valign
    for result in results:
        if len(result.attrs) == 1:
            print(result)
    

    That prints:

    <td valign="top">.....</td>
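    The length check above can be generalised into a small reusable filter that compares a tag's whole attribute dict with the one you want. This is a sketch, not a BeautifulSoup API: find_exact is a hypothetical helper name, and it assumes bs4 with "html.parser". One caveat: bs4 stores multi-valued attributes such as class as lists, so you would pass {"class": ["x"]} rather than {"class": "x"}.

```python
from bs4 import BeautifulSoup


def find_exact(soup, name, attrs):
    """Return tags called `name` whose attributes are exactly `attrs`.

    find_exact is a hypothetical helper, not part of BeautifulSoup.
    """
    # find_all accepts a function; it is called once per tag
    return soup.find_all(lambda tag: tag.name == name and tag.attrs == attrs)


html = (
    '<td valign="top">.....</td>'
    '<td width="580" valign="top">.......</td>'
    '<td>.....</td>'
)
soup = BeautifulSoup(html, "html.parser")

only_valign = find_exact(soup, "td", {"valign": "top"})
for tag in only_valign:
    print(tag)
```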
    
  • 2020-11-28 05:37

    Just pass the attribute as a keyword argument to findAll:

    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup("""
    ... <html>
    ... <head><title>My Title!</title></head>
    ... <body><table>
    ... <tr><td>First!</td>
    ... <td valign="top">Second!</td></tr>
    ... </table></body></html>
    ... """, "html.parser")
    >>>
    >>> soup.findAll('td')
    [<td>First!</td>, <td valign="top">Second!</td>]
    >>>
    >>> soup.findAll('td', valign='top')
    [<td valign="top">Second!</td>]
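    The keyword filter only checks the attribute you name; it does not exclude tags that carry additional attributes. A minimal sketch, assuming bs4 with "html.parser" and a hypothetical third cell (Third!) added for illustration, shows both cells matching:

```python
from bs4 import BeautifulSoup

html = (
    '<table><tr>'
    '<td>First!</td>'
    '<td valign="top">Second!</td>'
    # hypothetical extra cell with a second attribute, for illustration
    '<td width="580" valign="top">Third!</td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# valign="top" is checked, but width is ignored, so both cells match
matches = soup.find_all("td", valign="top")
print([td.get_text() for td in matches])
```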
    
  • 2020-11-28 05:38

    The easiest way to do this is with the newer CSS selector method, select:

    soup = BeautifulSoup(html)
    results = soup.select('td[valign="top"]')
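    Like the keyword filter, the attribute selector also matches cells that have extra attributes. A sketch, assuming bs4 with "html.parser" and the question's sample markup, narrows the selector's results with the same len(tag.attrs) check used in another answer:

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">.....</td>'
    '<td width="580" valign="top">.......</td>'
    '<td>.....</td>'
)
soup = BeautifulSoup(html, "html.parser")

# The attribute selector alone also matches the cell that has width="580"
candidates = soup.select('td[valign="top"]')
# Keep only the tags whose single attribute is valign
results = [td for td in candidates if len(td.attrs) == 1]
print(results)
```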
    
  • 2020-11-28 05:43

    If you only want to search by attribute name, with any value:

    from bs4 import BeautifulSoup
    import re
    
    soup = BeautifulSoup(html.text, 'lxml')
    results = soup.findAll("td", {"valign" : re.compile(r".*")})
    

    As Steve Lorimer points out, it is better to pass True instead of a regex:

    results = soup.findAll("td", {"valign" : True})
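    To confirm that the two filters behave the same, this sketch runs both on small sample markup. It assumes bs4 with "html.parser", and the second cell's valign="bottom" is an illustrative change from the question's markup, there to show that any value matches:

```python
import re

from bs4 import BeautifulSoup

html = (
    '<td valign="top">.....</td>'
    # valign="bottom" is an illustrative change, to show "any value"
    '<td width="580" valign="bottom">.......</td>'
    '<td>.....</td>'
)
soup = BeautifulSoup(html, "html.parser")

# Both filters mean "valign is present, with any value"
by_regex = soup.find_all("td", {"valign": re.compile(r".*")})
by_true = soup.find_all("td", {"valign": True})

print(by_regex == by_true)
```

    Passing True states the intent directly and avoids running a regex against every attribute value.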
    
  • 2020-11-28 05:49

    You can pass a lambda function to findAll, as explained in the documentation. In your case, to search for td tags whose only attribute is valign="top", use the following:

    # tag.get() returns None for a missing attribute instead of raising KeyError
    td_tag_list = soup.findAll(
                    lambda tag: tag.name == "td" and
                    len(tag.attrs) == 1 and
                    tag.get("valign") == "top")
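    A pitfall with this kind of filter: indexing with tag["valign"] raises KeyError when a td has exactly one attribute and it is not valign, whereas tag.get("valign") returns None. A minimal sketch, assuming bs4 with "html.parser" and an illustrative <td width="580"> cell, shows both behaviours:

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">.....</td>'
    '<td width="580">.......</td>'  # illustrative: one attribute, not valign
    '<td>.....</td>'
)
soup = BeautifulSoup(html, "html.parser")

# tag["valign"] raises KeyError on <td width="580">
try:
    soup.find_all(lambda tag: tag.name == "td"
                  and len(tag.attrs) == 1
                  and tag["valign"] == "top")
    raised = False
except KeyError:
    raised = True

# tag.get() returns None for a missing attribute, so this filter is safe
td_tag_list = soup.find_all(lambda tag: tag.name == "td"
                            and len(tag.attrs) == 1
                            and tag.get("valign") == "top")
print(raised, td_tag_list)
```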
    