Complex Beautiful Soup query

盖世英雄少女心 2020-12-17 06:25

Here is a snippet of an HTML file I'm exploring with Beautiful Soup.


    

        
3 Answers
  • 2020-12-17 06:37
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup("""<html><td width="50%">
    ...     <strong class="sans"><a href="http:/website">Site</a></strong> <br />
    ... </html>""", "html.parser")
    >>> # Loops run outermost first: td, then strong inside it, then a inside that.
    >>> [a for td in soup.find_all("td", width="50%")
    ...    for strong in td.find_all("strong", attrs={"class": "sans"})
    ...    for a in strong.find_all("a")]
    [<a href="http:/website">Site</a>]
    
  • from bs4 import BeautifulSoup
    html_doc = """<td width="50%">
    <strong class="sans"><a href="http:/website">Site</a></strong> <br /> 
    """
    soup = BeautifulSoup(html_doc, 'html.parser')
    soup.select('td[width="50%"] .sans [href]')
    # Out[24]: [<a href="http:/website">Site</a>]
    

    Documentation

  • 2020-12-17 06:50

    Beautiful Soup's search methods accept a callable, which the docs appear to recommend for your case: "If you need to impose complex or interlocking restrictions on a tag's attributes, pass in a callable object for name,...". (Admittedly, they're talking about attributes specifically, but the advice reflects the underlying spirit of the Beautiful Soup API.)

    If you want a one-liner:

    soup.findAll(lambda tag: tag.name == 'a' and
                 tag.findParent('strong', 'sans') and
                 tag.findParent('strong', 'sans').findParent('td', attrs={'width': '50%'}))
    

    I've used a lambda in this example, but in practice you may want to define a named function if you have multiple chained requirements. This lambda has to call findParent('strong', 'sans') twice to avoid raising an exception when an <a> tag has no strong parent. A proper function can store the result of the first call and reuse it, which makes the test more efficient.
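
For illustration, here is a rough sketch of the named-function version. It assumes the bs4 package, and the function name `is_target_link` and the sample markup (including the extra links added to show what gets filtered out) are made up for this example rather than taken from the question.

```python
from bs4 import BeautifulSoup


def is_target_link(tag):
    """Match <a> tags inside <strong class="sans"> inside <td width="50%">."""
    if tag.name != "a":
        return False
    # Look up the enclosing <strong class="sans"> once and reuse the result.
    strong = tag.find_parent("strong", class_="sans")
    if strong is None:
        return False
    # The <strong> must itself sit somewhere inside a <td width="50%">.
    return strong.find_parent("td", attrs={"width": "50%"}) is not None


# Hypothetical sample markup: only the first link satisfies every condition.
html_doc = """<table><tr>
<td width="50%"><strong class="sans"><a href="http:/website">Site</a></strong> <br /></td>
<td width="30%"><strong class="sans"><a href="http:/other">Other</a></strong></td>
</tr></table>
<a href="http:/bare">Bare</a>"""

soup = BeautifulSoup(html_doc, "html.parser")
links = soup.find_all(is_target_link)
```

Passing the function to `find_all` works the same way as passing the lambda: it is called once per tag and keeps the tags for which it returns a true value.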
