Exclude unwanted tag on Beautifulsoup Python

萌比男神i 2020-12-01 14:52

    <span>
      I Like
      <span class='unwanted'> to punch </span>
       your face
     <span>
 

How can I print "I Like your face" instead?

2 Answers
  • 2020-12-01 15:41

    You can find the unwanted text by checking the class attribute of each span:

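    from bs4 import BeautifulSoup
    
    text = """<span>
      I Like
      <span class='unwanted'> to punch </span>
       your face
     <span>"""
    # the "lxml" parser requires the lxml package to be installed
    soup = BeautifulSoup(text, "lxml")
    for i in soup.find_all("span"):
        # get() falls back to an empty list when the tag has no class attribute
        if "unwanted" in i.get("class", []):
            print(i.text)  # prints the text of the unwanted span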
    

    From here, printing everything else is straightforward: remove the matched tags and read the text that remains.
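
    As a rough sketch of that last step (not part of the original answer): the matched tags can be dropped with decompose() and the leftover whitespace collapsed with split/join. The variable names `html` and `remaining` are chosen here for illustration, and the built-in "html.parser" is used so that no extra package is assumed.

```python
from bs4 import BeautifulSoup

# Same markup as in the question, including the unclosed final <span>
html = """<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 <span>"""

soup = BeautifulSoup(html, "html.parser")

# Remove every span carrying the "unwanted" class from the tree
for tag in soup.find_all("span", class_="unwanted"):
    tag.decompose()

# Collapse newlines and runs of spaces into single spaces
remaining = " ".join(soup.get_text().split())
print(remaining)
```

    decompose() destroys the tag, while extract() returns it; either should work here, because the removed tag is not used afterwards.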

  • 2020-12-01 15:45

    You can use extract() to remove the unwanted tag before you get the text.

    However, the remaining text keeps all the '\n' characters and spaces, so you will need some extra work to remove them.

    data = '''<span>
      I Like
      <span class='unwanted'> to punch </span>
       your face
     <span>'''
    
    from bs4 import BeautifulSoup as BS
    
    soup = BS(data, 'html.parser')
    
    external_span = soup.find('span')
    
    print("1 HTML:", external_span)
    print("1 TEXT:", external_span.text.strip())
    
    unwanted = external_span.find('span')
    unwanted.extract()
    
    print("2 HTML:", external_span)
    print("2 TEXT:", external_span.text.strip())
    

    Result

    1 HTML: <span>
      I Like
      <span class="unwanted"> to punch </span>
       your face
     <span></span></span>
    1 TEXT: I Like
       to punch 
       your face
    2 HTML: <span>
      I Like
    
       your face
     <span></span></span>
    2 TEXT: I Like
    
       your face
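
    One possible way to handle the leftover whitespace, sketched here as an assumption rather than as part of the original answer, is to pass a separator and strip=True to get_text(). Each text fragment is then stripped and joined with a single space. The name `clean` is illustrative.

```python
from bs4 import BeautifulSoup

data = """<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 <span>"""

soup = BeautifulSoup(data, "html.parser")
external_span = soup.find("span")

# Remove the first nested span (the unwanted one) from the tree
external_span.find("span").extract()

# strip=True strips each text fragment and skips empty ones;
# the separator is placed between the surviving fragments
clean = external_span.get_text(separator=" ", strip=True)
print(clean)
```

    Note that strip=True only trims the ends of each fragment. Newlines in the middle of a single text node would survive, in which case " ".join(text.split()) is the more thorough option.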
    

    Alternatively, you can skip every Tag object inside the external span and keep only the NavigableString objects (the plain text in the HTML).

    data = '''<span>
      I Like
      <span class='unwanted'> to punch </span>
       your face
     <span>'''
    
    from bs4 import BeautifulSoup as BS
    import bs4
    
    soup = BS(data, 'html.parser')
    
    external_span = soup.find('span')
    
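    text = []
    # iterating over a tag yields its direct children only
    for x in external_span:
        if isinstance(x, bs4.element.NavigableString):
            stripped = x.strip()
            if stripped:  # ignore whitespace-only strings
                text.append(stripped)
    print(" ".join(text))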
    

    Result

    I Like your face
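
    The same filtering can probably be written more compactly with find_all(string=True, recursive=False), which should return only the text nodes that are direct children of the tag. This is a sketch and not part of the original answer: it assumes a Beautiful Soup version that accepts the `string` argument (older releases used `text`), and the names `direct_strings` and `direct_text` are made up for illustration.

```python
from bs4 import BeautifulSoup

data = """<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 <span>"""

soup = BeautifulSoup(data, "html.parser")
external_span = soup.find("span")

# string=True matches text nodes; recursive=False restricts the
# search to direct children, so text inside nested tags is skipped
direct_strings = external_span.find_all(string=True, recursive=False)

# Strip each piece and drop whitespace-only ones
pieces = [s.strip() for s in direct_strings if s.strip()]
direct_text = " ".join(pieces)
print(direct_text)
```

    Unlike extract(), this leaves the parsed tree unchanged, which may matter if the unwanted tag is still needed later.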
    