Exclude unwanted tag with BeautifulSoup in Python

萌比男神i 2020-12-01 14:52

 <span>
  I Like
  <span> to punch </span>
   your face
 </span>

How can I print "I Like your face" instead of "I Like to punch your face"?

2 Answers
  •  一个人的身影
    2020-12-01 15:45

    You can use extract() to remove the unwanted tag before you get the text.

    However, the newlines ('\n') and spaces that surrounded the removed tag stay in the text, so you still need to clean them up.

    # Outer <span> that wraps one unwanted inner <span>
    data = '''<span>
      I Like
      <span> to punch </span>
       your face
     </span>'''
    
    from bs4 import BeautifulSoup as BS
    
    soup = BS(data, 'html.parser')
    
    external_span = soup.find('span')
    
    print("1 HTML:", external_span)
    print("1 TEXT:", external_span.text.strip())
    
    unwanted = external_span.find('span')
    unwanted.extract()
    
    print("2 HTML:", external_span)
    print("2 TEXT:", external_span.text.strip())
    

    Result

    1 HTML: <span>
      I Like
      <span> to punch </span>
       your face
     </span>
    1 TEXT: I Like
       to punch 
       your face
    2 HTML: <span>
      I Like
      
       your face
     </span>
    2 TEXT: I Like
    
       your face
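
    The leftover newlines and spaces mentioned above can be collapsed with plain string methods. This is a minimal sketch, not part of the original answer: the helper name collapse_whitespace is hypothetical, and the input literal is the "2 TEXT" value from the result above.

```python
def collapse_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace, so no empty pieces are produced.
    return " ".join(text.split())


# The "2 TEXT" value from the result above, written out as a literal.
leftover = "I Like\n  \n   your face"
print(collapse_whitespace(leftover))  # I Like your face
```

    In the answer's code this would be applied as collapse_whitespace(external_span.text) after the extract() call.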
    

    Alternatively, you can skip every Tag object inside the external span and keep only the NavigableString objects (the plain text nodes in the HTML).

    # Outer <span> that wraps one unwanted inner <span>
    data = '''<span>
      I Like
      <span> to punch </span>
       your face
     </span>'''
    
    from bs4 import BeautifulSoup as BS
    import bs4
    
    soup = BS(data, 'html.parser')
    
    external_span = soup.find('span')
    
    text = []
    for x in external_span:
        if isinstance(x, bs4.element.NavigableString):
            text.append(x.strip())
    print(" ".join(text))
    

    Result

    I Like your face
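
    The same idea can be wrapped in a small reusable function. This is a sketch under assumptions, not the answerer's code: the name direct_text is hypothetical, the markup is assumed to be an outer <span> with one nested <span>, and it relies on find_all(string=True, recursive=False), which returns only the text nodes that are direct children of a tag.

```python
from bs4 import BeautifulSoup


def direct_text(tag) -> str:
    """Return only the text that sits directly inside `tag`, skipping child tags."""
    # string=True matches text nodes; recursive=False limits the search to
    # direct children, so text inside nested tags is left out.
    pieces = tag.find_all(string=True, recursive=False)
    # Strip each piece and drop the ones that were whitespace only.
    return " ".join(p.strip() for p in pieces if p.strip())


# Assumed markup: an outer <span> wrapping one unwanted inner <span>.
html = "<span>\n  I Like\n  <span> to punch </span>\n   your face\n </span>"
soup = BeautifulSoup(html, "html.parser")
print(direct_text(soup.find("span")))  # I Like your face
```

    Unlike extract(), this leaves the parsed tree unchanged, which may matter if the unwanted tag is still needed later.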
    
