BeautifulSoup - combine consecutive tags

孤独总比滥情好 2021-01-19 14:11

I have to work with very messy HTML in which individual words are split across separate tags, as in the following example:



        
2 Answers
  •  我在风中等你
    2021-01-19 14:19

    The solution below combines text from all the selected tags into one of your choice and decomposes the others.

    If you only want to merge the text from consecutive tags, follow Danny's approach.

    Code:

    from bs4 import BeautifulSoup

    # The original markup was stripped from this page. The snippet below is a
    # minimal stand-in (an assumption) that matches the selectors used later.
    html = '''
    <div id="wrapper">
      <b><span>I</span></b>
      <b><span>NTRODUCTION</span></b>
    </div>
    '''

    soup = BeautifulSoup(html, 'lxml')

    # the container that holds the b tags to combine
    container = soup.select_one('#wrapper')
    b_tags = container.find_all('b')

    # combine all the text from the b tags
    text = ''.join(b.get_text(strip=True) for b in b_tags)

    # choose the tag you want to preserve and update its text;
    # you can target it however you want, I just take the first one from the list
    b_main = b_tags[0]
    b_main.span.string = text  # replace the text

    # remove every other b tag
    for tag in b_tags:
        if tag is not b_main:
            tag.decompose()

    print(soup)
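For the case where only consecutive tags should be merged, here is a minimal sketch. It is not taken from the other answer: the helper name `merge_consecutive`, the sample markup, and the choice to treat whitespace-only text between two tags as not breaking a run are all assumptions made for illustration. It uses the built-in `html.parser` so that no extra parser is needed.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def merge_consecutive(container, name="b"):
    """Merge each run of adjacent <name> siblings into the first tag of the run.

    Hypothetical helper: whitespace-only text between two tags is assumed
    not to interrupt a run; any other node does.
    """
    current = None  # first tag of the run currently being built
    for node in list(container.children):  # snapshot, since the tree is modified
        if isinstance(node, Tag) and node.name == name:
            if current is None:
                current = node
                continue
            merged = current.get_text() + node.get_text()
            # write into the inner span when there is one, else into the tag itself
            target = current.span if current.span is not None else current
            target.string = merged
            node.decompose()
        elif isinstance(node, NavigableString) and not node.strip():
            continue  # ignore whitespace between tags
        else:
            current = None  # any other node ends the run
    return container


# Sample markup (assumed): two runs of <b> tags separated by a <p>.
html = (
    '<div id="wrapper">'
    "<b><span>I</span></b> <b><span>NTRODUCTION</span></b>"
    "<p>text</p>"
    "<b>A</b><b>B</b>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")
merge_consecutive(soup.select_one("#wrapper"))
print(soup)
```

Unlike the code above, which collapses every selected tag into one, this keeps one tag per run, so the `<b>` tags on either side of the `<p>` stay separate.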

    Any comments appreciated.
