Remove lines getting empty after BeautifulSoup decompose

允我心安 submitted on 2021-01-28 00:31:27

Question


I am trying to strip certain HTML tags and their content from a file with BeautifulSoup. How can I remove the lines that become empty after applying decompose()? In this example, I want the empty line between a and 3 to be gone, since that is where the <span>...</span> block was, but not the line at the end.

from bs4 import BeautifulSoup     

Rmd_data = 'a\n<span class="answer">\n2\n</span>\n3\n'
print(Rmd_data)

#OUTPUT
# a
# <span class="answer">
# 2
# </span>
# 3
# 
# END OUTPUT

soup = BeautifulSoup(Rmd_data, "html.parser")
answers = soup.find_all("span", "answer")
for a in answers:
    a.decompose()

Rmd_data = str(soup)
print(Rmd_data)

# OUTPUT
# a
#
# 3
# 
# END OUTPUT

Answer 1:


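The easiest way to remove empty lines is with the re module:

import re
# text is the string returned by str(soup).
# Collapse each run of blank (or whitespace-only) lines into a single newline.
text = re.sub(r'\n\s*\n', '\n', text)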
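A regex substitution like the one above removes every blank line, including ones that were already in the input. If only the line emptied by decompose() should go, one possible sketch is to trim the newline that follows the tag before removing it. The helper name `remove_tags_on_own_lines` is hypothetical and not part of BeautifulSoup; it assumes the tag starts on its own line and is immediately followed by a newline.

```python
from bs4 import BeautifulSoup, NavigableString


def remove_tags_on_own_lines(html, name, css_class):
    """Hypothetical helper: decompose matching tags and, when a tag sits on
    its own line(s), also drop the newline that directly follows it."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(name, class_=css_class):
        prev, nxt = tag.previous_sibling, tag.next_sibling
        # The tag is "alone" if nothing precedes it on the same line.
        alone = prev is None or (
            isinstance(prev, NavigableString) and prev.endswith("\n")
        )
        if alone and isinstance(nxt, NavigableString) and nxt.startswith("\n"):
            rest = nxt[1:]  # text after the newline that would become a blank line
            if rest:
                nxt.replace_with(NavigableString(rest))
            else:
                nxt.extract()
        tag.decompose()
    return str(soup)


rmd_data = 'a\n<span class="answer">\n2\n</span>\n3\n'
cleaned = remove_tags_on_own_lines(rmd_data, "span", "answer")
print(cleaned)
```

Because only the newline directly after a removed tag is touched, blank lines that were already present elsewhere in the input are left alone, and tags that share a line with other text do not cause lines to be joined.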



Answer 2:


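I'm surprised that BeautifulSoup does not offer an option for this (prettify() exists, but it re-indents the whole document). Instead of manipulating the HTML manually, you could re-parse it, though html.parser preserves whitespace, so check that the blank line is actually gone: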

str(BeautifulSoup(str(soup), 'html.parser'))

As always, enjoy.



Source: https://stackoverflow.com/questions/42286777/remove-lines-getting-empty-after-beautifulsoup-decompose
