I Like
to punch
your face
How to print "I Like your face" instead?
You can easily find the (un)desired text like this:
from bs4 import BeautifulSoup
text = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>"""
soup = BeautifulSoup(text, "lxml")
for i in soup.find_all("span"):
    if 'class' in i.attrs:
        if "unwanted" in i.attrs['class']:
            print(i.text)
From here, outputting everything else can be done easily.
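One possible way to output everything else, as a minimal sketch: remove the matched tags from the tree with decompose() and print what remains. The switch to the built-in 'html.parser' (so the sketch does not depend on lxml) and the whitespace cleanup with split() are assumptions of this sketch, not part of the code above.

```python
from bs4 import BeautifulSoup

text = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>"""

# Assumption: the built-in parser is used here instead of lxml.
soup = BeautifulSoup(text, "html.parser")

# Drop every span that carries the "unwanted" class from the tree.
for tag in soup.find_all("span", class_="unwanted"):
    tag.decompose()

# split() with no arguments collapses newlines and repeated spaces.
remaining = " ".join(soup.get_text().split())
print(remaining)
```

Note that split() also collapses whitespace inside the kept text, which is fine here but may not be wanted for preformatted content.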
You can use extract() to remove the unwanted tag before you get the text. But it keeps all the '\n' characters and spaces, so you will need some extra work to remove them.
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
from bs4 import BeautifulSoup as BS
soup = BS(data, 'html.parser')
external_span = soup.find('span')
print("1 HTML:", external_span)
print("1 TEXT:", external_span.text.strip())
unwanted = external_span.find('span')
unwanted.extract()
print("2 HTML:", external_span)
print("2 TEXT:", external_span.text.strip())
Result
1 HTML: <span>
I Like
<span class="unwanted"> to punch </span>
your face
<span></span></span>
1 TEXT: I Like
to punch
your face
2 HTML: <span>
I Like
your face
<span></span></span>
2 TEXT: I Like
your face
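The leftover newlines and spaces can probably be handled by get_text() itself. A small sketch, assuming a single space is the separator you want: passing strip=True strips each text fragment and skips the empty ones, and the first argument is the string used to join them.

```python
from bs4 import BeautifulSoup

data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''

soup = BeautifulSoup(data, 'html.parser')
external_span = soup.find('span')

# Remove the unwanted inner span before reading the text.
external_span.find('span', class_='unwanted').extract()

# Strip every fragment, skip empty ones, join the rest with one space.
clean = external_span.get_text(" ", strip=True)
print(clean)
```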
You can skip every Tag object inside the external span and keep only the NavigableString objects (the plain text in the HTML).
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
from bs4 import BeautifulSoup as BS
import bs4
soup = BS(data, 'html.parser')
external_span = soup.find('span')
text = []
for x in external_span:
    if isinstance(x, bs4.element.NavigableString):
        text.append(x.strip())
print(" ".join(text))
Result
I Like your face
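The same filtering can likely be written more compactly with find_all(). This is a sketch, not a replacement for the loop above: string=True matches text nodes only, and recursive=False limits the search to direct children of the external span, so text nested inside inner tags is ignored.

```python
from bs4 import BeautifulSoup

data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''

soup = BeautifulSoup(data, 'html.parser')
external_span = soup.find('span')

# Only text nodes that are direct children of the external span.
direct_strings = external_span.find_all(string=True, recursive=False)

# Strip each piece and drop any that end up empty.
parts = [s.strip() for s in direct_strings if s.strip()]
result = " ".join(parts)
print(result)
```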