I'm working with HTML elements that have child tags, which I want to "ignore" or remove, so that the text is still there. Just now, if I try to .string
any e
for child in soup.find(id='main'):
    if isinstance(child, bs4.Tag):
        print(child.text)
And, you'll get:
This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
Use the .strings iterable instead. Use ''.join() to pull in all strings and join them together:
print(''.join(main.strings))
Iterating over .strings yields each and every contained string, directly or in child tags.
Demo:
>>> print(''.join(main.strings))
This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
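
For anyone who wants to try this end to end, here is a minimal self-contained sketch in Python 3. The HTML below is an assumption, reconstructed from the output shown above (the original markup isn't included in the question), and the variable names `second`, `joined` and `text` are made up for illustration. It shows why .string comes back empty on an element with a child tag, and how .strings gets around that.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: guessed from the output above, not the asker's real HTML.
html = (
    '<div id="main">'
    '<p>This is a paragraph.</p>'
    '<p>This is a paragraph with a <b>tag</b>.</p>'
    '<p>This is another paragraph.</p>'
    '</div>'
)

soup = BeautifulSoup(html, 'html.parser')
main = soup.find(id='main')

# The second paragraph has three children (text, <b>, text),
# so .string cannot pick a single string and returns None.
second = main.find_all('p')[1]
print(second.string)

# .strings walks into child tags and yields every string it finds.
joined = ''.join(second.strings)
print(joined)

# One line per paragraph, each with its nested tags flattened away.
text = '\n'.join(''.join(p.strings) for p in main.find_all('p'))
print(text)
```

If you only need the flattened text and not the individual strings, calling get_text() on the element should give the same result as the ''.join() above.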