How do I ignore tags while getting the .string of a Beautiful Soup element?


I'm working with HTML elements that have child tags, which I want to "ignore" or remove, so that the text is still there. Just now, if I try to .string any e


2 Answers

无人共我

2020-12-20 01:33

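import bs4  # assumes `soup` is an already-parsed BeautifulSoup document

# Iterate over the direct children of the element with id="main"
for child in soup.find(id='main'):
    if isinstance(child, bs4.Tag):  # skip bare strings such as whitespace
        print(child.text)

And you'll get: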

This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
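
For anyone who wants to run this end to end, here is a minimal sketch. The markup is a hypothetical sample chosen only to match the output above (the original HTML is not shown in this thread), and `html.parser` is assumed as the parser. It also shows why `.string` falls short here: it is `None` for a tag that has more than one child node, while `.text` concatenates every nested string.

```python
import bs4

# Hypothetical sample markup, chosen to match the output shown above.
html = """
<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# Keep only the Tag children of #main; the newlines between the <p>
# elements are plain strings and are skipped.
paragraphs = [child for child in soup.find(id="main")
              if isinstance(child, bs4.Tag)]

# .text joins every string inside the tag, including nested tags.
texts = [p.text for p in paragraphs]
for line in texts:
    print(line)

# The second paragraph has three child nodes (text, <span>, text),
# so .string cannot pick a single one and returns None.
print(paragraphs[1].string)
```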


小鲜肉

2020-12-20 01:37
Use the .strings iterable instead. Use ''.join() to pull in all strings and join them together:
```
print(''.join(main.strings))
```
Iterating over .strings yields each and every contained string, directly or in child tags.

Demo:
```
>>> print(''.join(main.strings))

This is a paragraph. 
This is a paragraph with a tag. 
This is another paragraph. 
```
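
A self-contained version of the demo, as a rough sketch: the HTML below is an assumed stand-in for the asker's document (it is not part of this thread), and `main` is taken to be the element with `id="main"`. It also checks that `get_text()` with its default arguments gives the same result as joining `.strings`.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the asker's real document is not shown here.
html = """<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>"""

main = BeautifulSoup(html, "html.parser").find(id="main")

# .strings walks the whole subtree and yields each string it finds,
# whether it sits directly in #main or inside a nested tag.
joined = "".join(main.strings)

# get_text() with default arguments performs the same concatenation.
same_as_get_text = joined == main.get_text()

# The newlines between the <p> elements are strings too, so drop the
# blank lines if only the visible text is wanted.
lines = [line for line in joined.splitlines() if line.strip()]
print("\n".join(lines))
```

Joining with `''` keeps the whitespace exactly as it appears in the source, which is why the text inside the `<span>` rejoins the sentence without a gap.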