How can I strip comment tags from HTML using BeautifulSoup?

暖寄归人 2020-11-28 13:41

I have been playing with BeautifulSoup, which is great. My end goal is to try and just get the text from a page. I am just trying to get the text from the body, with a speci

4 answers

抹茶落季 (original poster)

2020-11-28 14:10
I am still trying to figure out why it doesn't find and strip tags like this: . Those backslashes cause certain tags to be overlooked.

This may be a problem with the underlying SGML parser; see http://www.crummy.com/software/BeautifulSoup/documentation.html#Sanitizing%20Bad%20Data%20with%20Regexps. You can work around it by passing a markupMassage regex to the BeautifulSoup constructor, straight from the docs:
```
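import re, copy
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 only

# Rewrite a malformed comment opener such as "<!-x" into "<!--x" before parsing.
myMassage = [(re.compile('<!-([^-])'), lambda match: '<!--' + match.group(1))]
myNewMassage = copy.copy(BeautifulSoup.MARKUP_MASSAGE)
myNewMassage.extend(myMassage)
soup = BeautifulSoup(badString, markupMassage=myNewMassage)  # badString: your malformed HTML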
```
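The markupMassage argument belongs to BeautifulSoup 3. With the newer bs4 package, roughly the same idea can be sketched as a regex pre-pass followed by removing every `Comment` node. This is only a sketch under assumptions: the sample markup, the helper names `fix_malformed_comments` and `strip_comments`, and the pattern `<!-([^-])` (modelled on the docs example for a comment opener missing its second dash) are illustrative and not taken from the original post.

```python
import re
from bs4 import BeautifulSoup, Comment

# Assumed pattern: a comment opener with only one dash, e.g. "<!-x".
MALFORMED_OPENER = re.compile(r"<!-([^-])")


def fix_malformed_comments(markup):
    """Turn '<!-x' into '<!--x' so the parser sees a real comment."""
    return MALFORMED_OPENER.sub(r"<!--\1", markup)


def strip_comments(markup):
    """Parse the markup and remove every comment node from the tree."""
    soup = BeautifulSoup(fix_malformed_comments(markup), "html.parser")
    # Comment is a NavigableString subclass, so filter text nodes by type.
    for node in soup.find_all(string=lambda s: isinstance(s, Comment)):
        node.extract()
    return soup


# Hypothetical input with one malformed and one well-formed comment.
bad_string = "Foo<!-This comment is malformed.-->Bar<br />Baz<!-- ok -->"
soup = strip_comments(bad_string)
text = soup.get_text()
print(text)
```

Once the comments are gone, `soup.get_text()` (or `soup.body.get_text()` on a full page) returns only the visible text, which may be all the question is after.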