Filter out HTML tags and resolve entities in Python

暗喜 2020-12-03 00:11

Because regular expressions scare me, I'm trying to find a way to remove all HTML tags and resolve HTML entities from a string in Python.

8 answers

萌比男神i (original poster)

2020-12-03 00:32

Regular expressions are not scary, but writing your own regexes to strip HTML is a sure path to madness (and it won't work, either). Follow the path of wisdom, and use one of the many good HTML-parsing libraries.
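To make the library route concrete, here is a minimal sketch using html.parser from Python's standard library. The answer does not name a specific library, so this choice is an assumption, and the names TextExtractor and strip_html are hypothetical helpers made up for illustration. It keeps every text node, including the contents of script and style elements, so treat it as a starting point rather than a complete solution.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper name)."""

    def __init__(self):
        # convert_charrefs=True (the default since Python 3.5) makes the
        # parser resolve entities such as &amp; before calling handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)


def strip_html(markup):
    """Return the text content of markup with entities resolved."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


print(strip_html("<p>Fish &amp; <b>chips</b></p>"))  # Fish & chips
```

Third-party parsers such as Beautiful Soup or lxml cope better with badly broken markup, at the cost of an extra dependency.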

Lucas' example is also broken because "sub" is not a method of a Python string. You'd have to "import re", then call re.sub(pattern, repl, string). But that's neither here nor there, as the correct answer to your question does not involve writing any regexes.
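For reference, the call shape described above looks like the sketch below. The whitespace pattern and the sample string are arbitrary assumptions chosen only to show the signature; this is not a way to strip HTML.

```python
import re

text = "too    many   spaces"

# re.sub(pattern, repl, string) is a function in the re module;
# str objects have no .sub method of their own.
collapsed = re.sub(r"\s+", " ", text)

print(collapsed)  # too many spaces
```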
