Filter out HTML tags and resolve entities in python

前端 未结 8 1799
暗喜
暗喜 2020-12-03 00:11

Because regular expressions scare me, I\'m trying to find a way to remove all HTML tags and resolve HTML entities from a string in Python.

8条回答
  •  萌比男神i
    2020-12-03 00:32

    Regular expressions are not scary, but writing your own regexes to strip HTML is a sure path to madness (and it won't work, either). Follow the path of wisdom, and use one of the many good HTML-parsing libraries.

    Lucas' example is also broken because "sub" is not a method of a Python string. You'd have to "import re", then call re.sub(pattern, repl, string). But that's neither here nor there, as the correct answer to your question does not involve writing any regexes.

提交回复
热议问题