I have a html text like this:
<xml ... >
and I want to convert it to something readable:
Official documentation for HTMLParser
: Python 2.7
>>> import HTMLParser
>>> pars = HTMLParser.HTMLParser()
>>> pars.unescape('© €')
u'\xa9 \u20ac'
>>> print _
© €
Official documentation for HTMLParser
: Python 3
>>> from html.parser import HTMLParser
>>> pars = HTMLParser()
>>> pars.unescape('© €')
© €
There is a function here that does it, as linked from the post Fred pointed out. Copied here to make things easier.
Credit to Fred Larson for linking to the other question on SO. Credit to dF for posting the link.
Modern Python 3 approach:
>>> import html
>>> html.unescape('© €')
© €
https://docs.python.org/3/library/html.html