Replace HTML entities with the corresponding UTF-8 characters in Python 2.6

长发绾君心 2020-12-15 23:04

I have HTML text like this:

    &lt;xml ... &gt;

and I want to convert it to something readable:

    <xml ... >

        
相关标签:
3 Answers
  • 2020-12-15 23:31

    Python 2.7

    Official documentation for HTMLParser: Python 2.7

    >>> import HTMLParser
    >>> pars = HTMLParser.HTMLParser()
    >>> pars.unescape('&copy; &euro;')
    u'\xa9 \u20ac'
    >>> print _
    © €
    

    Python 3

    Official documentation for HTMLParser: Python 3

    >>> from html.parser import HTMLParser
    >>> pars = HTMLParser()
    >>> pars.unescape('&copy; &euro;')
    © €
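    The HTMLParser method lives in a different module on Python 2, is deprecated from Python 3.4 and is gone in 3.9. A small import shim can therefore pick whichever function the running interpreter provides. This is a sketch that assumes only the standard-library names shown. The name unescape is just a local alias, not a new API.

```python
# Pick an HTML-unescaping function that exists on the running interpreter.
try:
    from html import unescape  # Python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.0-3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2.6/2.7
    # Fall back to the HTMLParser.unescape method (removed in Python 3.9).
    unescape = HTMLParser().unescape

print(unescape('&lt;xml ... &gt;'))  # <xml ... >
```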
    
  • 2020-12-15 23:35

    There is a function here that does it, as linked from the post Fred pointed out. Copied here to make things easier.

    Credit to Fred Larson for linking to the other question on SO. Credit to dF for posting the link.
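    The following is a minimal sketch of the usual regex-and-lookup approach. It is an independent illustration, not the function from the linked post, and unescape_entities is a hypothetical name. The sketch matches each &name;, &#NNN; or &#xHHH; reference and looks up named entities in the standard library's name2codepoint table (htmlentitydefs on Python 2, html.entities on Python 3). Only the HTML 4 names in that table are recognized. The sketch assumes unicode input on Python 2 and leaves unknown names and out-of-range code points untouched.

```python
import re

try:  # Python 3
    from html.entities import name2codepoint
    unichr = chr
except ImportError:  # Python 2.6/2.7
    from htmlentitydefs import name2codepoint

# Matches &#xHHH;, &#NNN; and &name; references (group 1 is the part inside).
_ENTITY_RE = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')


def unescape_entities(text):
    """Hypothetical helper: replace HTML references with unicode characters."""
    def _replace(match):
        ref = match.group(1)
        try:
            if ref[:2] in ('#x', '#X'):
                return unichr(int(ref[2:], 16))  # hexadecimal reference
            if ref[0] == '#':
                return unichr(int(ref[1:]))  # decimal reference
        except (ValueError, OverflowError):
            return match.group(0)  # code point out of range: keep as-is
        if ref in name2codepoint:
            return unichr(name2codepoint[ref])  # named HTML 4 entity
        return match.group(0)  # unknown entity name: keep as-is
    return _ENTITY_RE.sub(_replace, text)


print(unescape_entities(u'&lt;xml ... &gt;'))  # <xml ... >
```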

  • 2020-12-15 23:42

    Modern Python 3 approach (Python 3.4+):

    >>> import html
    >>> html.unescape('&copy; &euro;')
    '© €'
    

    https://docs.python.org/3/library/html.html
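    Here html.unescape is applied to the input from the question. It returns a str of code points, and since the title asks for UTF-8, the sketch also encodes that result to UTF-8 bytes. The function decodes numeric character references too. html.escape converts the text back, which makes a quick round-trip check. The variable names are only illustrative.

```python
import html

raw = '&lt;xml ... &gt;'
readable = html.unescape(raw)
print(readable)  # <xml ... >

# Decimal and hexadecimal character references are decoded as well.
symbols = html.unescape('&#169; &#x20AC;')
print(symbols)  # © €

# The result is a str of code points; encode it if you need UTF-8 bytes.
print(symbols.encode('utf-8'))  # b'\xc2\xa9 \xe2\x82\xac'

# html.escape goes the other way (&, <, > and, by default, quotes).
print(html.escape(readable))  # &lt;xml ... &gt;
```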
