Remove non-ASCII characters from a string using python / django

情歌与酒 2020-12-05 19:11

I have a string of HTML stored in a database. Unfortunately, it contains characters such as ®. I want to replace these characters with their HTML equivalents, either in the DB it

6 answers
  •  庸人自扰
    2020-12-05 20:10

    To escape the special XML/HTML characters '<', '>' and '&', you can use cgi.escape (Python 2; the function was removed in Python 3.8):

    import cgi
    test = "1 < 4 & 4 > 1"
    cgi.escape(test)
    

    This returns:

    '1 &lt; 4 &amp; 4 &gt; 1'
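For Python 3, where cgi.escape no longer exists, html.escape from the standard library fills the same role. A minimal sketch follows; escape_markup is a hypothetical helper name chosen for illustration, not something from the answer above.

```python
import html


def escape_markup(text: str) -> str:
    """Escape '&', '<' and '>' so the text is safe to embed in HTML."""
    # quote=False mirrors the default of the old cgi.escape, which left
    # quotation marks untouched. html.escape escapes them by default.
    return html.escape(text, quote=False)


test = "1 < 4 & 4 > 1"
print(escape_markup(test))  # 1 &lt; 4 &amp; 4 &gt; 1
```

If the escaped text ends up inside an HTML attribute value, leaving quote at its default of True is probably the safer choice.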
    

    This is probably the bare minimum you need to avoid problems. Beyond that, you have to know the encoding of your string. If it matches the encoding of your HTML document, you don't have to do anything more. If not, you have to convert it to the correct encoding.

    test = test.decode("cp1252").encode("utf8")
    

    This is Python 2 code, and it assumes that your string is encoded as cp1252 and that your HTML document is UTF-8.
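The same conversion can be sketched for Python 3, where only bytes objects have a decode method. The second helper goes one step further toward what the question asks: it turns every non-ASCII character, such as ®, into a numeric character reference using the built-in "xmlcharrefreplace" error handler. Both function names and the sample data are hypothetical, and the sketch assumes the stored bytes really are cp1252.

```python
def recode_cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode cp1252 bytes as UTF-8 (assumes the input really is cp1252)."""
    return raw.decode("cp1252").encode("utf-8")


def to_ascii_html(text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference."""
    # "xmlcharrefreplace" is a standard codec error handler: characters
    # that cannot be encoded as ASCII become references such as &#174;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


raw = b"Acme\xae"  # 0xAE is the registered sign in cp1252
print(recode_cp1252_to_utf8(raw))            # b'Acme\xc2\xae'
print(to_ascii_html(raw.decode("cp1252")))   # Acme&#174;
```

Note that to_ascii_html leaves '<', '>' and '&' alone, so it can be applied to a string that already contains HTML markup.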
