Remove non-ASCII characters from a string using Python / Django


情歌与酒 2020-12-05 19:11

I have a string of HTML stored in a database. Unfortunately, it contains characters such as ®. I want to replace these characters by their HTML equivalent, either in the DB it

6 answers

庸人自扰 (OP)

2020-12-05 20:10
To escape the special XML/HTML characters '<', '>' and '&', you can use cgi.escape:
```
import cgi  # Python 2; cgi.escape was deprecated in Python 3.2 and removed in 3.8

test = "1 < 4 & 4 > 1"
cgi.escape(test)
```
This returns:
```
'1 &lt; 4 &amp; 4 &gt; 1'
```
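On Python 3 the `cgi.escape` function is no longer available (it was removed in 3.8); the standard library's `html.escape` covers the same ground. A minimal sketch, assuming Python 3 and the same sample string; the variable name `escaped` is only for illustration:

```python
import html

test = "1 < 4 & 4 > 1"

# quote=False mirrors cgi.escape's default of leaving quote characters alone
escaped = html.escape(test, quote=False)
print(escaped)
```

If the text ends up inside an attribute value, the default `quote=True` is probably the safer choice, since it also escapes `"` and `'`.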
Escaping these characters is probably the bare minimum you need to avoid problems. Beyond that, you have to know the encoding of your string. If it matches the encoding of your HTML document, you don't have to do anything more. If not, you have to convert it to the correct encoding:
```
# Python 2: test is a byte string, decoded from cp1252 and re-encoded as UTF-8
test = test.decode("cp1252").encode("utf8")
```
This assumes that your string is encoded as cp1252 and that your HTML document is UTF-8.
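For Python 3, where `str` has no `.decode` method, the same conversion might look like the sketch below. It assumes the data arrives as cp1252-encoded bytes; the sample value and the names `raw`, `text`, `utf8_bytes` and `ascii_html` are hypothetical. The last step is an alternative that was not part of the original answer: the built-in `xmlcharrefreplace` error handler turns every non-ASCII character (such as the ® from the question) into a numeric HTML character reference.

```python
# Hypothetical input: bytes that were stored as cp1252 (0xAE is the ® sign)
raw = b"Acme\xae widgets"

# Decode to a str, then re-encode as UTF-8 for a UTF-8 HTML document
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")

# Alternative: keep the output pure ASCII by replacing each non-ASCII
# character with a numeric character reference such as &#174;
ascii_html = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_html)
```

Note that `xmlcharrefreplace` only touches non-ASCII characters; '<', '>' and '&' still need to be escaped separately if they are meant as literal text.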