I have a string of HTML stored in a database. Unfortunately it contains characters such as ® I want to replace these characters by their HTML equivalent, either in the DB it
To get rid of the special xml, html characters '<', '>', '&' you can use cgi.escape:
import cgi
test = "1 < 4 & 4 > 1"
cgi.escape(test)
Will return:
'1 < 4 & 4 > 1'
This is probably the bare minimum you need to avoid problem. For more you have to know the encoding of your string. If it fit the encoding of your html document you don't have to do something more. If not you have to convert to the correct encoding.
test = test.decode("cp1252").encode("utf8")
Supposing that your string was cp1252 and that your html document is utf8