Python: removing extra special Unicode characters
I'm working with some text in Python. It is already Unicode internally, but I would like to get rid of some special characters and replace them with more standard versions. I currently have a line that looks like this, but it keeps getting more complex and I can see it eventually causing trouble:

    tmp = infile.lower().replace(u"\u2018", "'").replace(u"\u2019", "'").replace(u"\u2013", "").replace(u"\u2026", "")

For example, \u2018 and \u2019 are the left and right single quotes. Those are somewhat acceptable, but for this type of text processing I don't think they are needed. Things
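
One way the chained .replace() calls above might be kept manageable is a single translation table, so that adding a character means adding one entry to a mapping. The following is only a minimal sketch, assuming Python 3; the names REPLACEMENTS, TABLE, and normalize are hypothetical and not from the original code, and the mapping covers just the four characters already listed.

```python
# Hypothetical mapping of special characters to their replacements.
# None means "delete the character", matching the empty-string
# replacements in the original chained call.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2013": None,  # en dash, dropped
    "\u2026": None,  # horizontal ellipsis, dropped
}

# str.translate works on a table keyed by code point;
# str.maketrans builds that table from single-character keys.
TABLE = str.maketrans(REPLACEMENTS)


def normalize(text):
    """Lowercase the text, then apply every replacement in one pass."""
    return text.lower().translate(TABLE)
```

With this layout, tmp = normalize(infile) would stand in for the long chained line, assuming infile is already a str.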