Character encoding in Python to replace '\u2019' with '

跟風遠走 submitted on 2019-12-18 09:39:40

Question


I have tried numerous ways to encode a string so that the end result is "BACK RUSHIN'", the most important character being the trailing right apostrophe (u'\u2019'), which should come out as '.

I would like to get to this end result using some of Python's built-in functions, in a way that does not discriminate between a normal string and a unicode string.

This was the code I was using to retrieve the string: str(unicode(etree.tostring(root.xpath('path')[0],method='text', encoding='utf-8'),errors='ignore')).strip()

The result was 'BACK RUSHIN': the apostrophe ' is missing.

Another way was: root.xpath('path/text()')

That result was: u'BACK RUSHIN\u2019'

Lastly, if I try: u'BACK RUSHIN\u2019'.encode('ascii', 'replace')

The result is: 'BACK RUSHIN?'
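
Both failed attempts can be reproduced without lxml. The sketch below assumes the text node holds u'BACK RUSHIN\u2019' and that the first attempt effectively decoded UTF-8 bytes with the ASCII codec (the usual Python 2 default) while ignoring errors. The variable names are illustrative only.

```python
# Assumed input: the text the XPath query returns in the question.
text = u'BACK RUSHIN\u2019'

# U+2019 occupies three bytes in UTF-8.
utf8_bytes = text.encode('utf-8')

# Decoding those bytes as ASCII with errors='ignore' silently drops
# all three non-ASCII bytes, so the apostrophe disappears.
dropped = utf8_bytes.decode('ascii', 'ignore')

# Encoding to ASCII with errors='replace' substitutes '?' for any
# character that ASCII cannot represent.
replaced = text.encode('ascii', 'replace')
```

Neither error mode knows that U+2019 is visually close to an ASCII apostrophe, which is why one loses the character and the other yields a question mark.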

Please, no replace functions; I would like to make use of Python's codec libraries. Also, no printing the string, because it is held in a variable.

Thanks


Answer 1:


>>> import unidecode
>>> unidecode.unidecode(u'BACK RUSHIN\u2019')
"BACK RUSHIN'"

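unidecode is a third-party package, not part of Python's standard library, so it has to be installed separately.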
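
For a standard-library route closer to what the question asks for, the codecs module allows registering a custom error handler for encoding. The following is a minimal sketch, not the answerer's method: the mapping ASCII_FALLBACKS and the handler name 'ascii_fallback' are hypothetical names chosen for illustration, and the mapping covers only a few typographic quotes.

```python
import codecs

# Hypothetical mapping (not part of any library): typographic quotes
# and the ASCII characters that should stand in for them.
ASCII_FALLBACKS = {
    u'\u2018': u"'",   # left single quotation mark
    u'\u2019': u"'",   # right single quotation mark
    u'\u201c': u'"',   # left double quotation mark
    u'\u201d': u'"',   # right double quotation mark
}


def ascii_fallback(error):
    """Encoding error handler: use the mapping, or '?' if unmapped."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    # The slice of the input that the codec could not encode.
    bad = error.object[error.start:error.end]
    replacement = u''.join(ASCII_FALLBACKS.get(ch, u'?') for ch in bad)
    # Return the substitute text and the position to resume encoding at.
    return replacement, error.end


# Make the handler available under a name usable as the `errors` argument.
codecs.register_error('ascii_fallback', ascii_fallback)

result = u'BACK RUSHIN\u2019'.encode('ascii', 'ascii_fallback')
```

The handler runs only for characters the ASCII codec rejects, so plain ASCII input passes through unchanged. Unlike unidecode, it handles only the characters listed in the mapping.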



Source: https://stackoverflow.com/questions/25924860/character-encoding-in-python-to-replace-u2019-with
