Approximately converting unicode string to ascii string in python


I don't know whether this is trivial or not, but I need to convert a unicode string to an ASCII string, and I wouldn't like to have all those escape chars around. I mean, is

5 Answers

一个人的身影

2020-12-24 14:41

There is a technique to strip accents from characters, but other characters need to be directly replaced. Check this article: http://effbot.org/zone/unicode-convert.htm
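As a rough illustration of combining both ideas, here is a minimal sketch using only the standard library. The `REPLACEMENTS` table and the function name `approximate_ascii` are made up for this example, and the table is deliberately incomplete; it is an assumption about which characters matter, not a full transliteration scheme.

```python
import unicodedata

# Hypothetical replacement table for characters that NFKD cannot decompose.
# Extend it with whatever punctuation or letters appear in your data.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u00df": "ss",  # sharp s
}

def approximate_ascii(text):
    # Step 1: directly replace characters listed in the table.
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # Step 2: split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 3: drop whatever is still outside ASCII (combining marks included).
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(approximate_ascii("\u201cCaf\u00e9\u201d, said Gavin O\u2019Connor"))
# "Cafe", said Gavin O'Connor
```

Anything that is neither in the table nor decomposable is silently dropped in step 3, so it may be worth logging such characters while building the table.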

孤城傲影

2020-12-24 14:43
Try simple character replacement:
```
str1 = "“I am the greatest”, said Gavin O’Connor"
print(str1)
print(str1.replace("’", "'").replace("“","\"").replace("”","\""))
```
PS: add `# -*- coding: utf-8 -*-` to the top of your .py file if you get an encoding error. This is only needed on Python 2; Python 3 source files default to UTF-8.
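If the list of characters grows, chaining `.replace()` calls gets unwieldy. One possible alternative, sketched here for Python 3, is a single translation table built with `str.maketrans`. The names `QUOTE_TABLE` and `straighten_quotes` are invented for this example.

```python
# Hypothetical mapping from curly quotes to their straight ASCII equivalents.
# str.maketrans accepts a dict whose keys are single characters.
QUOTE_TABLE = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def straighten_quotes(text):
    # str.translate applies every mapping in one pass over the string.
    return text.translate(QUOTE_TABLE)

print(straighten_quotes("\u201cI am the greatest\u201d, said Gavin O\u2019Connor"))
# "I am the greatest", said Gavin O'Connor
```

The string literals use `\u` escapes so the file itself stays pure ASCII, which sidesteps the source-encoding issue entirely.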
攒了一身酷

2020-12-24 14:52
```
b = str(a.encode('utf-8').decode('ascii', 'ignore'))
```
This should work, but note that it drops every non-ASCII character instead of approximating it.
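To make the behaviour of the `'ignore'` handler concrete, the sketch below compares it with two other standard error handlers when encoding straight to ASCII. The sample string and the helper name `to_ascii` are made up for illustration.

```python
# Sample text with a curly apostrophe (U+2019) and an accented e (U+00E9).
text = "Gavin O\u2019Connor, caf\u00e9"

def to_ascii(text, errors):
    # Encode to ASCII with the given error handler, then back to str.
    return text.encode("ascii", errors).decode("ascii")

dropped = to_ascii(text, "ignore")            # non-ASCII characters vanish
marked = to_ascii(text, "replace")            # each becomes a question mark
escaped = to_ascii(text, "backslashreplace")  # each becomes an escape sequence

print(dropped)  # Gavin OConnor, caf
print(marked)   # Gavin O?Connor, caf?
print(escaped)  # Gavin O\u2019Connor, caf\xe9
```

None of these handlers approximates a character; they only decide how the loss is reported.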
日久生厌

2020-12-24 14:55
Use the Unidecode package to transliterate the string.
```
>>> import unidecode
>>> unidecode.unidecode(u'Gavin O’Connor')
"Gavin O'Connor"
```
鱼传尺愫

2020-12-24 14:55
```
import unicodedata

unicode_string = u"Gavin O’Connor"
# Decompose, drop the non-ASCII remainder, and decode back to str for printing
print(unicodedata.normalize('NFKD', unicode_string).encode('ascii', 'ignore').decode('ascii'))
```
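Output (the curly apostrophe has no decomposition under NFKD, so it is dropped rather than turned into `'`; accented letters such as `é` do reduce to their base letter):
```
Gavin OConnor
```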
Here's the document that describes the normalization forms: http://unicode.org/reports/tr15/
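To see which characters normalization actually changes, a small sketch like the following prints the code points produced by NFD and NFKD for a few sample characters. The helper name `codepoints` and the choice of samples are assumptions made for this example.

```python
import unicodedata

def codepoints(s):
    # Render each character as U+XXXX for easy inspection.
    return ["U+%04X" % ord(c) for c in s]

samples = {
    "e with acute": "\u00e9",
    "fi ligature": "\ufb01",
    "right single quote": "\u2019",
}

for name, ch in samples.items():
    nfd = unicodedata.normalize("NFD", ch)    # canonical decomposition
    nfkd = unicodedata.normalize("NFKD", ch)  # compatibility decomposition
    print(name, "NFD:", codepoints(nfd), "NFKD:", codepoints(nfkd))
```

The accented letter splits into a base letter plus a combining mark under both forms, the ligature only splits under NFKD, and the curly quote is left unchanged by both, which is why it needs explicit replacement.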