Converting a Latin string to Unicode in Python

清歌不尽 2021-01-03 08:17

I am working with Scrapy. I scraped some sites and stored the items from the scraped pages in JSON files, but some of them contain text in the following format:

2 Answers

既然无缘 (original poster)

2021-01-03 08:43
You have byte strings containing Unicode escapes. You can convert them to Unicode with the `unicode_escape` codec:
```
>>> print "H\u00eatres et \u00e9tang".decode("unicode_escape")
Hêtres et étang
```
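The snippet above is Python 2, where a plain `str` literal is a byte string. As a rough Python 3 counterpart, here is a minimal sketch that assumes the scraped value arrives as `bytes` holding literal backslash-u sequences; the variable names are illustrative only:

```python
# Python 3 sketch: assumes the scraped value is a bytes object holding
# literal backslash-u sequences (the Python 2 "byte string" case above).
raw = b"H\\u00eatres et \\u00e9tang"

# unicode_escape turns each \uXXXX sequence into the matching code point
text = raw.decode("unicode_escape")
# text == "Hêtres et étang"

# If the value is already a str with literal escapes (and otherwise ASCII),
# encode it to bytes first, then apply the same codec.
escaped_str = "H\\u00eatres et \\u00e9tang"
text_from_str = escaped_str.encode("ascii").decode("unicode_escape")
# text_from_str == "Hêtres et étang"
```

One caveat: `unicode_escape` reads any byte that is not part of an escape as Latin-1, so raw UTF-8 bytes mixed into the same value would come out garbled.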
You can also encode the result back to a byte string, for example as Latin-1:
```
>>> s = "H\u00eatres et \u00e9tang".decode("unicode_escape")
>>> s.encode("latin1")
'H\xeatres et \xe9tang'
```
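In Python 3, `str.encode` returns a `bytes` object. A short sketch comparing Latin-1 with UTF-8 for the same text (the variable names are illustrative, not from the answer):

```python
text = "H\u00eatres et \u00e9tang"  # "Hêtres et étang" as a Python 3 str

# Latin-1 maps each code point U+0000..U+00FF to exactly one byte
latin1_bytes = text.encode("latin1")
# latin1_bytes == b"H\xeatres et \xe9tang"

# UTF-8 needs two bytes each for ê and é
utf8_bytes = text.encode("utf-8")
# utf8_bytes == b"H\xc3\xaatres et \xc3\xa9tang"

# Characters outside U+0000..U+00FF cannot be encoded as Latin-1
try:
    "\u20ac".encode("latin1")  # euro sign, U+20AC
except UnicodeEncodeError:
    euro_fits_latin1 = False
else:
    euro_fits_latin1 = True
```

Latin-1 works for this example because every character is at or below U+00FF; UTF-8 is the safer choice when the scraped text may contain anything else.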
To decode only the items that are not already Unicode, filter on their type:
```
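# l is the list of scraped strings (Python 2)
for s in l:
    # byte strings (str) still hold literal \uXXXX escapes; unicode objects are already decoded
    if not isinstance(s, unicode):
        print s.decode('unicode_escape')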
```
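In Python 3 the `unicode` type no longer exists: text is `str` and byte strings are `bytes`. A minimal sketch of the same filter, written as a hypothetical helper `decode_escaped` that returns a new list instead of printing, and assuming the `bytes` items contain only ASCII plus escapes:

```python
def decode_escaped(items):
    # Hypothetical helper: decode bytes items with unicode_escape and
    # pass str items (already real text) through unchanged.
    result = []
    for s in items:
        if isinstance(s, bytes):  # Python 3 counterpart of "not unicode"
            s = s.decode("unicode_escape")
        result.append(s)
    return result


# Illustrative input: one escaped byte string, one already-decoded str
scraped = [b"H\\u00eatres et \\u00e9tang", "d\u00e9j\u00e0 vu"]
decoded = decode_escaped(scraped)
# decoded == ["Hêtres et étang", "déjà vu"]
```

Returning a list keeps every item in its original position, which is convenient if the decoded values are written back into the scraped items.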