I am working with Scrapy. I scraped some sites and stored the items from the scraped pages in JSON files, but some of them contain the following format.
You have byte strings containing Unicode escapes. In Python 2 you can convert them to unicode objects with the unicode_escape codec:
>>> print "H\u00eatres et \u00e9tang".decode("unicode_escape")
Hêtres et étang
And you can encode the result back to a byte string in a concrete encoding, for example Latin-1:
>>> s = "H\u00eatres et \u00e9tang".decode("unicode_escape")
>>> s.encode("latin1")
'H\xeatres et \xe9tang'
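The sessions above are Python 2. As a rough sketch of the same round trip on Python 3 (an assumption about your setup, since the question does not say which version is in use), note that only bytes objects have a decode method there, so the escaped text has to start out as bytes. The variable names are illustrative only.

```python
# Python 3 sketch: the raw bytes literal keeps the backslash-u sequences
# as literal characters instead of interpreting them.
raw = rb"H\u00eatres et \u00e9tang"

# unicode_escape turns the escape sequences into real characters.
decoded = raw.decode("unicode_escape")

# Encoding to Latin-1 gives one byte per accented character.
encoded = decoded.encode("latin1")

print(decoded)
print(encoded)
```

If the escaped text is already a str on Python 3, it would need to be encoded to bytes first (for example with .encode("ascii")) before this codec can be applied.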
You can filter and decode the non-unicode strings like this:
# l is the list of scraped strings (Python 2)
for s in l:
    if not isinstance(s, unicode):
        print s.decode('unicode_escape')
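For Python 3, the same filtering idea might look like the sketch below. It assumes the scraped values arrive as a mix of bytes (still escaped) and str (already decoded); decode_escaped and the sample list are hypothetical names made up for illustration, not part of Scrapy.

```python
def decode_escaped(items):
    """Return a new list in which every bytes item is decoded with unicode_escape.

    Items that are already str are passed through unchanged.
    """
    result = []
    for item in items:
        if isinstance(item, bytes):
            result.append(item.decode("unicode_escape"))
        else:
            result.append(item)
    return result


# Hypothetical sample data: one escaped byte string, one ordinary string.
mixed = [rb"H\u00eatres et \u00e9tang", "already decoded"]
decoded_items = decode_escaped(mixed)
print(decoded_items)
```

Returning a new list instead of printing keeps the decoded values available for writing back to a file. Note that unicode_escape treats any non-ASCII bytes in the input as Latin-1, so this is only safe when the byte strings are pure ASCII plus escapes.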