Decode string with hex characters in Python 2

Question

I have a hex string and I want to convert it to UTF-8 so I can insert it into MySQL (my database is UTF-8).

hex_string = 'kitap ara\xfet\xfdrmas\xfd'
...
result = 'kitap araştırması'

How can I do that?

Answer 1:

Assuming Python 2.6,

>>> print('kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9'))
kitap araştırması
>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9').encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'
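
A minimal sketch of the same two steps wrapped in a reusable function. It uses a b'' literal so it runs unchanged on Python 2.6+ and Python 3. The helper name to_utf8 and the default iso-8859-9 source encoding are assumptions for illustration; the source encoding must match how your bytes were actually produced.

```python
def to_utf8(raw, source_encoding='iso-8859-9'):
    """Decode bytes in a legacy single-byte encoding, then re-encode as UTF-8.

    to_utf8 is a hypothetical helper; source_encoding is assumed to be
    Turkish latin-5 here and must match the real origin of the bytes.
    """
    text = raw.decode(source_encoding)  # bytes -> unicode text
    return text.encode('utf-8')         # unicode text -> UTF-8 bytes


raw = b'kitap ara\xfet\xfdrmas\xfd'
utf8_bytes = to_utf8(raw)
print(repr(utf8_bytes))
```

Decoding first and encoding second is the key point: calling .encode('utf-8') directly on the Python 2 byte string would implicitly decode it as ASCII and fail on \xfe.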

Answer 2:

Try (Python 3.x):

import codecs
codecs.decode("707974686f6e2d666f72756d2e696f", "hex").decode('utf-8')

From here.
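
Note that this answer treats the input as a string of hex digits, which differs from the question, where \xfe and \xfd are already raw bytes. If your data really does arrive as hex digits and the underlying bytes are Turkish rather than UTF-8, a sketch using binascii.unhexlify (available in both Python 2 and 3) could look like this. The sample hex string is simply the question's bytes written out as digits, and the iso-8859-9 encoding is an assumption.

```python
import binascii

# The question's bytes (b'kitap ara\xfet\xfdrmas\xfd') spelled out as hex digits.
hex_digits = '6b6974617020617261fe74fd726d6173fd'

raw = binascii.unhexlify(hex_digits)  # hex digits -> raw bytes
text = raw.decode('iso-8859-9')       # assumed Turkish (latin-5) bytes -> text
utf8_bytes = text.encode('utf-8')     # text -> UTF-8 bytes for the database
```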

Answer 3:

Try

hex_string.decode("cp1254").encode("utf-8")

(cp1254 and iso-8859-9 are the Turkish code pages; the former is the usual name on Windows. In Python, both decode this particular string identically, although they differ in the 0x80 to 0x9F byte range.)
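
A quick sketch to check where the two code pages agree and where they do not. They map 0xFE and 0xFD to the same Turkish letters, but in the 0x80 to 0x9F range cp1254 defines printable characters (0x80 is the euro sign), while ISO-8859-9 maps those bytes to C1 control characters.

```python
raw = b'kitap ara\xfet\xfdrmas\xfd'

# Both Turkish code pages map 0xFE -> U+015F and 0xFD -> U+0131.
same = raw.decode('cp1254') == raw.decode('iso-8859-9')

# They differ in the 0x80-0x9F range: cp1254 puts the euro sign (U+20AC)
# at 0x80, while ISO-8859-9 leaves 0x80 as the C1 control character U+0080.
byte_80_cp1254 = b'\x80'.decode('cp1254')
byte_80_latin5 = b'\x80'.decode('iso-8859-9')
```

So for text that may contain characters such as the euro sign, picking the codec that actually produced the bytes matters.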

Answer 4:

First you need to decode it from the encoded bytes you have. That appears to be ISO-8859-9 (latin-5), or, if you are using Windows, probably code page 1254, which is based on latin-5.

>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('cp1254')
u'kitap ara\u015ft\u0131rmas\u0131' # u'kitap araştırması'

If you are using Windows, then depending on where you are getting those bytes, it might be more appropriate to decode them as mbcs (a Windows-only codec), which means ‘whichever code page the local system is using’. If the string is just sitting in a .py file, you would be better off writing u'kitap araştırması' directly in the source and adding a coding declaration such as # -*- coding: utf-8 -*- at the top of the file, so that Python knows how to decode it. See PEP 263.
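
For the source-file case, a minimal sketch of what PEP 263 describes: the declaration on the first or second line tells Python 2 how to decode the file, and the u'' literal then holds the Turkish text directly. Python 3 already assumes UTF-8 source, so the declaration is harmless there. This assumes the file is actually saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# PEP 263 declaration: must be on line 1 or 2 of the file and must match
# the encoding the file is actually saved in (assumed UTF-8 here).

title = u'kitap araştırması'

# Same code points as the escaped form shown in this answer.
expected = u'kitap ara\u015ft\u0131rmas\u0131'
utf8_bytes = title.encode('utf-8')  # ready for a UTF-8 database column
```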

As for encoding the unicode string to UTF-8 for the database, you can do it manually if you want to:

>>> u'kitap ara\u015ft\u0131rmas\u0131'.encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'

but a good data access layer will usually do that for you automatically, provided the character set (and collation) of the target tables and of the database connection are set correctly.
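
To illustrate the "let the data access layer do it" approach without needing a MySQL server, here is a sketch that swaps in Python's built-in sqlite3 module as a stand-in: you pass unicode text as a query parameter and the driver handles the encoding. With MySQLdb the equivalent idea is to connect with charset='utf8' and use_unicode=True, but that part is not exercised here. The table name books is hypothetical.

```python
import sqlite3

raw = b'kitap ara\xfet\xfdrmas\xfd'
text = raw.decode('cp1254')  # decode once, where the bytes enter the program

conn = sqlite3.connect(':memory:')  # in-memory stand-in for the real database
conn.execute('CREATE TABLE books (title TEXT)')  # hypothetical table
# Pass unicode text as a parameter; the driver encodes it, no manual .encode().
conn.execute('INSERT INTO books (title) VALUES (?)', (text,))
stored = conn.execute('SELECT title FROM books').fetchone()[0]
conn.close()
```

Using query parameters instead of string formatting also avoids SQL injection, which matters once the text comes from users.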

Answer 5:

The "String literals" section of the Python documentation explains how to use UTF-8 strings in Python source.

Source: https://stackoverflow.com/questions/3045876/decode-string-with-hex-characters-in-python-2

Tags: python, utf-8, hex, python-2.x