How to store Arabic text in a MySQL database using Python?

家住魔仙堡 posted on 2019-12-06 09:58:29

Let me clarify a few things first, because they will help you in the future as well.

txt = u'Arabic (\u0627\u0644\u0637\u064a\u0631\u0627\u0646)'

This is not an Arabic string. It is a unicode object made up of Unicode code points. If you simply print it, and your terminal supports Arabic, you get output like this:

>>> txt = u'Arabic (\u0627\u0644\u0637\u064a\u0631\u0627\u0646)'
>>> print(txt)
Arabic (الطيران)

Now, to get the same text, Arabic (الطيران), into your database, you need to encode the string.

Encoding means taking these code points and converting them to bytes, so that computers know what to do with them.
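As a small illustration of that conversion, here is a sketch that encodes the Arabic word from the string above and decodes it again. The variable names `encoded` and `decoded` are only for illustration; the point is that the code point count and the byte count differ.

```python
# The Arabic word from the example string, written as code points.
txt = u'\u0627\u0644\u0637\u064a\u0631\u0627\u0646'

encoded = txt.encode('utf-8')      # code points -> bytes
decoded = encoded.decode('utf-8')  # bytes -> code points

print(len(txt))        # number of code points
print(len(encoded))    # number of bytes (each of these letters takes 2 bytes in utf-8)
print(decoded == txt)  # the round trip loses nothing
```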

The most common encoding is utf-8, because it can represent every Unicode code point: English, Arabic, and everything else. There are others too; for example, windows-1256 also supports Arabic. Some encodings have no entry for certain code points, and when you try to encode with one of them, you get an error like this:

>>> print(txt.encode('latin-1'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 8-14: ordinal not in range(256)

The error tells you that some code point in the unicode object has no entry in the latin-1 table, so Python doesn't know how to convert it to bytes.
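To compare the encodings mentioned above side by side, something like the following sketch may help. `try_encode` is a hypothetical helper written for this example, and `cp1256` is Python's codec name for windows-1256.

```python
txt = u'Arabic (\u0627\u0644\u0637\u064a\u0631\u0627\u0646)'

def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec lacks a character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

# utf-8 and cp1256 both cover these Arabic letters; latin-1 does not.
results = {codec: try_encode(txt, codec)
           for codec in ('utf-8', 'cp1256', 'latin-1')}

for codec, data in sorted(results.items()):
    if data is None:
        print(codec + ': cannot encode this text')
    else:
        print(codec + ': ' + str(len(data)) + ' bytes')
```

The two working encodings produce different bytes for the same text, which is why the reader of the data has to know which one the writer used.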

Computers store bytes, so whenever you store or transmit information you need to encode and decode it correctly.

This encode/decode step is sometimes called the unicode sandwich: everything outside your program is bytes, everything inside is unicode.
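A minimal sketch of the sandwich, assuming the bytes coming in and going out are utf-8. The function `normalize` and the whitespace cleanup it performs are made up for this example; only the decode-at-the-edge, encode-at-the-edge pattern matters.

```python
def normalize(raw_bytes, encoding='utf-8'):
    text = raw_bytes.decode(encoding)  # input boundary: bytes -> unicode
    text = u' '.join(text.split())     # inside: work on unicode only
    return text.encode(encoding)       # output boundary: unicode -> bytes

# Bytes as they might arrive from a file, a socket, or a form.
incoming = u'  Arabic   (\u0627\u0644\u0637\u064a\u0631\u0627\u0646) \n'.encode('utf-8')
outgoing = normalize(incoming)

print(outgoing.decode('utf-8'))
```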


With that out of the way, encode the data before you send it to your database, and open the connection with a matching charset:

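import MySQLdb

q = u"""
    INSERT INTO
       tab1(id, username, text, created_at)
    VALUES (%s, %s, %s, %s)"""

# charset and SET NAMES make the connection itself use utf-8
conn = MySQLdb.connect(host="localhost",
                       user='root',
                       password='',
                       db='',
                       charset='utf8',
                       init_command='SET NAMES UTF8')
cur = conn.cursor()
# id, user_name, text and date come from your own code
cur.execute(q, (id.encode('utf-8'),
                user_name.encode('utf-8'),
                text.encode('utf-8'), date))
conn.commit()  # persist the row; MySQLdb connections start with autocommit off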

To confirm that it is being inserted correctly, make sure you are using mysql from a terminal or application that supports Arabic. Otherwise, even if it is inserted correctly, you will see garbage characters when your program displays it.
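The garbage-characters effect can be reproduced without a database. The sketch below assumes a viewer that wrongly treats utf-8 bytes as latin-1; the variable names are illustrative only.

```python
txt = u'\u0627\u0644\u0637\u064a\u0631\u0627\u0646'

stored = txt.encode('utf-8')           # the bytes a correct insert would send

wrong_view = stored.decode('latin-1')  # a viewer assuming the wrong encoding
right_view = stored.decode('utf-8')    # a viewer assuming the right one

print(wrong_view == txt)  # False: the display is garbage
print(right_view == txt)  # True

# The stored bytes were never damaged: undoing the wrong decode recovers the text.
recovered = wrong_view.encode('latin-1').decode('utf-8')
print(recovered == txt)   # True
```

So garbage on screen does not by itself prove the insert went wrong; check with a client that is known to use utf-8 before changing the insert code.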

Tim Biegeleisen

Just execute SET NAMES utf8 before executing your INSERT:

cur.execute("set names utf8;")

cur.execute(
    "INSERT INTO tab1(id, username, text, created_at) VALUES (%s, %s, %s, %s)",
    (smart_str(id), smart_str(user_name), smart_str(text), date))

Your question is very similar to this SO post, which you should read.
