SQLite, python, unicode, and non-utf data

死守一世寂寞 2020-12-02 05:37

I started by trying to store strings in SQLite using Python, and got this message:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unles

5 answers
  •  栀梦 (original poster)
    2020-12-02 06:16

    UTF-8 is the default encoding of SQLite databases. You can see this in a query such as "SELECT CAST(x'52C3B373' AS TEXT);", which interprets the four bytes as UTF-8 and returns "Rós". However, the SQLite C library does not check whether a string inserted into a database is valid UTF-8.

    If you insert a Python unicode object (or a str object in 3.x), the Python sqlite3 library automatically converts it to UTF-8. But if you insert a Python 2.x str object, the library simply assumes the bytes are UTF-8, because a 2.x str doesn't know its encoding. This is one reason to prefer unicode strings.

    However, it doesn't help you if your data is broken to begin with.
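
    The points above can be checked with a short sketch. It is written for Python 3 (an assumption; the original question is about Python 2.x), and the table name demo and column name txt are made up for illustration. It shows a str being stored as UTF-8 bytes, the hex literal from above decoding to the same text, and SQLite accepting bytes that are not valid UTF-8 as a TEXT value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (txt TEXT)")  # hypothetical table for illustration

# A Python 3 str is encoded to UTF-8 by the sqlite3 module on insert.
conn.execute("INSERT INTO demo VALUES (?)", ("Rós",))
stored = conn.execute("SELECT CAST(txt AS BLOB) FROM demo").fetchone()[0]

# The hex literal from the answer holds the same four UTF-8 bytes.
literal = conn.execute("SELECT CAST(x'52C3B373' AS TEXT)").fetchone()[0]

# SQLite does not validate the bytes: a Latin-1 encoded "Rós" (0xF3 is not
# valid UTF-8 here) is still stored as a TEXT value.
conn.execute("INSERT INTO demo VALUES (CAST(? AS TEXT))", (b"R\xf3s",))
kinds = [row[0] for row in conn.execute("SELECT typeof(txt) FROM demo ORDER BY rowid")]
```

    Reading the raw bytes back with CAST(... AS BLOB), as done for stored, is a convenient way to inspect what is actually in a column without triggering any decoding on the Python side.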

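    To fix your data (this assumes the stored bytes are actually Latin-1), run

    # Python 2: the BLOB argument arrives as a buffer; str() yields its raw bytes.
    db.create_function('FIXENCODING', 1, lambda s: str(s).decode('latin-1'))
    db.execute("UPDATE TheTable SET TextColumn=FIXENCODING(CAST(TextColumn AS BLOB))")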
    

    for every text column in your database.
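
    For readers on Python 3, the same repair might look roughly like the sketch below. It is not part of the original answer: the helper name fix_encoding and the in-memory test row are assumptions for illustration, and it still assumes the broken bytes are Latin-1. In Python 3 a BLOB argument reaches the user function as bytes, so it can be decoded directly.

```python
import sqlite3

def fix_encoding(raw):
    """Reinterpret raw column bytes as Latin-1 text; pass NULL through."""
    if raw is None:
        return None
    return bytes(raw).decode("latin-1")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TheTable (TextColumn TEXT)")
# Simulate broken data: Latin-1 bytes stored unvalidated in a TEXT column.
conn.execute("INSERT INTO TheTable VALUES (CAST(? AS TEXT))", (b"R\xf3s",))

# Register the helper under the SQL name used in the answer.
conn.create_function("FIXENCODING", 1, fix_encoding)
conn.execute(
    "UPDATE TheTable SET TextColumn = FIXENCODING(CAST(TextColumn AS BLOB))"
)
conn.commit()

fixed = conn.execute("SELECT TextColumn FROM TheTable").fetchone()[0]
fixed_bytes = conn.execute(
    "SELECT CAST(TextColumn AS BLOB) FROM TheTable"
).fetchone()[0]
```

    Note that this rewrites every row. A row that already holds valid UTF-8 would be decoded as Latin-1 too and come out garbled (for example "Rós" would become "RÃ³s"), so it is probably worth running this on a copy of the database first or restricting the UPDATE to the rows known to be broken.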
