sqlite3.OperationalError: Could not decode to UTF-8 column

Submitted by 天涯浪子 on 2020-05-11 06:22:41

Question


I have a SQLite database containing the row below. The ù should really be a '-'.

sqlite> select * from t_question where rowid=193;
193|SAT1000|having a pointed, sharp qualityùoften used to describe smells|pungent|lethargic|enigmatic|resolute|grievous

When I read that row from Python, I get the following error. What am I doing wrong?

Traceback (most recent call last):
  File "foo_error.py", line 8, in <module>
    cur.execute(sql_string)
sqlite3.OperationalError: Could not decode to UTF-8 column 'posit' with text 'having a pointed, sharp qualityùoften used to describe smells'

Python File:

import sqlite3
conn = sqlite3.connect('sat1000.db')
cur = conn.cursor()
sql_string = 'SELECT * FROM t_question WHERE rowid=193'
cur.execute(sql_string)
conn.close()

Answer 1:


Set text_factory to str:

conn = sqlite3.connect('sat1000.db')
conn.text_factory = str

In Python 2, this causes cur to return raw byte strings (str) instead of automatically trying to decode them with the UTF-8 codec.

I wasn't able to find a chain of decodings and encodings that would transform 'ù' into a hyphen. There are many possible Unicode hyphens, such as u'-', u'\xad', u'\u2010', u'\u2011', u'\u2043', u'\ufe63' and u'\uff0d', and I haven't ruled out the possibility that such a chain exists. Unless you can find the right transformation, though, it might be easiest to simply use str.replace to fix the string.

Correction:

In [43]: print('ù'.decode('utf-8').encode('cp437').decode('cp1252'))
—    # EM DASH u'\u2014'

So there are chains of decoding/encodings which can transform 'ù' into some form of hyphen.
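
The correction above is written for Python 2, where 'ù' is a byte string. A rough Python 3 equivalent is sketched below. It assumes, as the correction suggests but the question does not confirm, that the sqlite shell displayed the stored byte using code page 437 and that the data was originally written as Windows-1252.

```python
# Assumption: the shell rendered the stored byte in code page 437, where 'ù' is 0x97.
shown = "ù"
raw = shown.encode("cp437")    # recover the byte that is probably stored in the column
# Assumption: the text was originally written as Windows-1252, where 0x97 is EM DASH.
fixed = raw.decode("cp1252")

print(repr(raw), repr(fixed))
```

If that assumption holds, the column is not UTF-8 at all, and decoding it as cp1252 recovers the intended dash without any str.replace step.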




Answer 2:


conn.text_factory = str doesn't work for me.

I use conn.text_factory = bytes instead. Reference: https://stackoverflow.com/a/23509002/6452438
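
A minimal sketch of this approach in Python 3 follows. The in-memory database and the shortened sample text are stand-ins for sat1000.db, and the cp1252 decoding step is an assumption about how the data was originally written, not something the question confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # hypothetical stand-in for sat1000.db
conn.execute("CREATE TABLE t_question (posit TEXT)")
# CAST(... AS TEXT) stores the bytes as TEXT without checking that they are valid UTF-8.
conn.execute(
    "INSERT INTO t_question VALUES (CAST(? AS TEXT))",
    (b"sharp quality\x97often used",),
)

# Return TEXT columns as raw bytes so sqlite3 never attempts a UTF-8 decode.
conn.text_factory = bytes
(raw,) = conn.execute("SELECT posit FROM t_question").fetchone()

# Assumption: the data was written as Windows-1252; decode it explicitly.
text = raw.decode("cp1252")
conn.close()

print(repr(raw), repr(text))
```

With bytes as the text_factory, every TEXT column comes back as bytes, so the caller has to decode each value with whatever codec matches the data.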




Answer 3:


The answer by unutbu won't work in current versions of Python 3. Setting conn.text_factory = str won't do anything, since the default value of text_factory is already str.

The problem is probably that a database column contains text that is not valid UTF-8. By default, Python's bytes.decode() method raises an exception when it encounters such text. You can instead set a text_factory that tells decode() to ignore those errors, like this:

conn = sqlite3.connect('my-database.db')
conn.text_factory = lambda b: b.decode(errors='ignore')

Then the query should run without an error, although the invalid bytes are silently dropped from the returned text.
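
To see what the error handler does to the data, the sketch below compares errors='ignore' with errors='replace' on a row containing one invalid byte. The in-memory table, the sample text and the read_posit helper are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # hypothetical stand-in for the real database
conn.execute("CREATE TABLE t_question (posit TEXT)")
# CAST(... AS TEXT) stores the bytes as TEXT without checking that they are valid UTF-8.
conn.execute(
    "INSERT INTO t_question VALUES (CAST(? AS TEXT))",
    (b"sharp quality\x97often used",),
)

def read_posit(errors):
    """Read the column with a text_factory that uses the given decode error handler."""
    conn.text_factory = lambda b: b.decode("utf-8", errors=errors)
    return conn.execute("SELECT posit FROM t_question").fetchone()[0]

ignored = read_posit("ignore")    # the invalid byte is dropped
replaced = read_posit("replace")  # the invalid byte becomes U+FFFD
conn.close()

print(repr(ignored))
print(repr(replaced))
```

Using 'replace' keeps a visible marker where data was lost, which may make damaged rows easier to find and repair later.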



Source: https://stackoverflow.com/questions/22751363/sqlite3-operationalerror-could-not-decode-to-utf-8-column
