sqlite remove non utf-8 characters

Submitted by 对着背影说爱祢 on 2019-12-31 02:13:06

Question


I have an SQLite db that has some crazy ASCII characters in it, and I would like to remove them, but I have no idea how to go about it. I googled around and found some people suggesting REGEXP with MySQL, but that threw an error saying REGEXP wasn't recognized.

Here is the error I get:

sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'table_name' with text ...

Thanks for the help


Answer 1:


Well, if you really want to shoehorn a rich Unicode string into a plain ASCII string (and don't mind some goofs), you could use this:

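import unicodedata as ud

def shoehorn_unicode_into_ascii(s):
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding with 'ignore' then drops every code point outside ASCII.
    # This removes accents, but also other things, like ß‘’“”
    return ud.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')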
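
To make the "goofs" concrete, here is a small sketch of what this kind of normalization keeps and what it silently drops. The name `to_ascii` and the sample strings are only illustrative, and the trailing `.decode('ascii')` assumes Python 3, where `encode` returns bytes:

```python
import unicodedata as ud

def to_ascii(s):
    # NFKD decomposes characters where a decomposition exists
    # (e.g. an accented letter becomes base letter + combining accent).
    # Encoding with errors="ignore" then discards whatever is not ASCII.
    return ud.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")

samples = ["café", "Straße", "“quoted”", "ﬁle"]
results = {s: to_ascii(s) for s in samples}

for original, cleaned in results.items():
    print(repr(original), "->", repr(cleaned))
```

Accents and ligatures survive in a reduced form because they have a decomposition, while characters such as ß and curly quotes have none and simply vanish.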

For a more complete solution (with somewhat fewer goofs, but requiring a third-party module unidecode), see this answer.

Really, though, the best solution is to work with Unicode data throughout your code as much as possible, and drop to an encoding only when necessary.
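
For the decode error in the question, one possible way to apply that advice is to read the column as raw bytes, decode leniently at the boundary, and write the cleaned text back. This is only a sketch using Python's standard `sqlite3` module rather than SQLAlchemy; `scrub_column` and its `table` and `column` arguments are hypothetical names, and it assumes an ordinary rowid table whose column is meant to hold text (BLOB values would be rewritten as text too):

```python
import sqlite3

def scrub_column(conn, table, column):
    """Rewrite values in table.column that are not valid UTF-8.

    Undecodable bytes are dropped. Returns the number of rows changed.
    Identifiers cannot be bound as parameters, so table and column
    must come from trusted input.
    """
    # Hand TEXT values back as raw bytes so sqlite3 does not try to decode them.
    conn.text_factory = bytes
    try:
        rows = conn.execute(
            f'SELECT rowid, "{column}" FROM "{table}"'
        ).fetchall()
        changed = 0
        for rowid, raw in rows:
            if not isinstance(raw, bytes):
                continue  # NULLs and numbers need no cleaning
            clean = raw.decode("utf-8", errors="ignore")
            if clean.encode("utf-8") != raw:
                conn.execute(
                    f'UPDATE "{table}" SET "{column}" = ? WHERE rowid = ?',
                    (clean, rowid),
                )
                changed += 1
        conn.commit()
        return changed
    finally:
        # Restore the default so later reads return str again.
        conn.text_factory = str
```

Using `errors="replace"` instead of `errors="ignore"` would keep a visible U+FFFD marker where bytes were lost, which can make the damage easier to audit. Back up the database file before running anything like this.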




Answer 2:


django.utils.encoding has a great set of robust Unicode encoding and decoding functions.



Source: https://stackoverflow.com/questions/3586903/sqlite-remove-non-utf-8-characters
