How to check if a string contains only UTF-8 characters

Submitted by 喜夏-厌秋 on 2020-08-07 08:15:08

Question


So far I am doing something like this:

def is_utf8(s):
    try:
        x=bytes(s,'utf-8').decode('utf-8', 'strict')
        print(x)
        return 1
    except:
        return 0

The only problem is that I don't want it to print anything. I want to delete the print(x), but when I do that, the function stops working correctly. For example, if I run print(is_utf8("H�tst")) while the print is in the function, the result is 0; without the print, the result is 1. Am I approaching the problem the wrong way?


Answer 1:


You could use the chardet module to guess an unknown encoding. For example, if a is a byte array, you could determine the most likely encoding like this:

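import chardet

# a is assumed to hold raw bytes (bytes or bytearray) of unknown encoding
b = chardet.detect(a)  # dict with keys such as "encoding" and "confidence"
print(b["encoding"])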
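If the goal is only to validate UTF-8 rather than to guess the encoding, a strict decode may be enough. The following is a minimal sketch, assuming the input is raw bytes rather than a str: a Python 3 str is already decoded text, so encoding it to UTF-8 and decoding it back succeeds for everything except lone surrogates. The name is_utf8 is reused from the question; the bytes parameter is an assumption of this sketch.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, otherwise False."""
    try:
        data.decode("utf-8", "strict")
    except UnicodeDecodeError:
        # Catch only decoding failures so unrelated errors are not hidden.
        return False
    return True


print(is_utf8("Héllo".encode("utf-8")))    # bytes produced by a UTF-8 encoder
print(is_utf8("Héllo".encode("latin-1")))  # 0xE9 followed by "l" is not valid UTF-8
```

Catching UnicodeDecodeError instead of using a bare except matters here: a bare except also swallows unrelated errors, such as a UnicodeEncodeError raised by print when the console encoding cannot represent a character, which could explain why removing the print changes the result in the question.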


Source: https://stackoverflow.com/questions/49479913/how-to-check-if-a-string-contain-only-utf-8-characters
