How to check if a string is 100% ASCII in Python 3

牧云@^-^@ submitted on 2021-02-10 04:31:51

Question


I have two strings:

eng = "Clash of Clans – Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"

I want to check whether a string is in English or not using Python 3.

I have read this Stack Overflow answer, but it does not help me because it is a Python 2.x solution. In the comments, someone mentioned using

string.encode('ascii')

to make it work in Python 3.x. My problem is that in both cases it raises the same UnicodeEncodeError exception!

So now I am stuck and can't figure out how to make it work. Is there a fix, or do I have to use another method to determine whether a string is in English? Thanks.


Answer 1:


As with Salvador Dali's answer that you linked to, you must use a try/except block to check for an encoding error.

# -*- coding: utf-8 -*-
def isEnglish(s):
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    else:
        return True

One note, though: when I copied and pasted your eng and rus strings to try them, both came up as False. Retyping the English one returned True, so I'm not sure what's up with that.




Answer 2:


Your English string isn't actually pure ASCII: it contains the character U+2013 (EN DASH). This looks very similar to the ASCII hyphen-minus, U+002D, but it is a different character.
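To see which characters are the culprits, something like the following sketch may help. It lists every character outside the ASCII range together with its position, code point, and Unicode name. The helper name non_ascii_chars is made up for illustration; unicodedata is part of the standard library.

```python
import unicodedata

def non_ascii_chars(s):
    """Return (index, char, code point, Unicode name) for each non-ASCII character.

    Hypothetical helper, not from the original answer.
    """
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(s)
        if ord(ch) > 127  # ASCII covers code points 0 through 127
    ]

# The en dash is written as an escape so it cannot be silently retyped as "-".
eng = "Clash of Clans \u2013 Android Apps on Google Play"
for item in non_ascii_chars(eng):
    print(item)
```

For the eng string this should report a single entry, the EN DASH at index 15.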

If this is the only character you need to worry about, you can do a simple replacement to make it work:

>>> eng.replace('\u2013', '-').encode('ascii')
b'Clash of Clans - Android Apps on Google Play'
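If more than one typographic character shows up in your data, a translation table may be tidier than chained replace() calls. This is only a sketch: the table name ASCII_LOOKALIKES, the helper to_ascii_lookalikes, and the choice of characters are assumptions, and you would extend the mapping with whatever your input actually contains.

```python
# Hypothetical mapping of common typographic characters to ASCII stand-ins.
ASCII_LOOKALIKES = str.maketrans({
    "\u2013": "-",   # EN DASH
    "\u2014": "-",   # EM DASH
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})

def to_ascii_lookalikes(s):
    """Replace the mapped characters; everything else is left untouched."""
    return s.translate(ASCII_LOOKALIKES)

eng = "Clash of Clans \u2013 Android Apps on Google Play"
print(to_ascii_lookalikes(eng).encode("ascii"))
```

Characters that are not in the table, such as the Cyrillic letters in rus, pass through unchanged, so encode('ascii') would still raise UnicodeEncodeError for that string.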



Answer 3:


On Python 3.7 and later, you can use the str.isascii() method:

>>> rus.isascii()
False
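str.isascii() was added in Python 3.7. If the code may also run on an older Python 3, a small wrapper along these lines could serve as a fallback. The name is_ascii is an assumption for illustration, not part of the standard library.

```python
def is_ascii(s):
    """Return True if every character of s is in the ASCII range (U+0000 to U+007F)."""
    if hasattr(s, "isascii"):
        # Built-in method, available on Python 3.7+
        return s.isascii()
    # Fallback for older interpreters: check each code point by hand
    return all(ord(ch) < 128 for ch in s)

eng = "Clash of Clans \u2013 Android Apps on Google Play"
print(is_ascii(eng))                        # the en dash is not ASCII
print(is_ascii("Clash of Clans - Android"))
```

Like the built-in method, this returns True for an empty string. Note that it only tests the character range; an ASCII-only string is not necessarily English.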


Source: https://stackoverflow.com/questions/33004065/how-to-check-if-string-is-100-ascii-in-python-3