Question
I know that to check whether a string is printable, we can do something like:
def isprintable(s, codec='utf8'):
    # s is a byte string; True if it decodes cleanly in the given codec
    try:
        s.decode(codec)
    except UnicodeDecodeError:
        return False
    else:
        return True
But is there a way to do this for a unicode object rather than a byte string? By the way, I'm working with tweets, and I convert them to Unicode as follows:
text=unicode(status.text)
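The decode-based check in the question tests whether a byte string is valid in the given codec, not whether its characters are printable. A minimal sketch, assuming Python 3 (where byte strings are `bytes`) and using the hypothetical name `decodes_cleanly`, makes the distinction visible: a control character such as BEL (`\x07`) decodes without error.

```python
def decodes_cleanly(data, codec='utf8'):
    """Return True if the byte string `data` decodes without error in `codec`."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True

print(decodes_cleanly(b'Hello'))      # True
print(decodes_cleanly(b'\xff\xfe'))   # False: 0xFF is never valid in UTF-8
print(decodes_cleanly(b'bell\x07'))   # True, although BEL is a control character
```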
Answer 1:
You are looking for a test for a range of codepoints, so you need a regular expression:
import re

# match characters from ¿ (U+00BF) to the end of the Basic Multilingual Plane (U+FFFF);
# the u'' prefix (rather than ur'') also compiles on Python 3.3+
exclude = re.compile(u'[\u00bf-\uffff]')

def isprintable(s):
    return not exclude.search(s)
This will return False for any unicode text that contains a code point past \u00BE ("¾"), up to and including \uFFFF.
>>> isprintable(u'Hello World!')
True
>>> isprintable(u'Jeg \u00f8ve mit Norsk.')
False
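The same range test can be written without a regular expression by comparing each character's code point directly. This is a sketch assuming Python 3 and the answer's cut-off (U+00BF to U+FFFF); `isprintable_range` is a hypothetical name. In Python 3, characters above U+FFFF (such as the emoji common in tweets) are single code points outside that range, so this check, like the regex, lets them through.

```python
def isprintable_range(s):
    """Return False if any character falls in U+00BF..U+FFFF, mirroring the regex."""
    return not any(0x00BF <= ord(ch) <= 0xFFFF for ch in s)

print(isprintable_range('Hello World!'))             # True
print(isprintable_range('Jeg \u00f8ve mit Norsk.'))  # False: \u00f8 is ø
print(isprintable_range('\u00be'))                   # True: ¾ is the last allowed code point
print(isprintable_range('\U0001F600'))               # True: above U+FFFF, not excluded
```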
Answer 2:
I'm not sure a solution using codepoints is robust in the face of Unicode standard changes or different encodings. A more abstract solution:
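import unicodedata

# char is a single unicode character; UnhandledKeypressError is the
# answerer's own exception class, not part of the standard library
if unicodedata.category(char) == 'Cc':
    raise UnhandledKeypressError('unprintable char')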
In other words, a string is printable if none of its characters has the Unicode general category 'Cc' (control).
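Applied to a whole string, that rule can be sketched as below. This assumes Python 3, and the function name `isprintable` is reused here for illustration only; it checks every character against the 'Cc' category and nothing else.

```python
import unicodedata

def isprintable(s):
    """Return True if no character in `s` has the general category 'Cc' (control)."""
    return all(unicodedata.category(ch) != 'Cc' for ch in s)

print(isprintable('Jeg \u00f8ve mit Norsk.'))  # True: ø is category Ll
print(isprintable('tab\there'))                # False: TAB is category Cc
print(isprintable('\U0001F600'))               # True: this emoji is category So
```

Under this rule, tabs and newlines count as unprintable, so tweets containing line breaks would be rejected; whether that is desirable depends on the application.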
For comparison, the documentation of Qt's QChar.isPrint() says:
Returns true if the character is a printable character; otherwise returns false. This is any character not of category Cc or Cn. Note that this gives no indication of whether the character is available in a particular font.
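The quoted Qt rule (not Cc and not Cn) can be approximated with `unicodedata` as follows. This is a sketch assuming Python 3; `qt_style_isprint` and `all_qt_printable` are hypothetical names, not Qt or standard-library APIs. For contrast, Python 3's built-in `str.isprintable()` is stricter: it also rejects the other "Other" categories and all separators except the ASCII space.

```python
import unicodedata

def qt_style_isprint(ch):
    """Mimic the quoted QChar.isPrint() rule: the character is not in category Cc or Cn."""
    return unicodedata.category(ch) not in ('Cc', 'Cn')

def all_qt_printable(s):
    """Apply the Qt-style rule to every character of `s`."""
    return all(qt_style_isprint(ch) for ch in s)

print(qt_style_isprint('A'))       # True
print(qt_style_isprint('\x07'))    # False: BEL is Cc
print(qt_style_isprint('\u0378'))  # False: U+0378 is unassigned (Cn)
print(qt_style_isprint('\u00a0'))  # True: NO-BREAK SPACE is Zs
print('\u00a0'.isprintable())      # False: str.isprintable rejects separators
```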
Source: https://stackoverflow.com/questions/14383937/check-printable-for-unicode