Check printable for Unicode

Posted by 那年仲夏 on 2020-06-17 03:09:59

Question


I know that to check whether a string is printable, we can do something like:

def isprintable(s, codec='utf8'):
    # s is a byte string; treat it as printable if it decodes cleanly.
    try:
        s.decode(codec)
    except UnicodeDecodeError:
        return False
    else:
        return True

But is there a way to do this for a Unicode object rather than a byte string? By the way, I'm working with tweets, and I convert each tweet to Unicode as follows:

text=unicode(status.text)

Answer 1:


You are looking for a test for a range of codepoints, so you need a regular expression:

import re
# match characters from ¿ to the end of the JSON-encodable range
exclude = re.compile(ur'[\u00bf-\uffff]')

def isprintable(s):
    return not bool(exclude.search(s))

This will return False for any unicode text that has codepoints past \u00BE ("¾").

>>> isprintable(u'Hello World!')
True
>>> isprintable(u'Jeg \u00f8ve mit Norsk.')
False



Answer 2:


I'm not sure a solution using codepoints is robust in the face of Unicode standard changes or different encodings. A more abstract solution:

import unicodedata

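# char is a single-character unicode string; UnhandledKeypressError is
# assumed to be an exception defined by the application itself.
if unicodedata.category(char) == 'Cc':
    raise UnhandledKeypressError('unprintable char')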

In other words, a string is printable if none of its characters (each a unicode object) has the general category 'Cc' (control).
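
The per-character test can be extended to a whole string. This is a minimal sketch, not part of the original answer: the helper name is_printable is hypothetical, and text is assumed to be a unicode string (unicode in Python 2, str in Python 3).

```python
import unicodedata

def is_printable(text):
    """Return True if no character in text has category 'Cc' (control).

    is_printable is a hypothetical helper name; text is assumed to be
    a unicode string, not a byte string.
    """
    return all(unicodedata.category(ch) != 'Cc' for ch in text)

print(is_printable(u'Jeg \u00f8ve mit Norsk.'))  # non-ASCII letters are not controls
print(is_printable(u'bell\x07'))                 # U+0007 is a control character
```

Tab and newline are also in category 'Cc', so a tweet that contains a line break fails this check unless those characters are stripped or allowed explicitly.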

For comparison, the documentation for Qt's QChar.isPrint() says:

Returns true if the character is a printable character; otherwise returns false. This is any character not of category Cc or Cn. Note that this gives no indication of whether the character is available in a particular font.
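
The rule in that quote (reject categories Cc and Cn) can be approximated in Python with unicodedata. This is a sketch only: is_print_qt_style is a hypothetical name, and the code mirrors the documented rule rather than Qt's actual implementation. Its results also depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Categories treated as non-printable under the quoted rule:
# 'Cc' (control) and 'Cn' (unassigned, including noncharacters).
NON_PRINTABLE_CATEGORIES = ('Cc', 'Cn')

def is_print_qt_style(ch):
    """Return True if the single character ch is neither Cc nor Cn.

    Hypothetical helper that follows the documented rule, not Qt's code.
    """
    return unicodedata.category(ch) not in NON_PRINTABLE_CATEGORIES

print(is_print_qt_style(u'\u00f8'))  # a letter
print(is_print_qt_style(u'\uffff'))  # a noncharacter, category Cn
```

As the quote notes, passing this test says nothing about whether a given font can render the character.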



Source: https://stackoverflow.com/questions/14383937/check-printable-for-unicode
