How do I get the “visible” length of a combining Unicode string in Python?

被撕碎了的回忆 2021-01-04 10:47

If I have a Python Unicode string that contains combining characters, len reports a value that does not match the number of characters actually "seen" on screen.

3 Answers
  •  天涯浪人
    2021-01-04 11:08

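    If you have a regex flavor that supports matching grapheme clusters, you can use \X, which matches one extended grapheme cluster (a base character together with any combining marks attached to it).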


    While the default Python re module does not support \X, Matthew Barnett's regex module does:

    >>> import regex  # third-party module: pip install regex
    >>> len(regex.findall(r'\X', u'A\u0332\u0305BC'))
    3
    

    On Python 2, the pattern itself must also be a Unicode string (note the u prefix):

    >>> regex.findall(u'\\X', u'A\u0332\u0305BC')
    [u'A\u0332\u0305', u'B', u'C']
    >>> len(regex.findall(u'\\X', u'A\u0332\u0305BC'))
    3
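
If installing the regex module is not an option, a rough approximation is possible with only the standard library. The sketch below (Python 3) counts code points whose canonical combining class is 0, skipping combining marks. The helper name visible_length is hypothetical, and this is not full grapheme cluster segmentation: emoji ZWJ sequences, Hangul jamo, and marks with combining class 0 will be miscounted, so \X remains the more reliable choice.

```python
import unicodedata


def visible_length(text):
    """Approximate the displayed length of text.

    Counts code points whose canonical combining class is 0, i.e. it
    skips combining marks such as U+0332 and U+0305. This is only an
    approximation of grapheme cluster counting.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


s = "A\u0332\u0305BC"
print(len(s))             # 5 code points
print(visible_length(s))  # 3 visible characters
```

For the example string, both approaches give 3, while len gives 5 because it counts the two combining marks as separate code points.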
    
