How to find Chinese or Japanese characters in a string in Python?

北海茫月 2021-01-31 05:16

For example, given:

str = 'sdf344asfasf天地方益3権sdfsdf'

I want to wrap the Chinese and Japanese characters in parentheses:

strAfterConvert = 'sdfasf


        
4 Answers
  •  甜味超标
    2021-01-31 05:47

    You can do this with the third-party regex package, which can match on the Unicode "Script" property of each character and is a drop-in replacement for the standard re module:

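    import regex as re  # third-party package, used here in place of re
    
    # Match runs of Han, Bopomofo, Hiragana or Katakana characters.
    pattern = re.compile(r'([\p{IsHan}\p{IsBopo}\p{IsHira}\p{IsKatakana}]+)')
    
    text = 'sdf344asfasf天地方益3権sdfsdf'
    output = pattern.sub(r'(\1)', text)  # wrap each matched run in parentheses
    print(output)  # Prints: sdf344asfasf(天地方益)3(権)sdfsdf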
    

    You should adjust the \p{Is...} sequences to cover the character scripts/blocks that you consider to be "Chinese or Japanese".
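
    If installing regex is not an option, a rough equivalent can be sketched with the standard re module by listing Unicode block ranges explicitly. This is a different technique from the Script property used above: the ranges below are only one assumed selection of blocks (CJK Unified Ideographs, Extension A, Hiragana, Katakana, Bopomofo), and the names CJK_RANGES, CJK_RUN and wrap_cjk are made up for illustration.

```python
import re

# Assumed selection of Unicode blocks; extend or trim it to match what
# you consider "Chinese or Japanese" (for example, the supplementary-plane
# ideograph extensions are not covered here).
CJK_RANGES = (
    "\u4e00-\u9fff"  # CJK Unified Ideographs
    "\u3400-\u4dbf"  # CJK Unified Ideographs Extension A
    "\u3040-\u309f"  # Hiragana
    "\u30a0-\u30ff"  # Katakana
    "\u3100-\u312f"  # Bopomofo
)

# One or more consecutive characters from any of the ranges above.
CJK_RUN = re.compile("([" + CJK_RANGES + "]+)")


def wrap_cjk(text):
    """Wrap each run of characters from CJK_RANGES in parentheses."""
    return CJK_RUN.sub(r"(\1)", text)


print(wrap_cjk("sdf344asfasf天地方益3権sdfsdf"))
```

    Block ranges are coarser than script matching (a block can contain punctuation or unassigned code points), so the regex version is likely the better choice when the package is available.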
