Find all Chinese text in a string using Python and Regex

醉酒成梦 2020-11-27 13:31

I needed to strip the Chinese out of a bunch of strings today and was looking for a simple Python regex. Any suggestions?

2 answers
  •  悲&欢浪女
    2020-11-27 13:52

    Python 2:

    #!/usr/bin/env python
    # -*- encoding: utf8 -*-
    
    
    import re
    
    sample = u'I am from 美国。We should be friends. 朋友。'
    # [\u4e00-\u9fff]+ matches each run of CJK Unified Ideographs
    for n in re.findall(ur'[\u4e00-\u9fff]+', sample):
        print n
    

    Python 3:

    import re

    sample = 'I am from 美国。We should be friends. 朋友。'
    for n in re.findall(r'[\u4e00-\u9fff]+', sample):
        print(n)
    

    Output:

    美国
    朋友
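
Since the question asks about stripping the Chinese out rather than listing it, the same character class can probably be handed to `re.sub` instead of `re.findall`. A minimal sketch, assuming Python 3; the names `CJK_RUN` and `remove_chinese` are illustrative and not part of the answer above.

```python
import re

# Same character class as in the answer: CJK Unified Ideographs (U+4E00 to U+9FFF).
CJK_RUN = re.compile(r'[\u4e00-\u9fff]+')

def remove_chinese(text):
    # Hypothetical helper: delete every run of ideographs, leave everything else as is.
    return CJK_RUN.sub('', text)

sample = 'I am from 美国。We should be friends. 朋友。'
print(remove_chinese(sample))  # I am from 。We should be friends. 。
```

The full-width period 。 (U+3002) survives because it sits outside the 4E00 to 9FFF range; it would need its own entry in the character class if it should go too.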
    

    About Unicode code blocks:

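    The range U+4E00 to U+9FFF is the CJK Unified Ideographs block (CJK = Chinese, Japanese and Korean), so the pattern also matches the same ideographs when they appear in Japanese or Korean text. Full-width punctuation such as 。 (U+3002) lies outside it, which is why the output above omits it. Several lower ranges also relate, to some degree, to CJK: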

    31C0—31EF CJK Strokes
    31F0—31FF Katakana Phonetic Extensions
    3200—32FF Enclosed CJK Letters and Months
    3300—33FF CJK Compatibility
    3400—4DBF CJK Unified Ideographs Extension A
    4DC0—4DFF Yijing Hexagram Symbols
    4E00—9FFF CJK Unified Ideographs 
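
If the rarer ideographs in Extension A should count as well, one option is to add that block to the character class. This is only a sketch under that assumption (Python 3; `HAN_BMP` and `find_han` are made-up names), and it still ignores the other blocks in the list, which hold strokes, symbols and kana rather than ideographs.

```python
import re

# Assumption: "Chinese text" means ideographs from Extension A (U+3400 to U+4DBF)
# plus the main CJK Unified Ideographs block (U+4E00 to U+9FFF).
HAN_BMP = re.compile(r'[\u3400-\u4dbf\u4e00-\u9fff]+')

def find_han(text):
    # Hypothetical helper: return every maximal run of ideographs found in text.
    return HAN_BMP.findall(text)

print(find_han('I am from 美国。We should be friends. 朋友。'))  # ['美国', '朋友']
print(find_han('rare: \u3400\u4e00'))  # a single run spanning both blocks
```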
    
