Detect Chinese characters using Perl?

挽巷 2020-12-11 10:59

Is there any way to detect Chinese characters using Perl? And is there a way to split Chinese characters reliably with the dot symbol '.'?

1 answer
  • 2020-12-11 11:27

    It depends on what you count as a Chinese character. Perhaps you're looking for /\p{Script=Hani}/, which matches Han ideographs only. If we want to cast our net wide, the following regex pattern matches characters from the Unicode blocks that occur in Chinese writing, including punctuation, radicals and compatibility forms. Restrict it as necessary.

    use 5.014;
    /
        (?: \p{Block=CJK_Compatibility}
        |   \p{Block=CJK_Compatibility_Forms}
        |   \p{Block=CJK_Compatibility_Ideographs}
        |   \p{Block=CJK_Compatibility_Ideographs_Supplement}
        |   \p{Block=CJK_Radicals_Supplement}
        |   \p{Block=CJK_Strokes}
        |   \p{Block=CJK_Symbols_And_Punctuation}
        |   \p{Block=CJK_Unified_Ideographs}
        |   \p{Block=CJK_Unified_Ideographs_Extension_A}
        |   \p{Block=CJK_Unified_Ideographs_Extension_B}
        |   \p{Block=CJK_Unified_Ideographs_Extension_C}
        )
    /x;
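
As a minimal sketch of how a pattern like this might be used for detection, the snippet below wraps the narrower /\p{Script=Han}/ test in a small helper. The name has_han and the sample strings are hypothetical, chosen only for illustration; swap in the wider block-based pattern above if punctuation and compatibility forms should count too.

```perl
use 5.014;
use utf8;                              # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));    # encode STDOUT/STDERR as UTF-8

# Hypothetical helper: returns 1 if the string contains at least one
# Han ideograph, 0 otherwise.
sub has_han {
    my ($text) = @_;
    return $text =~ /\p{Script=Han}/ ? 1 : 0;
}

# Katakana is a different script, so the last sample does not match.
for my $sample ('hello', '冰淇淋', 'abc 中 def', 'カタカナ') {
    say "$sample: ", has_han($sample) ? 'contains Han' : 'no Han';
}
```

The helper assumes its argument is already a character string; text read from a file or socket would need decoding first.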
    

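    Yes, . matches one character, provided the string holds decoded characters rather than raw bytes. An empty pattern makes split do what you mean, returning one element per character: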

    use utf8;
    split //, '冰淇淋'
    # returns ('冰', '淇', '淋')
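
The example above relies on use utf8 because the string is a literal in the source file. For text that arrives from outside the program, a rough sketch of the same split might look like the following; it assumes the input is UTF-8 encoded bytes, and the variable names are illustrative only.

```perl
use 5.014;
use Encode qw(decode encode);

# Assumption: these are raw UTF-8 bytes as they might arrive from a file
# or socket (here, the bytes of the three-character string 冰淇淋).
my $bytes = "\xE5\x86\xB0\xE6\xB7\x87\xE6\xB7\x8B";

# Without decoding, split and . operate on bytes.
my @byte_parts = split //, $bytes;

# After decoding, they operate on characters.
my $chars      = decode('UTF-8', $bytes);
my @char_parts = split //, $chars;
my @via_dot    = $chars =~ /./gs;      # . matches one character per step

say scalar @byte_parts;                # 9
say scalar @char_parts;                # 3
say scalar @via_dot;                   # 3

# Encode again before printing, since STDOUT has no encoding layer here.
say encode('UTF-8', join '.', @char_parts);
```

Decoding at the input boundary and encoding at the output boundary keeps every regex and split in between working on whole characters.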
    