Detect chinese character using perl?

那年仲夏 提交于 2019-11-27 07:21:40

问题


Is there any way to detect Chinese characters using Perl? And is there any way on how to split Chinese characters with symbol dot '.' perfectly?


回答1:


Depends on your particular notion of what is a Chinese character. Perhaps you're looking for /\p{Script=Hani}/, but if we want to cast our net wide, the following regex pattern will match stuff that occurs in Chinese writing. Restrict if necessary.

use 5.014;
/
    (?: \p{Block=CJK_Compatibility}
    |   \p{Block=CJK_Compatibility_Forms}
    |   \p{Block=CJK_Compatibility_Ideographs}
    |   \p{Block=CJK_Compatibility_Ideographs_Supplement}
    |   \p{Block=CJK_Radicals_Supplement}
    |   \p{Block=CJK_Strokes}
    |   \p{Block=CJK_Symbols_And_Punctuation}
    |   \p{Block=CJK_Unified_Ideographs}
    |   \p{Block=CJK_Unified_Ideographs_Extension_A}
    |   \p{Block=CJK_Unified_Ideographs_Extension_B}
    |   \p{Block=CJK_Unified_Ideographs_Extension_C}
    )
/x;

Yes, . matches one character. The empty pattern for split DWYM:

use utf8;
split //, '冰淇淋'
# returns ('冰', '淇', '淋')


来源:https://stackoverflow.com/questions/6937087/detect-chinese-character-using-perl

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!