Foreign-language characters in regular expressions in C#

Submitted by 雨燕双飞 on 2019-12-18 12:00:04

Question


In C# code, I am trying to get a string containing Chinese characters, "中文ABC123", to pass regex validation.

When I use the general alphanumeric pattern "^[a-zA-Z0-9\s]+$", "中文ABC123" does not match and the validation fails.

What else do I need to add to the expression in C#?
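A minimal C# sketch reproducing the problem might look like this. The class AsciiCheck and its IsValid method are hypothetical names used only for illustration; the pattern itself is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiCheck
{
    // The pattern from the question: ASCII letters, ASCII digits and whitespace only.
    private static readonly Regex AsciiAlnum = new Regex(@"^[a-zA-Z0-9\s]+$");

    public static bool IsValid(string input) => AsciiAlnum.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsValid("ABC123"));     // True
        Console.WriteLine(IsValid("中文ABC123")); // False: 中 and 文 are outside a-z and A-Z
    }
}
```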


Answer 1:


To match a letter character from any language, use:

\p{L}

If you also want to match decimal digits:

[\p{L}\p{Nd}]+
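As a sketch of how these two patterns behave in .NET (using System.Text.RegularExpressions; the class LetterDigitMatch and its method are hypothetical names), the example below anchors [\p{L}\p{Nd}]+ with ^ and $ so that it validates the whole string, and uses \p{L}+ on its own to pull the run of letters out of mixed text. The anchors are an assumption added here for validation; the answer gives the pattern unanchored.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class LetterDigitMatch
{
    // Whole string must consist of letters from any script or decimal digits.
    private static readonly Regex LettersAndDigits = new Regex(@"^[\p{L}\p{Nd}]+$");

    public static bool IsLettersAndDigits(string input) => LettersAndDigits.IsMatch(input);

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8; // so the Chinese characters print correctly

        Console.WriteLine(IsLettersAndDigits("中文ABC123"));  // True
        Console.WriteLine(IsLettersAndDigits("中文 ABC123")); // False: a space is neither L nor Nd

        // Unanchored \p{L}+ finds the first run of letters: "中文ABC".
        Console.WriteLine(Regex.Match("中文ABC123", @"\p{L}+").Value);
    }
}
```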

\p{L} ... matches a character in the Unicode category "letter".
                It is shorthand for [\p{Ll}\p{Lu}\p{Lt}\p{Lm}\p{Lo}]
                  \p{Ll} ... matches lowercase letters. (abc)
                  \p{Lu} ... matches uppercase letters. (ABC)
                  \p{Lt} ... matches titlecase letters.
                  \p{Lm} ... matches modifier letters.
                  \p{Lo} ... matches letters without case. (中文)

\p{Nd} ... matches a character in the Unicode category "decimal digit".
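To see which category a given character falls into, .NET exposes char.GetUnicodeCategory. The sketch below prints one sample per category from the list above and checks each against \p{L}. The class CategoryDemo is a hypothetical name, and the titlecase (ǅ, U+01C5) and modifier (ʰ, U+02B0) samples are examples chosen here, not taken from the answer.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class CategoryDemo
{
    // Returns the .NET UnicodeCategory of a single character.
    public static UnicodeCategory CategoryOf(char c) => char.GetUnicodeCategory(c);

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Samples: 'a' (Ll), 'A' (Lu), U+01C5 ǅ (Lt), U+02B0 ʰ (Lm), U+4E2D 中 (Lo), '1' (Nd).
        foreach (char c in "aA\u01C5\u02B0\u4E2D1")
        {
            bool isLetter = Regex.IsMatch(c.ToString(), @"^\p{L}$");
            Console.WriteLine($"{c} U+{(int)c:X4} {CategoryOf(c)} letter={isLetter}");
        }
    }
}
```

The categories printed should be LowercaseLetter, UppercaseLetter, TitlecaseLetter, ModifierLetter, OtherLetter and DecimalDigitNumber, with letter=True for every sample except the digit.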

Just replace ^[a-zA-Z0-9\s]+$ with ^[\p{L}0-9\s]+$.
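A minimal validator built on the suggested replacement pattern might look like this (UnicodeAlnumCheck is a hypothetical helper name). Note that 0-9 only covers ASCII digits, so a digit from another script, such as the Arabic-Indic ٣ (U+0663), is still rejected; a class such as [\p{L}\p{Nd}\s] would accept it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeAlnumCheck
{
    // Suggested replacement: any Unicode letter, ASCII digits 0-9, or whitespace.
    private static readonly Regex Pattern = new Regex(@"^[\p{L}0-9\s]+$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsValid("中文ABC123"));  // True
        Console.WriteLine(IsValid(" 中文ABC123")); // True: the leading space matches \s
        Console.WriteLine(IsValid("中文ABC123!")); // False: '!' is not a letter, digit or whitespace
        Console.WriteLine(IsValid("中文\u0663"));  // False: U+0663 is a decimal digit, but not in 0-9
    }
}
```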




Answer 2:


Thanks to @Andie2302 for pointing out the right way to do it.

In addition, many of the world's languages use combining marks: characters that cannot stand on their own and must attach to a base letter. For example, the Thai word 'เก็บ' contains the combining mark U+0E47 (็). If you match it with \p{L} alone, that mark is not matched, so keeping only the matched characters leaves 'เกบ' and part of the word is lost.

That is why \p{L} alone does not work for every language.

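To support almost all languages, put letters and marks in the same character class (written without the brackets, \p{L}\p{M} would mean "a letter followed by a mark"):

[\p{L}\p{M}]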

NOTE:

L stands for 'Letter' (all letters from all languages, but not marks).

M stands for 'Mark' (a mark cannot be displayed on its own; it needs a letter to attach to).
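The Thai example can be checked with a short sketch. It assumes the word is written as the four code points U+0E40, U+0E01, U+0E47 and U+0E1A, where U+0E47 (MAITAIKHU) is a nonspacing mark; ThaiMarkDemo and KeepMatches are hypothetical names. KeepMatches concatenates every match, which mimics filtering a string down to the characters a pattern accepts.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class ThaiMarkDemo
{
    // "เก็บ": SARA E (U+0E40), KO KAI (U+0E01), MAITAIKHU (U+0E47, a mark), BO BAIMAI (U+0E1A).
    public const string Word = "\u0E40\u0E01\u0E47\u0E1A";

    // Joins all matches of the pattern, i.e. what survives if everything else is dropped.
    public static string KeepMatches(string input, string pattern) =>
        string.Concat(Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        Console.WriteLine(KeepMatches(Word, @"\p{L}+"));            // เกบ: the mark is lost
        Console.WriteLine(KeepMatches(Word, @"[\p{L}\p{M}]+"));     // เก็บ: the whole word survives
        Console.WriteLine(Regex.IsMatch(Word, @"^\p{L}+$"));        // False
        Console.WriteLine(Regex.IsMatch(Word, @"^[\p{L}\p{M}]+$")); // True
    }
}
```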

If you also need numbers, add:

\p{N}

NOTE:

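N stands for 'Number'. It covers decimal digits (Nd) as well as letter-like numerals (Nl, e.g. Roman numerals) and other numeric characters (No, e.g. fractions and superscripts).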
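The difference between 0-9, \p{Nd} and \p{N} can be seen with a few sample characters. This is a sketch; NumberCategoryDemo and FullMatch are hypothetical names, and the samples (U+0663, U+216B, U+00BD) were picked here to show a decimal digit from another script, a letter-like numeral and a fraction.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberCategoryDemo
{
    // True when the whole input matches the given pattern.
    public static bool FullMatch(string input, string pattern) =>
        Regex.IsMatch(input, "^(?:" + pattern + ")$");

    public static void Main()
    {
        // "5" (ASCII, Nd), U+0663 Arabic-Indic digit three (Nd),
        // U+216B Roman numeral twelve (Nl), U+00BD vulgar fraction one half (No).
        string[] samples = { "5", "\u0663", "\u216B", "\u00BD" };
        foreach (string s in samples)
        {
            bool ascii = FullMatch(s, "[0-9]");
            bool nd = FullMatch(s, @"\p{Nd}");
            bool n = FullMatch(s, @"\p{N}");
            Console.WriteLine($"U+{(int)s[0]:X4}: 0-9={ascii} Nd={nd} N={n}");
        }
    }
}
```

Only the ASCII 5 matches [0-9]; \p{Nd} additionally accepts U+0663, and \p{N} accepts all four. In .NET, \d is equivalent to \p{Nd} unless RegexOptions.ECMAScript is specified.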


Thanks to this website for the very useful information:

https://www.regular-expressions.info/unicode.html



Source: https://stackoverflow.com/questions/28156769/foreign-language-characters-in-regular-expression-in-c-sharp
