Need RegEx to filter anything except keyboard characters

試著忘記壹切 posted on 2021-01-29 03:44:33

Question


In my application, a client uploads data from MS Word into a textarea. My regex skills are not so good :)

I need a regex that filters all the junk characters out of a string, so that the only acceptable input is characters from the keyboard, i.e. A-Z, a-z, 0-9, all the special characters present on a keyboard, and all currency symbols.

EDIT: I want to allow only ASCII codes, including extended ASCII. http://www.asciitable.com/


Answer 1:


I have checked the ASCII table and all printable symbols it contains are present on any standard keyboard.

It's hard to tell what defines "special characters present on the keyboard", but I assume you mean printable non-alphanumeric characters. While the Unicode whitespace and formatting characters (non-breaking space, zero-width non-joiner, ...) are indeed "special", they are absent from most keyboards. The backspace character, while present on most keyboards, is typically interpreted by the OS, so I assume you don't want that. A similar argument applies to the tab key: while a tab character is easy enough to obtain, it can't normally be typed into a form input, because pressing Tab moves the focus to the next field.

Concerning currency symbols, the Unicode category \p{Sc} ("Symbol, Currency") covers them, and .NET regular expressions (and therefore C#) support this category.
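To illustrate what \p{Sc} matches, here is a minimal sketch. The class and method names (CurrencyDemo, IsCurrencySymbol) are made up for this example and are not part of any library.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class CurrencyDemo
{
    // \p{Sc} is the Unicode "Symbol, Currency" general category.
    // Anchoring with ^ and $ makes the pattern match exactly one such symbol.
    static readonly Regex SingleCurrencySymbol = new Regex(@"^\p{Sc}$");

    // Returns true if s consists of exactly one currency symbol.
    public static bool IsCurrencySymbol(string s)
    {
        return SingleCurrencySymbol.IsMatch(s);
    }
}
```

With this helper, "$", "\u20AC" (euro sign), "\u00A3" (pound sign) and "\u00A5" (yen sign) are all accepted, while "%" and "A" are rejected.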

Non-US keyboards contain many more characters (letters with diacritics, Cyrillic, Chinese/Japanese/Korean characters), but they don't match your description of "A-Z, a-z, 0-9 and all the special characters present on keyboard + all currency symbols". Of special interest is the Japanese end-of-sentence punctuation, which is a hollow circle instead of just a dot. However, while it matches your description, I believe you don't want that either.

C# also supports the named block \p{IsBasicLatin} (block names are case-sensitive in .NET), but that includes the ASCII control characters, which I assume you don't want.

To sum up: your description matches the entire printable ASCII range, the newline \n, and the currency symbols. To check that a string consists only of these characters, use this regex:

^[\x20-\x7E\n\p{Sc}]*$

Reflecting your edit, you may instead allow only the printable ASCII characters plus newline (this drops every currency symbol except $, which is part of ASCII):

^[\x20-\x7E\n]*$

or the entire ASCII range, including the control characters and all ASCII whitespace (two equivalent patterns):

^[\x00-\x7F]*$
^[\p{IsBasicLatin}]*$
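As a rough sketch of how these three alternatives differ, the helper below applies each character class to a whole string. Each class is followed by * so that strings of any length (including the empty string) are accepted. The class and method names are invented for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class AsciiChecks
{
    // Printable ASCII, newline, and any Unicode currency symbol.
    static readonly Regex PrintableOrCurrency = new Regex(@"^[\x20-\x7E\n\p{Sc}]*$");

    // Printable ASCII and newline only.
    static readonly Regex PrintableAscii = new Regex(@"^[\x20-\x7E\n]*$");

    // The whole 7-bit ASCII range, control characters included.
    static readonly Regex AnyAscii = new Regex(@"^[\x00-\x7F]*$");

    public static bool IsPrintableOrCurrency(string s) => PrintableOrCurrency.IsMatch(s);
    public static bool IsPrintableAscii(string s) => PrintableAscii.IsMatch(s);
    public static bool IsAnyAscii(string s) => AnyAscii.IsMatch(s);
}
```

For "Price: 5\u20AC" (ending in a euro sign) only IsPrintableOrCurrency returns true. For "a\tb" only IsAnyAscii returns true, because the tab is a control character. A string wrapped in Word-style curly quotes such as "\u201CHi\u201D" fails all three checks.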

Ref:
MSDN character classes
MSDN character escapes
MSDN code example (adapted here):

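// Requires: using System.Text.RegularExpressions;
bool IsValid(string strIn)
{
    // Return true if every character in strIn is printable ASCII,
    // a newline, or a currency symbol (an empty string is also valid).
    return Regex.IsMatch(strIn, @"^[\x20-\x7E\n\p{Sc}]*$");
}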

Regex replace (adapted here; strips out everything except A-Z, a-z, 0-9 and the following characters: ~ ` ! @ # $ % ^ & * ( ) _ + | - = \ { } [ ] : " ; ' < > ? , . /). Note that spaces and newlines are not in this list, so they are stripped as well.

string CleanInput(string strIn)
{
    // Replace every character outside the allowed set with an empty string.
    // Inside the verbatim literal, "" stands for a single double quote.
    return Regex.Replace(strIn,
          @"[^a-zA-Z0-9~`!@#$%^&*()_+|\-=\\{}\[\]:"";'<>?,./]", "");
}

Concerning double quotes inside verbatim string literals (within @"...", a double quote is written as two double quotes): http://blogs.msdn.com/b/gusperez/archive/2005/08/10/450257.aspx
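Since the question asks for filtering rather than validation, one possible approach is to negate the "printable ASCII + newline + currency" character class and remove whatever it matches. This is only a sketch under that assumption; the names KeyboardFilter and StripJunk are invented for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class KeyboardFilter
{
    // Matches any single character that is NOT printable ASCII,
    // a newline, or a Unicode currency symbol.
    static readonly Regex Junk = new Regex(@"[^\x20-\x7E\n\p{Sc}]");

    // Removes every disallowed character from the input.
    public static string StripJunk(string input)
    {
        return Junk.Replace(input, "");
    }
}
```

For example, StripJunk("\u201CHello\u201D \u2013 5\u20AC\t!") returns "Hello  5\u20AC!": the curly quotes, the en dash and the tab are removed, while both spaces and the euro sign are kept. Characters that Word substitutes automatically (curly quotes, dashes) are deleted rather than converted to their ASCII look-alikes, so a separate replacement step would be needed if those should be preserved.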



Source: https://stackoverflow.com/questions/14681479/need-regex-to-filter-anything-except-keyboard-characters
