Question
In my application, the client uploads data from MS Word into a textarea. My RegEx skills are not so good :)
I need a RegEx to filter out all the junk characters from a string; the only acceptable input is characters from the keyboard, i.e. A-Z, a-z, 0-9 and all the special characters present on keyboard + all currency symbols.
EDIT: I want to allow only ASCII codes, including extended ones. http://www.asciitable.com/
Answer 1:
I have checked the ASCII table and all printable symbols it contains are present on any standard keyboard.
It's hard to tell what defines "special characters present on the keyboard", but I assume you mean printable non-alphanumeric characters. While the Unicode whitespace and formatting characters (non-breaking space, zero-width non-joiner...) are indeed "special", they are absent from most keyboards. The backspace character, while present on most keyboards, is typically interpreted by the OS, so I assume you don't want that. A similar argument applies to the tab key: while the tab character is easier to obtain than the newline character, it can't normally be typed into a form input, because the browser uses the key to move focus.
Concerning currency symbols, the character class \p{Sc} (the Unicode category "Symbol, Currency") covers them, and the .NET regex engine used by C# supports this class.
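As a rough illustration of what \p{Sc} picks up, here is a minimal sketch. The class name `CurrencyCheck` and the helper `ContainsCurrencySymbol` are hypothetical names made up for this example, not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class CurrencyCheck
{
    // \p{Sc} is the Unicode general category "Symbol, Currency".
    private static readonly Regex Currency = new Regex(@"\p{Sc}");

    // Hypothetical helper: true if s contains at least one currency symbol.
    public static bool ContainsCurrencySymbol(string s)
    {
        return Currency.IsMatch(s);
    }
}
```

Called with "$5", "\u20AC" (euro sign) or "\u00A3" (pound sign) this should return true; called with "abc" or "%" it should return false, since neither contains a character in the Sc category.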
Non-US keyboards contain many more characters (symbols with diacritics, Cyrillic, Chinese/Japanese/Korean characters), but they don't match your description of "A-Z, a-z, 0-9 and all the special characters present on keyboard + all currency symbols". Of special interest is the Japanese end-of-sentence punctuation, which is a hollow circle instead of just a dot. However, while it arguably matches your description, I believe you don't want that either.
C# also supports \p{IsBasicLatin} (block names are case-sensitive in .NET, so the capital "I" matters), but that includes the ASCII control characters, which I assume you don't want.
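To make the difference concrete, the following sketch compares the Basic Latin block with the printable ASCII range for a single character. `BasicLatinCheck`, `IsBasicLatin` and `IsPrintableAscii` are hypothetical names chosen for this example; the block name is written with .NET's spelling, `IsBasicLatin`.

```csharp
using System.Text.RegularExpressions;

public static class BasicLatinCheck
{
    // Basic Latin block, U+0000 to U+007F: includes control characters.
    public static bool IsBasicLatin(char c)
    {
        return Regex.IsMatch(c.ToString(), @"^\p{IsBasicLatin}$");
    }

    // Printable ASCII only, U+0020 (space) to U+007E (tilde).
    public static bool IsPrintableAscii(char c)
    {
        return Regex.IsMatch(c.ToString(), @"^[\x20-\x7E]$");
    }
}
```

For example, the bell character '\u0007' should pass `IsBasicLatin` but fail `IsPrintableAscii`, while 'A' passes both and '\u00E9' (e with acute accent) fails both.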
To sum up: your description matches the entire printable ASCII range, the newline \n and the currency symbols. To check that a string consists only of these, use this regex:
^[\x20-\x7E\n\p{Sc}]*$
Reflecting your edit, you could instead allow only the printable ASCII characters plus newline (most currency symbols are absent from ASCII; $ is the exception):
^[\x20-\x7E\n]*$
or the entire ASCII range including the control characters and all ASCII whitespace:
^[\x00-\x7F]*$
^[\p{IsBasicLatin}]*$
Ref:
MSDN character classes
MSDN character escapes
MSDN code example (adapted here):
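using System.Text.RegularExpressions;

bool IsValid(string strIn)
{
    // Return true if strIn consists only of printable ASCII characters,
    // newlines and currency symbols. The * makes the class apply to the
    // whole string instead of exactly one character.
    return Regex.IsMatch(strIn, @"^[\x20-\x7E\n\p{Sc}]*$");
}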
Regex replace (adapted here; strips out everything except A-Z, a-z, 0-9 and the following characters: ~ ` ! @ # $ % ^ & * ( ) _ + | - = \ { } [ ] : " ; ' < > ? , . /). Note that spaces and newlines are not in the list, so they are stripped as well.
String CleanInput(string strIn)
{
    // Replace invalid characters with empty strings.
    // Inside a verbatim string literal, "" stands for a single double quote.
    return Regex.Replace(strIn, @"[^a-zA-Z0-9~`!@#$%^&*()_+|\-=\\{}\[\]:"";'<>?,./]", "");
}
Concerning double quotes inside verbatim string literals: http://blogs.msdn.com/b/gusperez/archive/2005/08/10/450257.aspx
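Since the question asks for filtering rather than validation, the character class from the summary regex can also be negated and used with Regex.Replace. This is only a sketch under that assumption; `AsciiCleaner` and `StripDisallowed` are hypothetical names for this example.

```csharp
using System.Text.RegularExpressions;

public static class AsciiCleaner
{
    // Matches any single character that is NOT printable ASCII,
    // a newline, or a Unicode currency symbol.
    private static readonly Regex Disallowed =
        new Regex(@"[^\x20-\x7E\n\p{Sc}]");

    // Hypothetical helper: removes every disallowed character from input.
    public static string StripDisallowed(string input)
    {
        return Disallowed.Replace(input, "");
    }
}
```

Keep in mind that this deletes characters such as Word's curly quotes (U+201C, U+201D) outright instead of converting them to straight quotes, and that tabs are removed too. If the straight equivalents should survive, they would need to be substituted before the strip step.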
Source: https://stackoverflow.com/questions/14681479/need-regex-to-filter-anything-except-keyboard-characters