Question
I've inherited some C# code with the following regular expression:
Regex(@"^[a-zA-Z''-'\s]{1,40}$")
I understand this string except for the role of the single quotes. I've searched all over but can't seem to find an explanation. Any ideas?
Answer 1:
From what I can tell, the expression is redundant.
It matches a-z or A-Z, or the ' character, or anything between ' and ' (which of course is only the ' character again), or any whitespace.
I've tested this using RegexPal and it doesn't appear to match anything but these characters. Perhaps the sequence was generated by code, or it used to match a wider range of characters in an earlier version?
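The same check can be repeated in C# by testing single characters against the original pattern. This is only a sketch: the class name OriginalPatternCheck and the sample inputs are made up for illustration and do not come from the inherited code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternCheck
{
    // The pattern exactly as it appears in the question.
    public static readonly Regex Original = new Regex(@"^[a-zA-Z''-'\s]{1,40}$");

    public static void Main()
    {
        // Illustrative one-character inputs (hypothetical, not from the original code base).
        string[] samples = { "a", "Z", "'", " ", "-", "5" };
        foreach (string s in samples)
        {
            Console.WriteLine("\"{0}\" -> {1}", s, Original.IsMatch(s));
        }
    }
}
```

Only the letters, the apostrophe, and the space should report True. The hyphen and the digit are rejected, because the '-' inside the class is read as a range from ' to ' and not as a literal hyphen.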
UPDATE: From your comments (matching a name), I'm gonna go ahead and guess the author thought (s)he was escaping a hyphen by putting it in quotes, and wasn't the most stellar software tester. What they probably meant was:
Regex(@"^[a-zA-Z'\-\s]{1,40}$") //Escaped the hyphen
Which could also be written as:
Regex(@"^[a-zA-Z'\s-]{1,40}$") //Put the hyphen at the end where it's not ambiguous
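To see the difference on name-like input, here is a minimal sketch that runs the original pattern and both suggested rewrites side by side. The class name HyphenFixCheck and the sample names are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenFixCheck
{
    // Pattern from the question: the '-' forms a range from ' to '.
    public static readonly Regex Original = new Regex(@"^[a-zA-Z''-'\s]{1,40}$");
    // Hyphen escaped with a backslash.
    public static readonly Regex Escaped = new Regex(@"^[a-zA-Z'\-\s]{1,40}$");
    // Hyphen placed last in the class, where it is taken literally.
    public static readonly Regex Trailing = new Regex(@"^[a-zA-Z'\s-]{1,40}$");

    public static void Main()
    {
        // Hypothetical sample names.
        string[] names = { "O'Brien", "Mary Ann", "Smith-Jones" };
        foreach (string name in names)
        {
            Console.WriteLine("{0}: original={1}, escaped={2}, trailing={3}",
                name, Original.IsMatch(name), Escaped.IsMatch(name), Trailing.IsMatch(name));
        }
    }
}
```

All three patterns accept O'Brien and Mary Ann, but only the two rewrites accept Smith-Jones.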
Answer 2:
The only way having the apostrophe / single quote three times makes sense is if the second and third instances are actually fancy curly single quotes such as ‘, ’, and ‛. If so, a clearer way to represent it would be to use Unicode escapes:
Regex(@"^[a-zA-Z'\u2018-\u201B\s]{1,40}$")
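If that guess is right, the escaped version would behave as in the sketch below. It only shows what the Unicode range does; it cannot confirm which characters the original source file actually contained. The class name CurlyQuoteCheck and the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurlyQuoteCheck
{
    // Pattern from the question, assuming all three quotes are ASCII apostrophes.
    public static readonly Regex AsciiOnly = new Regex(@"^[a-zA-Z''-'\s]{1,40}$");
    // Variant with the explicit range U+2018 through U+201B.
    public static readonly Regex WithCurly = new Regex(@"^[a-zA-Z'\u2018-\u201B\s]{1,40}$");

    public static void Main()
    {
        string straight = "O'Brien";      // ASCII apostrophe (U+0027)
        string curly = "O\u2019Brien";    // right single quotation mark (U+2019)

        Console.WriteLine("AsciiOnly, straight: {0}", AsciiOnly.IsMatch(straight));
        Console.WriteLine("AsciiOnly, curly:    {0}", AsciiOnly.IsMatch(curly));
        Console.WriteLine("WithCurly, straight: {0}", WithCurly.IsMatch(straight));
        Console.WriteLine("WithCurly, curly:    {0}", WithCurly.IsMatch(curly));
    }
}
```

The ASCII-only pattern rejects the curly-quoted name, while the variant with the Unicode range accepts both spellings.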
Incidentally, some languages, such as PowerShell, explicitly allow these curly single quotes and treat them the same as the ASCII ' (0x27) character. From the PowerShell 2.0 Language Specification:
single-quote-character:
' (U+0027)
Left single quotation mark (U+2018)
Right single quotation mark (U+2019)
Single low-9 quotation mark (U+201A)
Single high-reversed-9 quotation mark (U+201B)
Answer 3:
As it stands, the three single-quote characters are redundant. They represent the single quote character (#1) and the range of characters which both begins and ends at the single quote (#2 and #3 separated by a hyphen).
It looks like an error: the writer seems to have meant to include the hyphen character in the class by "escaping" it in single quotes. Without escaping, the hyphen represents a character range, as in a-z and A-Z.
I'm guessing the original author meant [a-zA-Z'\-\s]
Answer 4:
The extra apostrophes are redundant, so it doesn't make much sense. One possibility is that the author tried to escape the dash to include it in the pattern, but the correct way to do that would be to use a backslash:
Regex(@"^[a-zA-Z'\-\s]{1,40}$")
(Wrapping a literal in apostrophes is used, for example, in custom format strings, which is where the author might have picked it up.)
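For comparison, this is roughly what apostrophe-quoted literals look like in a .NET custom date and time format string. It is a small sketch with made-up names (QuotedLiteralFormat, IsoDate, LiteralThenYear) and an arbitrary date, and it has nothing to do with regular expressions beyond showing where the habit might come from.

```csharp
using System;
using System.Globalization;

public static class QuotedLiteralFormat
{
    public static string IsoDate(DateTime date)
    {
        // The apostrophes mark each hyphen as literal text in the format string.
        return date.ToString("yyyy'-'MM'-'dd", CultureInfo.InvariantCulture);
    }

    public static string LiteralThenYear(DateTime date)
    {
        // The quoted 'yyyy' is copied verbatim; the unquoted yyyy is replaced by the year.
        return date.ToString("'yyyy' yyyy", CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        var date = new DateTime(2012, 8, 7); // arbitrary example date
        Console.WriteLine(IsoDate(date));         // 2012-08-07
        Console.WriteLine(LiteralThenYear(date)); // yyyy 2012
    }
}
```

In a format string the apostrophes protect literal text. Inside a regex character class they have no such meaning, which is why the hyphen between them still acts as a range operator.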
Source: https://stackoverflow.com/questions/11854327/what-is-the-purpose-of-the-single-quotes-in-this-regex-expression