Question
My PHP code looks like this:
$input = "City.name = 'New York'";
$literal_pattern = '/\'.[^\']*\'/';
preg_match($literal_pattern, $input, $token);
echo $token[0]; // prints 'New York'
I need the regex to also match literals that contain escaped single quotes, like:
$input = "City.name = 'New \' York'";
$literal_pattern = ???????????;
preg_match($literal_pattern, $input, $token);
echo $token[0]; // should print 'New \' York'
What should the regex for $literal_pattern be?
Answer 1:
Without the escaped-quote requirement, a simple...
/('[^']*')/
...would suffice, of course: match all sequences of "single quote, followed by any number of non-single-quote symbols, followed by a single quote again".
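To see why the escaped-quote requirement breaks this, here is a minimal check (a sketch; the variable names are only illustrative) running the simple pattern against the asker's second input. The match stops at the escaped quote:

```php
<?php
// The simple pattern: a quote, any run of non-quote characters, a quote.
$simple = "/('[^']*')/";

// The asker's second input; \' inside a double-quoted PHP string keeps its backslash.
$input = "City.name = 'New \' York'";

preg_match($simple, $input, $token);
echo $token[0], "\n"; // prints 'New \'
```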
But here we have to be ready for two kinds of characters, "normal" and "escaped" ones, so we need to add some spice to our pattern:
/('[^'\\]*(?:\\.[^'\\]*)*')/
It might look odd (and it is), but it's actually pretty simple too: match sequences of...
- a single quote symbol...
- ...followed by zero or more "normal" characters (not ' or \),
- ...followed by a subexpression of ("escaped" symbol, then zero or more "normal" ones), repeated zero or more times...
- ...followed by a single quote symbol.
Example:
$input = "City.name = 'New \\' York (And Some Backslash Fun)\\\\'\\'";
# \\ inside a double-quoted PHP string yields one backslash, so the string holds the sequences \' and \\
$pattern = "/('[^'\\\\]*(?:\\\\.[^'\\\\]*)*')/";
# ...the pattern is a string too: either use double quotes and escape the backslashes, or single quotes and escape the single quotes; I chose the former
preg_match($pattern, $input, $token);
echo $token[0]; // 'New \' York (And Some Backslash Fun)\\'
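The same pattern can also be written as a single-quoted PHP string (the other choice from the comment above), where each single quote is escaped and each backslash is doubled. A sketch, assuming you want every literal in the input rather than only the first one, so `preg_match_all` is used; the two-literal input is hypothetical:

```php
<?php
// Answer 1's regex /('[^'\\]*(?:\\.[^'\\]*)*')/ as a single-quoted PHP string:
// each ' is written as \' and each backslash is doubled, so the regex's \\ becomes \\\\.
$pattern = '/(\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')/';

// Hypothetical input with two literals; \' in a double-quoted string keeps its backslash.
$input = "City.name = 'New \' York' AND City.alias = 'Big Apple'";

preg_match_all($pattern, $input, $matches);
foreach ($matches[0] as $literal) {
    echo $literal, "\n"; // prints 'New \' York', then 'Big Apple'
}
```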
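Answer 2:
This is the regex you are looking for: /\'(\\.|[^\'\\])*\'/
In PHP, inside a single-quoted string every regex backslash has to be doubled (a single-quoted string turns \\ into \), so this would look like $literal_pattern = '/(\'(?:\\\\.|[^\'\\\\])*\')/';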
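A quick sketch checking Answer 2's alternation-based regex in PHP. It is written as a single-quoted string with the backslashes doubled, so PCRE receives /('(?:\\.|[^'\\])*')/. On the asker's input and on the example input from Answer 1, it returns the same literal as Answer 1's pattern:

```php
<?php
// Answer 2's regex as a single-quoted PHP string:
// \' becomes ' and \\\\ becomes \\, which PCRE reads as one literal backslash.
$literal_pattern = '/(\'(?:\\\\.|[^\'\\\\])*\')/';

$inputs = [
    "City.name = 'New \' York'",
    "City.name = 'New \\' York (And Some Backslash Fun)\\\\'\\'",
];

$found = [];
foreach ($inputs as $input) {
    if (preg_match($literal_pattern, $input, $token)) {
        $found[] = $token[0];
        echo $token[0], "\n";
    }
}
```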
Answer 3:
Regex quantifiers are greedy by default, so they match as much text as they can. So if you match "everything between 's", the match will run from the first ' to the last ' in the input.
Thus, you can simply do this:
$literal_pattern = "#('.*')#";
Example: http://ideone.com/gI5bXs
NB: As @m.buettner pointed out, this method only works if there is exactly one '-enclosed string in your input.
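A small sketch of both cases for the greedy pattern; the second input is hypothetical. With one literal the greedy .* conveniently runs past the escaped quote to the last quote, but with two literals it also swallows everything in between:

```php
<?php
$literal_pattern = "#('.*')#";

// One quoted literal: the greedy .* runs to the last quote, escaped quote included.
preg_match($literal_pattern, "City.name = 'New \' York'", $one);
echo $one[0], "\n"; // prints 'New \' York'

// Two quoted literals (hypothetical input): the match spans from the first to the last quote.
preg_match($literal_pattern, "City.name = 'New York' AND City.state = 'NY'", $two);
echo $two[0], "\n"; // prints 'New York' AND City.state = 'NY'
```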
Answer 4:
You could use negative lookbehind matching. http://www.regular-expressions.info/lookaround.html
(?<!a)b matches a "b" that is not preceded by an "a", using negative lookbehind.
I wasn't sure PHP regexes support that, but PHP's preg_* functions are built on PCRE, which does support lookbehind, so the regex would look something like this:
/(?<!\\)'(.*?)(?<!\\)'/
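A sketch showing that this lookbehind pattern runs with `preg_match`, along with one limitation worth knowing. A literal that ends in an escaped backslash (\\) is not closed, because its closing quote is preceded by a backslash. The second input is hypothetical:

```php
<?php
// "\\\\" inside a double-quoted string becomes \\, i.e. one literal backslash for PCRE.
$pattern = "/(?<!\\\\)'(.*?)(?<!\\\\)'/";

preg_match($pattern, "City.name = 'New \' York'", $ok);
echo $ok[0], "\n"; // prints 'New \' York'

// Limitation: the literal 'C:\\' ends in an escaped backslash, so its closing quote
// fails the lookbehind and the lazy match runs on to the next unescaped quote.
preg_match($pattern, "Path = 'C:\\\\' AND x = 'y'", $bad);
echo $bad[0], "\n"; // prints 'C:\\' AND x = '
```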
My advice would be to use a simple parser. Here's something I just made up off the top of my head (obviously in pseudocode). I can't guarantee its logic will work for your purposes, but it's actually not too hard to build yourself.
let inString = false
let escaping = false
let match = ''
for each letter in string
    if letter == "\\" and not escaping
        escaping = true
    else
        if letter == "'" and not escaping
            inString = not inString
        else if inString
            match += letter
        escaping = false
return match
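A PHP translation of the pseudocode, as a sketch; the function name `extract_quoted` is hypothetical. Like the pseudocode, it drops the escaping backslashes and the surrounding quotes, concatenates the contents of every quoted literal it finds, and works byte by byte, which is fine for ASCII input:

```php
<?php
/**
 * Hypothetical helper: PHP translation of the pseudocode parser.
 * Returns the concatenated contents of all '...' literals, with escapes resolved.
 */
function extract_quoted(string $input): string
{
    $inString = false;
    $escaping = false;
    $match = '';

    foreach (str_split($input) as $letter) {
        if ($letter === '\\' && !$escaping) {
            $escaping = true;           // the next character is taken literally
        } else {
            if ($letter === "'" && !$escaping) {
                $inString = !$inString; // an unescaped quote opens or closes a literal
            } elseif ($inString) {
                $match .= $letter;      // keep characters that are inside a literal
            }
            $escaping = false;
        }
    }
    return $match;
}

echo extract_quoted("City.name = 'New \' York'"), "\n"; // prints New ' York
```

If you need the literal exactly as written (with its quotes and backslashes, as in the question), append `$letter` unconditionally while `$inString` is true instead of skipping the escape characters.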
Source: https://stackoverflow.com/questions/13261250/regex-pattern-for-matching-single-quoted-words-in-a-string-and-ignore-the-escape