Regex pattern for matching single-quoted words in a string and ignoring escaped single quotes

Submitted by 北城余情 on 2019-12-11 18:12:31

Question


My PHP code looks like this:

$input = "City.name = 'New York'";
$literal_pattern = '/\'.[^\']*\'/';
preg_match($literal_pattern, $input, $token);
echo $token[0]; // prints 'New York'

My regex needs to grab literals with escaped single quotes like:

$input = "City.name = 'New \' York'";
$literal_pattern = ???????????;
preg_match($literal_pattern, $input, $token);
echo $token[0]; // should print 'New \' York'

What would the regex for $literal_pattern be?


Answer 1:


Without this condition, a simple...

/('[^']*')/

...would suffice, of course: match all sequences of "single quote, followed by any number of non-single-quote symbols, followed by a single quote again".
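
To see why the simple pattern is not enough for the question's second input, here is a minimal sketch (the variable names follow the question). The match stops at the escaped quote, because the pattern has no notion of a backslash:

```php
<?php
// The string holds: City.name = 'New \' York'
$input = "City.name = 'New \\' York'";

// Simple pattern: quote, any run of non-quote characters, quote.
preg_match("/('[^']*')/", $input, $token);

// The match ends at the escaped quote, so the literal is cut short.
echo $token[0], "\n"; // 'New \'
```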

But we need to be ready for two kinds of characters here: "normal" ones and "escaped" ones. So we should add some spice to our pattern:

/('[^'\\]*(?:\\.[^'\\]*)*')/

It might look odd (and it is), but it's actually pretty simple too: match sequences of...

  • single quote symbol...
  • ...followed by zero or more "normal" characters (not ' or \),
  • ...followed by a subexpression of ("escaped" symbol, then zero or more "normal" ones), repeated 0 or more times...
  • followed by a single quote symbol.

Example:

$input   = "City.name = 'New \\' York (And Some Backslash Fun)\\\\'\\'"; 
# ...each \\ in this double-quoted literal yields a _single_ backslash in the string

$pattern = "/('[^'\\\\]*(?:\\\\.[^'\\\\]*)*')/";
# ... a choice: escape either slashes or single quotes; I choose the former

preg_match($pattern, $input, $token);
echo $token[0]; // 'New \' York (And Some Backslash Fun)\\'
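
If the doubled backslashes are hard to read, one option is to write both the pattern and the test input as nowdoc strings, which PHP leaves completely unescaped. The sketch below is an illustration rather than part of the original answer: the second literal ('NY') and the use of preg_match_all are assumptions added to show that the pattern also copes with several literals in one input.

```php
<?php
// Nowdoc: PHP performs no escaping or interpolation inside it,
// so the regex is written exactly as the regex engine sees it.
$pattern = <<<'REGEX'
/'[^'\\]*(?:\\.[^'\\]*)*'/
REGEX;

$input = <<<'TEXT'
City.name = 'New \' York' AND City.state = 'NY'
TEXT;

// Collect every quoted literal, not just the first one.
preg_match_all($pattern, $input, $matches);
print_r($matches[0]); // two entries: 'New \' York' and 'NY'
```

Note that . does not match a newline by default, so a backslash followed by a line break inside a literal would need the s modifier.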



Answer 2:


This is the regex you are looking for: /\'(\\.|[^\'\\])*\'/

In PHP, where each backslash meant for the regex engine has to be doubled inside the string literal, this would look like $literal_pattern = '/(\'(?:\\\\.|[^\'\\\\])*\')/';
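
The escaping is easy to get wrong: in a single-quoted PHP string, \\ collapses to one backslash, so the regex engine only receives \\ if the source contains \\\\. A minimal sketch that prints what the engine actually sees and then runs the pattern on the question's input:

```php
<?php
// Four backslashes in the PHP source become two in the string,
// which the regex engine reads as "one literal backslash".
$literal_pattern = '/\'(?:\\\\.|[^\'\\\\])*\'/';
echo $literal_pattern, "\n"; // /'(?:\\.|[^'\\])*'/

// The string holds: City.name = 'New \' York'
$input = "City.name = 'New \\' York'";

preg_match($literal_pattern, $input, $token);
echo $token[0], "\n"; // 'New \' York'
```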




Answer 3:


Regex quantifiers are greedy by default, so .* will consume as much of the input as it can. So, if you match "everything between 's", it will catch everything between the first and the last '.

Thus, you can safely do this:

$literal_pattern = "#('.*')#";

Example: http://ideone.com/gI5bXs

NB: As @m.buettner pointed out, this method will only work if there is one '-encased string in your input.
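
The limitation in the note is easy to reproduce. In this short sketch the second literal ('NY') is an assumption added for illustration; the greedy match swallows everything between the first and the last quote:

```php
<?php
$input = "City.name = 'New York' AND City.state = 'NY'";

// Greedy .* runs on to the last quote in the input.
preg_match("#('.*')#", $input, $greedy);
echo $greedy[0], "\n"; // 'New York' AND City.state = 'NY'
```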




Answer 4:


You could use negative lookbehind matching. http://www.regular-expressions.info/lookaround.html

(?<!a)b matches a "b" that is not preceded by an "a", using negative lookbehind

PHP's preg_* functions are built on PCRE, which does support lookbehind, so the regex would look something like this (note that it misjudges a quote that follows an escaped backslash, as in 'C:\\'):

/(?<!\\)'(.*?)(?<!\\)'/
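
As a quick check, PHP's preg functions do accept this lookbehind pattern. The sketch below runs it on the question's input and on a second, made-up input whose first literal ends in an escaped backslash, where the lookbehind rejects the real closing quote. The second input is an assumption chosen to show that edge case.

```php
<?php
// Nowdoc keeps the backslashes exactly as the regex engine should see them.
$pattern = <<<'REGEX'
/(?<!\\)'(.*?)(?<!\\)'/
REGEX;

// The string holds: City.name = 'New \' York'
preg_match($pattern, "City.name = 'New \\' York'", $ok);
echo $ok[0], "\n"; // 'New \' York'

// The string holds: path = 'C:\\' AND name = 'x'
// The quote after the two backslashes really closes the literal,
// but the lookbehind sees a backslash before it and skips it.
preg_match($pattern, "path = 'C:\\\\' AND name = 'x'", $bad);
echo $bad[0], "\n"; // 'C:\\' AND name = '
```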

My advice would be to use a simple parser. Here's something I just made up off the top of my head (obviously in pseudocode): no guarantees its logic will work for your purposes, but it's actually not too hard to build yourself.

let inString = false
let escaping = false
let match = ''    
for each letter in string
    if letter == "\\" and not escaping
        escaping = true
    else
        if letter == "'" and not escaping
            inString = not inString
        else if inString
            match += letter
        escaping = false
return match
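
The pseudocode above could be turned into PHP roughly as follows. This is a sketch under a few assumptions that differ from the pseudocode: the function name extractQuotedLiterals is hypothetical, each literal is returned separately with its quotes and backslashes kept, and backslashes outside a literal are ignored.

```php
<?php
/**
 * Hypothetical helper: returns every single-quoted literal in $s,
 * quotes included, treating backslash as the escape character.
 */
function extractQuotedLiterals(string $s): array
{
    $literals = [];
    $current  = null;    // null while we are outside a literal
    $escaping = false;   // true right after an unescaped backslash

    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $ch = $s[$i];
        if ($current === null) {
            if ($ch === "'") {
                $current = "'";       // opening quote starts a literal
            }
            continue;
        }
        $current .= $ch;
        if ($escaping) {
            $escaping = false;        // escaped character, taken as-is
        } elseif ($ch === '\\') {
            $escaping = true;         // the next character is escaped
        } elseif ($ch === "'") {
            $literals[] = $current;   // unescaped quote closes the literal
            $current = null;
        }
    }
    return $literals;                 // an unterminated literal is dropped
}

// The string holds: City.name = 'New \' York' AND path = 'C:\\'
print_r(extractQuotedLiterals("City.name = 'New \\' York' AND path = 'C:\\\\'"));
```

Unlike the lookbehind regex, this version tracks the escape state character by character, so a literal ending in an escaped backslash is closed at the right quote.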


Source: https://stackoverflow.com/questions/13261250/regex-pattern-for-matching-single-quoted-words-in-a-string-and-ignore-the-escape
