Regexp: split a string by commas and spaces, but ignore those inside quotes and parentheses

栀梦 2020-12-11 22:25

I need to split a string by commas and spaces, but ignore any that appear inside double quotes, single quotes, or parentheses.

$str = "Questions, \"Quote\",'single quote','co…


        
2 Answers
  • 2020-12-11 23:13

    Well, this works for the data you supplied:

    $rgx = <<<'EOT'
    /
      [,\s]++
      (?=(?:(?:[^"]*+"){2})*+[^"]*+$)
      (?=(?:(?:[^']*+'){2})*+[^']*+$)
      (?=(?:[^()]*+\([^()]*+\))*+[^()]*+$)
    /x
    EOT;
    

    The lookaheads assert that, if any double quotes, single quotes, or parentheses lie between the current match position and the end of the string, there is an even number of each kind of quote and the parentheses form balanced pairs (no nesting allowed). That's a quick-and-dirty way to ensure that the current match isn't occurring inside a pair of quotes or parens.
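
As a minimal sketch of how this pattern might be used, the snippet below passes it to preg_split. The sample string is an assumption made up for illustration, since the one in the question is cut off. Note that the quotes and parentheses stay in the resulting pieces.

```php
<?php
// Lookahead pattern from the answer, kept in a nowdoc so nothing needs escaping.
$rgx = <<<'EOT'
/
  [,\s]++
  (?=(?:(?:[^"]*+"){2})*+[^"]*+$)
  (?=(?:(?:[^']*+'){2})*+[^']*+$)
  (?=(?:[^()]*+\([^()]*+\))*+[^()]*+$)
/x
EOT;

// Hypothetical input (not the asker's original string).
$sample = <<<'EOT'
Questions, "Quote",'single quote','comma, inside' (inside parentheses) space
EOT;

// Split only on comma/whitespace runs that pass all three lookaheads.
$parts = preg_split($rgx, $sample);
print_r($parts);
// Pieces, in order:
//   Questions
//   "Quote"
//   'single quote'
//   'comma, inside'
//   (inside parentheses)
//   space
```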

    Of course, it assumes the input is well formed. But on the subject of well-formedness, what about escaped quotes within quotes? What if you have quotes inside parens, or vice versa? Would this input be legal?

    "not a \" quote", 'not a ) quote', (not ",' quotes)

    If so, you've got a much more difficult job ahead of you.
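
To make the difficulty concrete, here is a rough sketch that feeds that input to the lookahead pattern from this answer. The pattern only counts characters, so an escaped quote, a parenthesis inside quotes, and quotes inside parentheses all throw the counts off. The variable names are placeholders chosen for this example.

```php
<?php
// Lookahead pattern from the answer, repeated so the snippet runs on its own.
$rgx = <<<'EOT'
/
  [,\s]++
  (?=(?:(?:[^"]*+"){2})*+[^"]*+$)
  (?=(?:(?:[^']*+'){2})*+[^']*+$)
  (?=(?:[^()]*+\([^()]*+\))*+[^()]*+$)
/x
EOT;

// The tricky input from the answer: escaped quote, paren in quotes, quotes in parens.
$tricky = <<<'EOT'
"not a \" quote", 'not a ) quote', (not ",' quotes)
EOT;

$parts = preg_split($rgx, $tricky);

// Every comma/space run fails at least one lookahead, so nothing is split
// and the whole string comes back as a single piece.
echo count($parts), "\n"; // prints 1
```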

  • 2020-12-11 23:28

    This will work only for non-nested parentheses:

        $regex = <<<HERE
        /  "  ( (?:[^"\\\\]++|\\\\.)*+ ) \"
         | '  ( (?:[^'\\\\]++|\\\\.)*+ ) \'
         | \( ( [^)]*                  ) \)
         | [\s,]+
        /x
        HERE;
    
        $tags = preg_split($regex, $str, -1,
                             PREG_SPLIT_NO_EMPTY
                           | PREG_SPLIT_DELIM_CAPTURE);
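
For illustration, here is the same pattern run end to end on a made-up input (an assumption, since the string in the question is cut off). Quoted and parenthesized chunks are treated as delimiters whose captured contents are returned, so the quotes and parentheses themselves are dropped from the result.

```php
<?php
// Same pattern as in the answer. The heredoc collapses \\\\ to \\, which the
// regex engine then reads as one literal backslash.
$regex = <<<HERE
/  "  ( (?:[^"\\\\]++|\\\\.)*+ ) \"
 | '  ( (?:[^'\\\\]++|\\\\.)*+ ) \'
 | \( ( [^)]*                  ) \)
 | [\s,]+
/x
HERE;

// Hypothetical input (not the asker's original string).
$str = <<<'EOT'
Questions, "Quote",'single quote','comma, inside' (inside parentheses) space
EOT;

// NO_EMPTY drops empty pieces and unset capture groups;
// DELIM_CAPTURE returns whatever the capture groups matched.
$tags = preg_split($regex, $str, -1,
                   PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($tags);
// Pieces, in order:
//   Questions
//   Quote
//   single quote
//   comma, inside
//   inside parentheses
//   space
```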
    

    The possessive quantifiers ++ and *+ consume as much as they can and never give anything back when the engine backtracks. perlre(1) describes this technique as the most efficient way to do this kind of matching.
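
A tiny sketch of the difference between a greedy and a possessive quantifier, using a toy pattern that is not part of the answer:

```php
<?php
// Greedy: a+ first takes all three a's, then gives one back so the final
// literal a can match.
$greedy = preg_match('/^a+a$/', 'aaa');

// Possessive: a++ takes all three a's and never releases any, so the final
// literal a has nothing left to match.
$possessive = preg_match('/^a++a$/', 'aaa');

var_dump($greedy);     // int(1)
var_dump($possessive); // int(0)
```

In the quoted-string alternatives above, refusing to backtrack is harmless because no other way of dividing the characters could produce a match, so the engine simply fails faster on bad input.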
