How can I split on a comma except where it appears in brackets?

Posted by 你说的曾经没有我的故事 on 2021-02-16 15:43:07

Question


I want to split like this:

Before:

TEST_A, TEST_B, TEST_C (with A, B, C), TEST_D

After:

TEST_A
TEST_B
TEST_C (with A, B, C)
TEST_D

How can I split it?


Answer 1:


Regex isn’t going to help this time, so you will have to iterate through the characters.

The fact is, regular expressions aren’t very context-aware; it is the same reason you can’t reliably use a regular expression to parse HTML. That is why we’re better off iterating through the string ourselves and keeping track of the bracket depth.

function magic_split($str) {
    $sets = array('');  // Sets of strings
    $set_index = 0;     // Remember what index we’re writing to
    $brackets_depth = 0; // Keep track if we’re in brackets (or not)

    // Iterate through entire string
    for($i = 0; $i < strlen($str); $i++) {
        // Skip commas if we’re not in brackets
        if($brackets_depth < 1 && $str[$i] === ',') continue;

        // Add character to current list
        $sets[$set_index] .= $str[$i];

        // Store brackets depth
        if($str[$i] === '(') $brackets_depth++;
        if($str[$i] === ')') $brackets_depth--;

        if(
            $i < strlen($str) - 1 && // Is a next char available?
            $str[$i+1] === ',' &&   // Is it a comma?
            $brackets_depth === 0   // Are we not in brackets?
        ) $sets[++$set_index] = '';  // Add new set
    }

    // Each set after the first still starts with the space that followed the comma
    return array_map('trim', $sets);
}

$input = 'TEST_A, TEST_B, TEST_C (with A, B, C), TEST_D';
$split = magic_split($input);
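
For comparison, here is a more compact sketch of the same depth-counting idea. It assumes that items should be trimmed and that parentheses may nest; the function name split_outside_brackets is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: split on commas that sit outside any parentheses.
function split_outside_brackets(string $str): array {
    $parts = [];
    $current = '';
    $depth = 0; // How many parentheses are currently open

    foreach (str_split($str) as $char) {
        if ($char === '(') {
            $depth++;
        } elseif ($char === ')' && $depth > 0) {
            $depth--;
        }

        if ($char === ',' && $depth === 0) {
            // Top-level comma: close the current item and start a new one
            $parts[] = trim($current);
            $current = '';
            continue;
        }

        $current .= $char;
    }

    // Whatever is left after the last comma is the final item
    $parts[] = trim($current);
    return $parts;
}

print_r(split_outside_brackets('TEST_A, TEST_B, TEST_C (with A, (B), C), TEST_D'));
```

Because the counter only splits at depth zero, commas inside nested parentheses are kept as part of the item.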



Answer 2:


You want to match:

  • a word containing neither an opening parenthesis nor a comma: [^(,]+
  • an expression between parentheses: \([^(]+\)
    • or not... and without capturing the match, so it becomes: (?:\([^(]+\))?
  • a comma, followed by some whitespace: ,[\s]*

PHP Code:

$ar=preg_split("#([^(,]+(?:\([^(]+\))?),[\s]*#", "$input,", -1,
            PREG_SPLIT_DELIM_CAPTURE |PREG_SPLIT_NO_EMPTY);

Edit: the pattern does not pick up the last item unless a comma follows it, so you have to append an extra comma to $input, as in the modified code above.
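
The same building blocks can also be used to match the items instead of splitting on them, which avoids the trailing comma. This is only a sketch: it assumes each item has at most one parenthesised group, that the group is not nested, and that it ends the item. The variable names are illustrative.

```php
<?php
$input = 'TEST_A, TEST_B, TEST_C (with A, B, C), TEST_D';

// [^(,]+            text up to a comma or an opening parenthesis
// (?:\([^)]*\))?    optionally, one parenthesised group (non-capturing)
preg_match_all('#[^(,]+(?:\([^)]*\))?#', $input, $m);

// Every match after the first keeps the space that followed the comma
$items = array_map('trim', $m[0]);
print_r($items);
```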




Answer 3:


The correct solution to this problem will depend on exactly what your specification is for identifying individual elements.

If you expect each one to begin with TEST_, then you could solve it fairly simply with a regular expression:

$input = 'TEST_A, TEST_B, TEST_C (with A, B, C), TEST_D';
$matches = preg_split('/,\s*(?=TEST_)/', $input);

var_dump($matches);

Output:

array(4) {
  [0]=>
  string(6) "TEST_A"
  [1]=>
  string(6) "TEST_B"
  [2]=>
  string(21) "TEST_C (with A, B, C)"
  [3]=>
  string(6) "TEST_D"
}

This splits the string on commas followed by optional whitespace, using a lookahead assertion to test for the presence of TEST_ at the beginning of the next item. The lookahead does not consume TEST_, so the prefix stays on each item.




Answer 4:


You merely need to explode on comma-space and disregard any comma-spaces that are inside of parentheses. (*SKIP)(*FAIL) will consume all parenthetical expressions and dispose of them so that they are not used as delimiters.

Code:

$text = 'TEST_A, TEST_B, TEST_C (with A, B, C), TEST_D';

var_export(preg_split('~\([^)]*\)(*SKIP)(*FAIL)|, ~', $text));

Output:

array (
  0 => 'TEST_A',
  1 => 'TEST_B',
  2 => 'TEST_C (with A, B, C)',
  3 => 'TEST_D',
)
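
If the parentheses can nest, [^)]* stops at the first closing parenthesis and the pattern above would split inside the outer group. A possible extension, offered as a sketch rather than as part of the original answer, uses a PCRE recursive subpattern to skip a whole balanced group. It assumes the parentheses in the input are balanced; the sample string is made up for illustration.

```php
<?php
$text = 'TEST_A, TEST_B (with A, (B, C)), TEST_C';

// (\((?:[^()]++|(?1))*\))  one balanced parenthesised group; (?1) recurses into group 1
// (*SKIP)(*FAIL)           discard that group so its commas never act as delimiters
// ,\s*                     a top-level comma plus any following whitespace
$parts = preg_split('~(\((?:[^()]++|(?1))*\))(*SKIP)(*FAIL)|,\s*~', $text);

var_export($parts);
```

The first alternative never produces a successful match, so the capturing group does not show up in the result.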


Source: https://stackoverflow.com/questions/28562514/how-can-split-on-a-comma-except-where-it-appears-in-brackets
