In PHP, I have the following string, which I want to split on the commas that are not inside parentheses or quotes:
$str = "AAA, BBB, (CCC,DDD), 'EEE', 'FFF,GGG', ('HHH','III'), (('JJJ','KKK'), LLL, (MMM,NNN)) , OOO";
Instead of a preg_split, do a preg_match_all that matches the tokens themselves:
$str = "AAA, BBB, (CCC,DDD), 'EEE', 'FFF,GGG', ('HHH','III'), (('JJJ','KKK'), LLL, (MMM,NNN)) , OOO";
preg_match_all("/\((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+/", $str, $matches);
print_r($matches);
will print:
Array
(
    [0] => Array
        (
            [0] => AAA
            [1] => BBB
            [2] => (CCC,DDD)
            [3] => 'EEE'
            [4] => 'FFF,GGG'
            [5] => ('HHH','III')
            [6] => (('JJJ','KKK'), LLL, (MMM,NNN))
            [7] => OOO
        )

)
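To make the structure of this pattern easier to see, here is a minimal sketch that keeps each of its three alternatives in its own variable and joins them with |. The joined string is exactly the pattern used above; the variable names are illustrative only and not part of the original answer.

```php
<?php
// Same pattern as above, split into its three alternatives.
// Variable names are illustrative, not from the original answer.
$balanced = '\((?:[^()]|(?R))+\)'; // balanced (...) group; (?R) recurses into the whole pattern for nesting
$quoted   = "'[^']*'";              // single-quoted string; commas inside the quotes are kept
$bare     = '[^(),\s]+';            // run of chars that are not (, ), a comma or whitespace

// The quoted alternative must be tried before the bare one,
// otherwise 'FFF,GGG' would be cut at the comma ($bare allows the quote char).
$pattern = '/' . implode('|', [$balanced, $quoted, $bare]) . '/';

$str = "AAA, BBB, (CCC,DDD), 'EEE', 'FFF,GGG', ('HHH','III'), (('JJJ','KKK'), LLL, (MMM,NNN)) , OOO";
preg_match_all($pattern, $str, $matches);

$tokens = $matches[0]; // whole-match strings, one per top-level token
print_r($tokens);
```

Keeping the parts separate also makes it easy to add a fourth alternative later, as long as the ordering rule in the comment is respected.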
The regex \((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+ can be divided into three parts:
- \((?:[^()]|(?R))+\), which matches balanced pairs of parentheses; (?R) recurses into the whole pattern, which is what handles nested brackets
- '[^']*', which matches a quoted string (commas inside the quotes are part of the match)
- [^(),\s]+, which matches any character sequence that contains no '(', ')', ',' or whitespace characters

A spartan regex that tokenizes and also validates all the tokens that it extracts:
\G\s*+((\((?:\s*+(?2)\s*+(?(?!\)),)|\s*+[^()',\s]++\s*+(?(?!\)),)|\s*+'[^'\r\n]*+'\s*+(?(?!\)),))++\))|[^()',\s]++|'[^'\r\n]*+')\s*+(?:,|$)
Put in a PHP single-quoted string literal, with delimiters (each ' is escaped as \'):
'/\G\s*+((\((?:\s*+(?2)\s*+(?(?!\)),)|\s*+[^()\',\s]++\s*+(?(?!\)),)|\s*+\'[^\'\r\n]*+\'\s*+(?(?!\)),))++\))|[^()\',\s]++|\'[^\'\r\n]*+\')\s*+(?:,|$)/'
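The result is in capturing group 1. In the example on ideone, I specify the PREG_OFFSET_CAPTURE flag, so that you can check the last match in group 0 (the entire match) to see whether the entire source string has been consumed. Since the regex is anchored with \G, matching stops at the first invalid token, so a string that is not fully consumed contains an invalid token.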
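The regex is based on the following assumptions:
- A non-quoted token may not contain whitespace (as matched by \s). Consequently, it may not span multiple lines.
- A non-quoted token may not contain (, ), ' or a comma.
- A quoted token starts and ends with ' and may not contain ' or a new line character (\r or \n).
- A bracket token starts with ( and ends with ), and contains one or more tokens (non-quoted, quoted or nested bracket tokens).
- An empty bracket token () is not allowed.
- Tokens are separated by a comma, both at the top level and inside a bracket token. A single trailing comma is considered valid.
- Whitespace (\s, which includes the new line character) is arbitrarily allowed between tokens, the commas separating them, and the brackets ( and ) of the bracket tokens.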
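The "has the whole string been consumed?" check described above can be sketched as follows. This is not the author's ideone code; it is a minimal reconstruction assuming PHP 7.1+ (for [$a, $b] destructuring), and the function name tokenizeAndValidate() is hypothetical. It returns the tokens from group 1, or null when the input breaks one of the rules above.

```php
<?php
// Hypothetical helper: tokenizes with the answer's regex and rejects
// the input if the matches do not cover the whole string.
function tokenizeAndValidate(string $str): ?array
{
    $re = '/\G\s*+((\((?:\s*+(?2)\s*+(?(?!\)),)|\s*+[^()\',\s]++\s*+(?(?!\)),)|\s*+\'[^\'\r\n]*+\'\s*+(?(?!\)),))++\))|[^()\',\s]++|\'[^\'\r\n]*+\')\s*+(?:,|$)/';

    $n = preg_match_all($re, $str, $m, PREG_OFFSET_CAPTURE);
    if ($n === false) {
        throw new RuntimeException('preg_match_all failed, error code ' . preg_last_error());
    }

    // With PREG_OFFSET_CAPTURE every entry is [matched text, byte offset].
    // Because of \G, matching stops at the first invalid token, so the input is
    // valid only if the last whole match (group 0) ends at the end of the string.
    // An empty input has no matches and yields an empty token list.
    $consumed = 0;
    if ($n > 0) {
        [$text, $offset] = $m[0][$n - 1];
        $consumed = $offset + strlen($text);
    }
    if ($consumed !== strlen($str)) {
        return null;
    }

    // The tokens themselves are in capturing group 1.
    return array_column($m[1], 0);
}

$str = "AAA, BBB, (CCC,DDD), 'EEE', 'FFF,GGG', ('HHH','III'), (('JJJ','KKK'), LLL, (MMM,NNN)) , OOO";
print_r(tokenizeAndValidate($str));
var_dump(tokenizeAndValidate('()'));        // empty bracket token: NULL
var_dump(tokenizeAndValidate('AAA, (BBB')); // unclosed bracket: NULL
```

Comparing byte offsets with strlen() is consistent here because the pattern has no u modifier, so PCRE also works in bytes.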
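Breakdown (free-spacing layout, valid with the x modifier):
\G\s*+                 # continue where the previous match ended, skip leading whitespace
(                      # group 1: one top-level token
  (                    # group 2: bracket token
    \(
    (?:
      \s*+
      (?2)             # nested bracket token (recursion into group 2)
      \s*+
      (?(?!\)),)       # require a comma unless the next character is )
    |
      \s*+
      [^()',\s]++      # non-quoted token
      \s*+
      (?(?!\)),)
    |
      \s*+
      '[^'\r\n]*+'     # quoted token
      \s*+
      (?(?!\)),)
    )++
    \)
  )
|
  [^()',\s]++          # non-quoted token
|
  '[^'\r\n]*+'         # quoted token
)
\s*+(?:,|$)            # separating comma, or end of string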
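The breakdown above can also serve as the actual pattern: with the x (extended) modifier, PCRE ignores the layout whitespace. A minimal sketch, assuming a nowdoc (<<<'RE') so that neither ' nor \ needs escaping; the variable names are illustrative only.

```php
<?php
// The free-spacing layout used directly as the pattern (note the trailing /x).
// A nowdoc keeps every backslash and quote exactly as written.
$re = <<<'RE'
/
\G\s*+
(
  (
    \(
    (?:
      \s*+ (?2) \s*+ (?(?!\)),)
    |
      \s*+ [^()',\s]++ \s*+ (?(?!\)),)
    |
      \s*+ '[^'\r\n]*+' \s*+ (?(?!\)),)
    )++
    \)
  )
|
  [^()',\s]++
|
  '[^'\r\n]*+'
)
\s*+(?:,|$)
/x
RE;

$str = "AAA, BBB, (CCC,DDD), 'EEE', 'FFF,GGG', ('HHH','III'), (('JJJ','KKK'), LLL, (MMM,NNN)) , OOO";
preg_match_all($re, $str, $m);

$tokens = $m[1]; // group 1 holds the tokens, as in the one-line version
print_r($tokens);
```

Only whitespace outside character classes is ignored in x mode, which is why the classes themselves are left untouched.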