Is there a native "PHP way" to parse command arguments from a string? For example, given the following string:
foo "bar \"baz
Based on HamZa's answer:
function parse_cli_args($cmd) {
preg_match_all('#(?<!\\\\)("|\')(?<escaped>(?:[^\\\\]|\\\\.)*?)\1|(?<unescaped>\S+)#s', $cmd, $matches, PREG_SET_ORDER);
$results = [];
foreach($matches as $array){
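// Test the unescaped group instead of !empty($array['escaped']): empty() is true for a quoted "0" or "".
$results[] = (isset($array['unescaped']) && $array['unescaped'] !== '') ? $array['unescaped'] : $array['escaped'];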
}
return $results;
}
I would recommend going another way. There is already a "standard" way of handling command line arguments. It's called getopt:
http://php.net/manual/en/function.getopt.php
I would suggest that you change your script to use getopt. Then anyone using your script will pass parameters in a way that is familiar to them and more or less "industry standard", instead of having to learn your way of doing things.
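A minimal sketch of what that could look like. Note that getopt() reads the script's own command line rather than an arbitrary string, so it only helps when the arguments arrive through the shell. The option names (-f, -v, --flag) and the default value are made up for this example.

```php
<?php
// Sketch: read options from the script's own command line with getopt().
// The option names below are assumptions chosen for illustration.
// "f:" and "flag:" take a required value; "v" is a plain switch.
$options = getopt('f:v', ['flag:']);

// An option that was not passed is simply missing from the result,
// so fall back to defaults (the defaults are assumptions as well).
$flag    = $options['flag'] ?? 'default';
$formula = $options['f'] ?? null;

// A switch that is present has the value false; a repeated switch
// (for example -vvv) is returned as an array with one entry per occurrence.
$verbosity = isset($options['v']) ? count((array) $options['v']) : 0;

printf("flag=%s f=%s verbosity=%d\n", $flag, var_export($formula, true), $verbosity);
```

Called as `php script.php --flag=custom -f "1+1=2" -vvv`, the shell does the quoting and getopt() only has to map names to values.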
Regexes are quite powerful: (?s)(?<!\\)("|')(?:[^\\]|\\.)*?\1|\S+. So what does this expression mean?
(?s): set the s modifier so that the dot . also matches newlines
(?<!\\): negative lookbehind, check that there is no backslash preceding the next token
("|'): match a single or double quote and put it in group 1
(?:[^\\]|\\.)*?: match everything that is not \, or match \ together with the immediately following (escaped) character
\1: match what was matched in the first group
|: or
\S+: match anything except whitespace one or more times
The idea is to capture a quote and group it to remember whether it's a single or a double one. The negative lookbehind is there to make sure we don't match escaped quotes. \1 is used to match the closing quote. Finally we use an alternation to match anything that's not whitespace. This solution is handy and is applicable to almost any language/flavor that supports lookbehinds and backreferences. Of course, this solution expects that the quotes are closed. The results are found in group 0.
Let's implement it in PHP:
$string = <<<INPUT
foo "bar \"baz\"" '\'quux\''
'foo"bar' "baz'boz"
hello "regex
world\""
"escaped escape\\\\"
INPUT;
preg_match_all('#(?<!\\\\)("|\')(?:[^\\\\]|\\\\.)*?\1|\S+#s', $string, $matches);
print_r($matches[0]);
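If you wonder why I used 4 backslashes, take a look at my previous answer. In short: inside a single-quoted PHP string \\\\ becomes \\, which the regex engine then reads as one literal backslash.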
Output
Array
(
[0] => foo
[1] => "bar \"baz\""
[2] => '\'quux\''
[3] => 'foo"bar'
[4] => "baz'boz"
[5] => hello
[6] => "regex
world\""
[7] => "escaped escape\\"
)
Removing the quotes
Quite simple using named groups and a simple loop:
preg_match_all('#(?<!\\\\)("|\')(?<escaped>(?:[^\\\\]|\\\\.)*?)\1|(?<unescaped>\S+)#s', $string, $matches, PREG_SET_ORDER);
$results = array();
foreach($matches as $array){
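// Test the unescaped group: empty() would wrongly skip a quoted "0" or "".
if(isset($array['unescaped']) && $array['unescaped'] !== ''){
$results[] = $array['unescaped'];
}else{
$results[] = $array['escaped'];
}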
}
print_r($results);
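Removing the quotes this way still leaves the backslash escapes in place (bar \"baz\" rather than bar "baz"). If you also want those resolved, something along these lines might do. The function name split_and_unescape is made up for this sketch, and it treats single and double quotes identically, which a real shell does not.

```php
<?php
// Hypothetical helper (not part of the answer above): split a string into
// arguments, drop the surrounding quotes, and resolve backslash escapes.
function split_and_unescape(string $input): array
{
    $pattern = '#(?<!\\\\)("|\')(?<escaped>(?:[^\\\\]|\\\\.)*?)\1|(?<unescaped>\S+)#s';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $results = [];
    foreach ($matches as $match) {
        // \S+ can never match an empty string, so a non-empty "unescaped"
        // group means the second alternative matched.
        $token = (isset($match['unescaped']) && $match['unescaped'] !== '')
            ? $match['unescaped']
            : $match['escaped'];
        // Replace each backslash-plus-character pair with the character alone.
        $results[] = preg_replace('/\\\\(.)/s', '$1', $token);
    }
    return $results;
}

print_r(split_and_unescape('foo "bar \"baz\"" "0" ""'));
```

For the input above this should give foo, bar "baz", 0 and an empty string, so quoted "0" and "" survive as arguments.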
If you want to follow the same parsing rules a shell uses, there are some edge cases that I think aren't easy to cover with regular expressions, so you might want to write a method that does this (example):
$string = 'foo "bar \"baz\"" \'\\\'quux\\\'\'';
echo $string, "\n";
print_r(StringUtil::separate_quoted($string));
Output:
foo "bar \"baz\"" '\'quux\''
Array
(
[0] => foo
[1] => bar "baz"
[2] => 'quux'
)
I guess this pretty much matches what you're looking for. The function used in the example can be configured for the escape character as well as for the quotes; you can even use brackets like [ ] to form a "quote" if you like.
To allow input other than native byte-safe strings with one character per byte, you can pass an array instead of a string. The array needs to contain one character per value as a binary-safe string, e.g. pass Unicode in NFC form as UTF-8 with one code point per array value, and this should do the job for Unicode.
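The StringUtil::separate_quoted() method itself is not shown here, so the following is only a rough sketch of how such a character-by-character splitter could work, not its actual implementation. The name split_quoted and its parameters are hypothetical. It assumes that the escape character applies everywhere, including inside single quotes (which differs from a real shell), and it works on bytes only.

```php
<?php
// Hypothetical splitter, not the StringUtil::separate_quoted() used above.
function split_quoted(string $input, string $escape = '\\', string $quotes = '"\''): array
{
    $args = [];
    $current = '';
    $inToken = false;   // true once the current argument has started
    $quote = null;      // the quote character we are inside, or null
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($char === $escape && $i + 1 < $length) {
            // Escape character: take the next byte literally.
            $current .= $input[++$i];
            $inToken = true;
        } elseif ($quote !== null) {
            if ($char === $quote) {
                $quote = null;          // closing quote
            } else {
                $current .= $char;      // anything else inside quotes is literal
            }
        } elseif (strpos($quotes, $char) !== false) {
            $quote = $char;             // opening quote, also starts a (possibly empty) argument
            $inToken = true;
        } elseif (strpos(" \t\r\n", $char) !== false) {
            if ($inToken) {             // whitespace outside quotes ends the argument
                $args[] = $current;
                $current = '';
                $inToken = false;
            }
        } else {
            $current .= $char;
            $inToken = true;
        }
    }

    if ($quote !== null) {
        throw new InvalidArgumentException('Unterminated quote in input');
    }
    if ($inToken) {
        $args[] = $current;
    }
    return $args;
}

print_r(split_quoted('foo "bar \"baz\"" \'\\\'quux\\\'\''));
```

With the same input string as in the example above, this sketch should produce the same three arguments: foo, bar "baz" and 'quux'. Unlike the regex approach it fails loudly on an unclosed quote instead of silently splitting on whitespace.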
I wrote some packages for console interactions:
Arguments parsing
There is a package that does the whole arguments parsing thing: weew/php-console-arguments
Example:
$parser = new ArgumentsParser();
$args = $parser->parse('command:name arg1 arg2 --flag="custom \"value" -f="1+1=2" -vvv');
$args will be an array:
['command:name', 'arg1', 'arg2', '--flag', 'custom "value', '-f', '1+1=2', '-v', '-v', '-v']
Arguments can be grouped:
$args = $parser->group($args);
$args will become:
['arguments' => ['command:name', 'arg1', 'arg2'], 'options' => ['--flag' => 1, '-f' => 1, '-v' => 1], '--flag' => ['custom "value'], '-f' => ['1+1=2'], '-v' => []]
It can do much more, just check the readme.
Output styling
You might need a package for output styling: weew/php-console-formatter
Console application
The packages above can be used standalone or in combination with a fancy console application skeleton: weew/php-console
Note: These solutions are not native but might still be useful to some people.