Is there a native "PHP way" to parse command arguments from a string? For example, given the following string:
foo "bar \"baz
I've worked out the following expression to match the various enclosures and escape sequences:
$pattern = <<
It matches:
Afterwards, you need to (carefully) unescape the escaped characters:
$args = array();
foreach ($matches as $match) {
    if (isset($match[3])) {
        $args[] = $match[3];
    } elseif (isset($match[2])) {
        $args[] = str_replace(['\\\'', '\\\\'], ["'", '\\'], $match[2]);
    } else {
        $args[] = str_replace(['\\"', '\\\\'], ['"', '\\'], $match[1]);
    }
}
print_r($args);
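The pattern itself is cut off in the text above, so the following is only a stand-in sketch and not the original expression. It assumes three capture groups in the order the unescaping loop expects (1: double-quoted, 2: single-quoted, 3: bare word) and that the matches come from `preg_match_all()` with `PREG_SET_ORDER`. The function name `parse_args_regex` is made up for illustration.

```php
<?php
// Hypothetical helper: regex-based splitting with an assumed stand-in pattern.
function parse_args_regex(string $input): array
{
    // A nowdoc keeps backslashes literal, so the regex reads exactly as written.
    // The x flag allows whitespace and comments inside the pattern.
    $pattern = <<<'REGEX'
/
  "((?:[^"\\]|\\.)*)"   # 1: double-quoted, may contain backslash escapes
| '((?:[^'\\]|\\.)*)'   # 2: single-quoted, may contain backslash escapes
| (\S+)                 # 3: bare word
/xs
REGEX;

    // PREG_SET_ORDER yields one array per match; trailing unmatched groups
    // are omitted, which is what the isset() checks below rely on.
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $args = [];
    foreach ($matches as $match) {
        if (isset($match[3])) {
            $args[] = $match[3];
        } elseif (isset($match[2])) {
            $args[] = str_replace(['\\\'', '\\\\'], ["'", '\\'], $match[2]);
        } else {
            $args[] = str_replace(['\\"', '\\\\'], ['"', '\\'], $match[1]);
        }
    }

    return $args;
}

print_r(parse_args_regex('a "b c" d'));
```

Keeping the pattern in a nowdoc avoids a second layer of PHP string escaping on top of the regex escaping, which is where this kind of expression usually goes wrong.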
Update
For the fun of it, I've written a more formal parser, outlined below. It won't give you better performance: it's about three times slower than the regular expression, mostly due to its object-oriented nature. I suppose the advantage is more academic than practical:
class ArgvParser2 extends StringIterator
{
    const TOKEN_DOUBLE_QUOTE = '"';
    const TOKEN_SINGLE_QUOTE = "'";
    const TOKEN_SPACE = ' ';
    const TOKEN_ESCAPE = '\\';

    // Walk the whole string and collect one entry per argument.
    public function parse()
    {
        $this->rewind();
        $args = [];
        while ($this->valid()) {
            switch ($this->current()) {
                case self::TOKEN_DOUBLE_QUOTE:
                case self::TOKEN_SINGLE_QUOTE:
                    $args[] = $this->QUOTED($this->current());
                    break;
                case self::TOKEN_SPACE:
                    // separators between arguments are skipped
                    $this->next();
                    break;
                default:
                    $args[] = $this->UNQUOTED();
            }
        }
        return $args;
    }

    // Read up to the matching closing quote, resolving backslash escapes.
    private function QUOTED($enclosure)
    {
        $this->next(); // skip the opening quote
        $result = '';
        while ($this->valid()) {
            if ($this->current() == self::TOKEN_ESCAPE) {
                $this->next();
                if ($this->valid() && $this->current() == $enclosure) {
                    // escaped quote: keep only the quote
                    $result .= $enclosure;
                } elseif ($this->valid()) {
                    // "\\" collapses to one backslash; any other pair is kept as-is
                    $result .= self::TOKEN_ESCAPE;
                    if ($this->current() != self::TOKEN_ESCAPE) {
                        $result .= $this->current();
                    }
                }
            } elseif ($this->current() == $enclosure) {
                $this->next(); // skip the closing quote
                break;
            } else {
                $result .= $this->current();
            }
            $this->next();
        }
        return $result;
    }

    // Read up to the next space; no quote or escape handling here.
    private function UNQUOTED()
    {
        $result = '';
        while ($this->valid()) {
            if ($this->current() == self::TOKEN_SPACE) {
                $this->next();
                break;
            } else {
                $result .= $this->current();
            }
            $this->next();
        }
        return $result;
    }

    public static function parseString($input)
    {
        $parser = new self($input);
        return $parser->parse();
    }
}
It's based on StringIterator to walk through the string one character at a time:
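class StringIterator implements Iterator
{
    private $string;
    private $current = 0; // start at the first character even before rewind()

    public function __construct($string)
    {
        $this->string = $string;
    }

    // The attribute below silences the PHP 8.1+ return type deprecation
    // notice; older PHP versions read the line as a comment.
    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->string[$this->current];
    }

    #[\ReturnTypeWillChange]
    public function next()
    {
        ++$this->current;
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->current;
    }

    #[\ReturnTypeWillChange]
    public function valid()
    {
        return $this->current < strlen($this->string);
    }

    #[\ReturnTypeWillChange]
    public function rewind()
    {
        $this->current = 0;
    }
}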
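Since most of the overhead is attributed to the object-oriented design, the same character-by-character walk can also be sketched as a single function with an integer offset. This is an illustrative variant rather than the parser above: the name `parse_argv_string` is hypothetical, the rules are roughly the same (quotes delimit an argument, `\` followed by the enclosing quote yields the quote, `\\` yields one backslash), and a dangling backslash at the very end of the input is kept rather than dropped.

```php
<?php
// Hypothetical helper: procedural sketch of the same state machine.
function parse_argv_string(string $input): array
{
    $args = [];
    $len = strlen($input);
    $i = 0;

    while ($i < $len) {
        $ch = $input[$i];

        // Spaces between arguments are skipped.
        if ($ch === ' ') {
            $i++;
            continue;
        }

        $arg = '';
        if ($ch === '"' || $ch === "'") {
            $quote = $ch;
            $i++; // skip the opening quote
            while ($i < $len && $input[$i] !== $quote) {
                if ($input[$i] === '\\' && $i + 1 < $len) {
                    $next = $input[$i + 1];
                    if ($next === $quote) {
                        $arg .= $quote;        // escaped quote
                    } elseif ($next === '\\') {
                        $arg .= '\\';          // escaped backslash
                    } else {
                        $arg .= '\\' . $next;  // unknown escape, kept verbatim
                    }
                    $i += 2;
                } else {
                    $arg .= $input[$i];
                    $i++;
                }
            }
            $i++; // skip the closing quote (or step past the end)
        } else {
            // Bare word: read up to the next space.
            while ($i < $len && $input[$i] !== ' ') {
                $arg .= $input[$i];
                $i++;
            }
        }

        $args[] = $arg;
    }

    return $args;
}

print_r(parse_argv_string('a "b c" d'));
```

Whether this closes the gap to the regular expression would have to be measured; it only removes the per-character method calls.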