问题
Here's what I'd like to do - in Php: given a string, have a result like this:
(a()?b|c)a is a function that returns true of false. This should giveborcafter callinga()(a()?(b()?d|e)|c). Same principle. The final result should bed,eorc(a()?(b()?d|e)|(c()?f|g)). Same principle. The final result should bed,e,forg
The problem I'm facing is that a (in my previous examples) can be an expression too, like this:
((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))
I'm trying to use a regexp, to do it but this doesn't work.
$res=preg_match_all('/\([^.\(]+\)/', $str, $matches);
So in the end I would like do call my function like this:
$final_string=compute("(a(x(y(z()?o|p)))?(b()?d|e)|(c()?f|g))");
The final result in $final_string should be d, e, f or g
I'm pretty sure something has been done before but cant find it on google. How would you do?
To be more precise, here's how I'd like to have the string to be analyzed:
$str =
"
(myfunction(12684444)
? {* comment *}
(
myfunction(1)|
myfunction(2)|
myfunction(80)|
myfunction(120)|
myfunction(184)|
myfunction(196)
? {* comment *}
AAAAA
{* /comment *}
|
{* Ignore all other values: *}
BBBBB
) {* /comment *}
| {* comment *}
CCCC
)";
回答1:
I believe you're looking for something like this. Explanatory comments interspersed along the way.
If you're going to extend the grammar much beyond what you've got (really even if you're not) write a proper parser rather than try to do everything in a single regex. This has been a fun exercise and shows some of the power of PCRE, but it can very easily become an unmaintainable mess.
Test Strings:
$tests = [
"a",
"a()",
"a(b)",
"(a?b|c)",
"(a()?(b()?d|e)|(c()?f|g))",
"((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))",
"(a(d(f))?b(e(f))|c)"
];
For use later.
Regex:
$regex = <<<'REGEX'
/
(?(DEFINE)
# An expression is any function, ternary, or string.
(?<expression>
(?&function) | (?&ternary) | (?&string)
)
)
^(?<expr>
# A function is a function name (consisting of one or more word characters)
# followed by an opening parenthesis, an optional parameter (expression),
# and a closing parenthesis.
# Optional space is allowed around the parentheses.
(?<function>
(?<func_name> \w+ )
\s*\(\s*
(?<parameter> (?&expression)? )
\s*\)\s*
)
|
# A ternary is an opening parenthesis followed by an 'if' expression,
# a question mark, an expression evaluated when the 'if' is true,
# a pipe, an expression evaluated when the 'if' is false, and a closing
# parenthesis.
# Whitespace is allowed after '('; surrounding '?' and '|'; and before ')'.
(?<ternary>
\(\s*
(?<if> (?&expression) )
\s*\?\s*
(?<true> (?&expression) )
\s*\|\s*
(?<false> (?&expression) )
\s*\)
)
|
# A string, for simplicity's sake here, we'll call a sequence of word
# characters.
(?<string> \w+ )
)$
/x
REGEX;
Liberal use of named capturing groups helps a lot, as does the x (PCRE_EXTENDED) modifier to allow comments and whitespace. The (?(DEFINE)...) block lets you to define subpatterns for use by reference only.
Regex Demo:
foreach ($tests as $test) {
if (preg_match($regex, $test, $m)) {
echo "expression: $m[expr]\n";
if ($m['function']) {
echo "function: $m[function]\n",
"function name: $m[func_name]\n",
"parameter: $m[parameter]\n";
} elseif ($m['ternary']) {
echo "ternary: $m[ternary]\n",
"if: $m[if]\n",
"true: $m[true]\n",
"false: $m[false]\n";
} else {
echo "string: $m[string]\n";
}
echo "\n";
}
}
Output:
expression: a
string: a
expression: a()
function: a()
function name: a
parameter:
expression: a(b)
function: a(b)
function name: a
parameter: b
expression: (a?b|c)
ternary: (a?b|c)
if: a
true: b
false: c
expression: (a()?(b()?d|e)|(c()?f|g))
ternary: (a()?(b()?d|e)|(c()?f|g))
if: a()
true: (b()?d|e)
false: (c()?f|g)
expression: ((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))
ternary: ((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))
if: (h() ? a | i)
true: (b() ? d | e)
false: (c() ? f | g)
expression: (a(d(f))?b(e(f))|c)
ternary: (a(d(f))?b(e(f))|c)
if: a(d(f))
true: b(e(f))
false: c
A little verbose, but demonstrates well enough what's matching.
Example compute() Function:
function compute($expr) {
$regex = '/.../x'; // regex from above
if (!preg_match($regex, $expr, $m)) {
return false;
}
if ($m['function']) {
if ($m['parameter']) {
return $m['func_name'](compute($m['parameter']));
} else {
return $m['func_name']();
}
}
if ($m['ternary']) {
return compute($m['if']) ? compute($m['true']) : compute($m['false']);
}
return $m['string'];
}
Pretty straight-forward - execute matched functions, evaluate matched ternary expressions, or return matched strings; recursing where appropriate.
compute() demo:
function a() {return true;}
function b() {return false;}
function d() {return true;}
function e() {return false;}
function h() {return true;}
foreach ($tests as $test) {
$result = compute($test);
echo "$test returns: ";
var_dump($result);
}
Output:
a returns: string(1) "a"
a() returns: bool(true)
a(b) returns: bool(true)
(a?b|c) returns: string(1) "b"
(a()?(b()?d|e)|(c()?f|g)) returns: string(1) "e"
((h() ? a | i) ? (b() ? d | e) | (c() ? f | g)) returns: string(1) "e"
(a(d(f))?b(e(f))|c) returns: bool(false)
I'm pretty sure this is correct.
回答2:
Expanding on @PaulCrovella regex here a bit.
This will allow any level off nested parenthesis around any expression and be parsed
and trimmed accordingly. Whitespace is also trimmed.
PHP Sample:
$rx =
'/
^
(?<found> # (1 start)
\h*
(?:
# A function is a function name (consisting of one or more word characters)
# followed by an opening parenthesis, an optional parameter (expression),
# and a closing parenthesis.
(?<function> # (2 start)
(?>
(?<func_name> \w+ ) # (3)
\h* \(
(?<parameter> # (4 start)
(?&expression)
|
) # (4 end)
\h* \)
)
) # (2 end)
|
# A ternary is an opening \'if\' expression,
# a question mark, an expression evaluated when the \'if\' is true,
# a pipe, an expression evaluated when the \'if\' is false.
(?<pt> \( )? # (5)
(?<ternary> # (6 start)
(?>
(?<if> # (7 start)
(?&expression)
) # (7 end)
\h* \? \h*
(?<true> # (8 start)
(?&expression)
) # (8 end)
\h* \| \h*
(?<false> # (9 start)
(?&expression)
) # (9 end)
)
) # (6 end)
(?(\'pt\') \h* \) )
|
# A string, for simplicity\'s sake here, we\'ll call a sequence of word
# characters.
(?<string> # (10 start)
(?> \w+ )
) # (10 end)
|
(?<parens> # (11 start)
(?>
\( \h*
(?<parens_core> # (12 start)
\h*
(?&p_expression)
) # (12 end)
\h* \)
)
) # (11 end)
)
) # (1 end)
\h*
$
(?(DEFINE)
# expression is any function, parenthesized-ternary, or string.
(?<expression> # (13 start)
\h*
(?:
(?&function)
| \( (?&ternary) \h* \)
| (?&string)
| (?&parens)
)
) # (13 end)
# p_expression is any parenthesized - function, ternary, or string.
(?<p_expression> # (14 start)
\h*
(?:
(?&function)
| (?= . )
(?&ternary)
| (?&string)
| (?&parens)
)
) # (14 end)
)
/x';
function compute($expr) {
global $rx;
if (!preg_match($rx, $expr, $m)) {
return false;
}
if ($m['function']) {
if ($m['parameter']) {
return $m['func_name'](compute($m['parameter']));
} else {
return $m['func_name']( '' );
}
}
if ($m['ternary']) {
return compute($m['if']) ? compute($m['true']) : compute($m['false']);
}
if ($m['parens']) {
return compute($m['parens_core']);
}
return $m['string'];
}
function a() {return true; }
function b() {return false;}
function d() {return true;}
function e() {return false;}
function h() {return true;}
function intro($p) {if ($p) return 'intro'; return false;}
function type($p) {if ($p) return 'type'; return false;}
function insist($p) {if ($p) return 'insist'; return false;}
$tests = array(
"a",
"a()",
"a(b)",
"(a?b|c)",
"(a()?(b()?d|e)|(c()?f|g))",
"((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))",
"(a(d(f))?b(e(f))|c)",
"------------",
"a?b|c",
"(a?b|c)",
" ( ( ( ( ( a ) ) ? ( ( b ) ) | ( ( c ) ) ) ) ) ",
"b( (oo ? p |u) ) ? x | y",
"a ?b() | c",
" a? ( b ? t | r) | d ",
"a()",
"a? (bhh ) |(c)",
"(a) ? ((b(oo) ? x | y )) | (c)",
"a(((b)))",
"a? (bhh ) |((c))",
"(a()?(b()?d|e)|(c()?f|g))",
"((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))",
"(((h() ? a | i) ? (b() ? d | e) | (c() ? f | g)))",
"((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))",
"(a(d(f))?b(e(f))|c)",
"------------",
"((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))",
"(a(d(f))?b(e(f))|c)",
'(intro(intro(type(insist(poou))))?toutou|tutu)',
'type()intro(intro(type(insist(poou))))?type()|tutu'
);
foreach ($tests as $test) {
$result = compute($test);
echo "$test returns: ";
var_dump($result);
}
Output:
a() returns: bool(true)
a(b) returns: bool(true)
(a?b|c) returns: string(1) "b"
(a()?(b()?d|e)|(c()?f|g)) returns: string(1) "e"
((h() ? a | i) ? (b() ? d | e) | (c() ? f | g)) returns: string(1) "e"
(a(d(f))?b(e(f))|c) returns: bool(false)
------------ returns: bool(false)
a?b|c returns: string(1) "b"
(a?b|c) returns: string(1) "b"
( ( ( ( ( a ) ) ? ( ( b ) ) | ( ( c ) ) ) ) ) returns: string(1) "b"
b( (oo ? p |u) ) ? x | y returns: string(1) "y"
a ?b() | c returns: bool(false)
a? ( b ? t | r) | d returns: string(1) "t"
a() returns: bool(true)
a? (bhh ) |(c) returns: string(3) "bhh"
(a) ? ((b(oo) ? x | y )) | (c) returns: bool(false)
a(((b))) returns: bool(true)
a? (bhh ) |((c)) returns: string(3) "bhh"
(a()?(b()?d|e)|(c()?f|g)) returns: string(1) "e"
((h() ? a | i) ? (b() ? d | e) | (c() ? f | g)) returns: string(1) "e"
(((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))) returns: string(1) "e"
((h() ? a | i) ? (b() ? d | e) | (c() ? f | g)) returns: string(1) "e"
(a(d(f))?b(e(f))|c) returns: bool(false)
------------ returns: bool(false)
((h() ? a | i) ? (b() ? d | e) | (c() ? f | g)) returns: string(1) "e"
(a(d(f))?b(e(f))|c) returns: bool(false)
(intro(intro(type(insist(poou))))?toutou|tutu) returns: string(6) "toutou"
type()intro(intro(type(insist(poou))))?type()|tutu returns: bool(false)
回答3:
Here's my "basic" version that works and does the job. It's "half" recursive (a loop that may call the function or not) and the improvements I'm planning to do (handle "+" separator to "add" returns of two functions, and handle "=" to set variables to make short aliases of the value of a return function) seem quite easy to implement in the _compute() function... maybe because I wrote the code myself, and maybe because, like Paul Crovella said, I'm not using PCRE, because it can very easily become an unmaintainable mess...
NB: this code can be easily optimized, and it's not perfect (there are some cases it doesnt work like (a()+b()))... but if someone is willing to finish it he/she's welcome!
class Parser
{
private $ref = array(
'a' => array( 'type' => 'fn', 'val' => '_a'),
'b' => array( 'type' => 'fn', 'val' => '_b'),
'c' => array( 'type' => 'fn', 'val' => '_c'),
'd' => array( 'type' => 'fn', 'val' => '_d'),
'e' => array( 'type' => 'fn', 'val' => '_e'),
'f' => array( 'type' => 'fn', 'val' => '_f'),
'intro' => array( 'type' => 'fn', 'val' => '_getIntro'),
'insist' => array( 'type' => 'fn', 'val' => '_insist'),
'summoner_name' => array( 'type' => 'fn', 'val' => '_getSummonerName'),
'type' => array( 'type' => 'fn', 'val' => '_getEtat'),
' ' => array( 'type' => 'str', 'val' => ' ')
);
private function _a($p) { return 'valfnA'; }
private function _b($p) { return 'valfnB'; }
private function _c($p) { return 'valfnC'; }
private function _d($p) { return 'valfnD'; }
private function _e($p) { return 'valfnE'; }
private function _f($p) { return 'valfnF'; }
private function _getIntro($p) { return 'valGetIntro'; }
private function _insist($p) { return 'valInsist'; }
private function _getSummonerName($p) { return 'valGetSqmmonerName'; }
private function _getEtat($p) { return 'valGetEtat'; }
private function _convertKey($key, $params=false)
{
$retour = 'indéfini';
if (isset($this->ref[$key])) {
$val = $this->ref[$key];
switch ($val['type']) {
case 'fn':
$val=$val['val'];
if (method_exists($this, $val)) {
$retour = $this->$val($params);
}
break;
default:
if (isset($this->val['val'])) {
$retour = $this->val['val'];
}
break;
}
}
return $retour;
}
private function _compute($str)
{
$p=strpos($str, '?');
if ($p===false) {
$p=strpos($str, '=');
if ($p===false) {
return $str;
}
} else {
$or=strpos($str, '|');
if ($or===false) {
return false;
}
$s=substr($str,0,$p);
if (empty($s) || (strtolower($s)=='false')) {
return substr($str, $or+1);
}
return substr($str, $p+1, ($or-$p)-1);
}
return $str;
}
private function _getTexte($str, $i, $level)
{
if (empty($str)) {
return $str;
}
$level++;
$f = (strlen($str)-$i);
$val = substr($str, $i);
do {
$d = $i;
do {
$p=$d;
$d=strpos($str, '(', $p+1);
if (($p==$i) && ($d===false)) {
$retour = $this->_compute($str);
return $retour;
} elseif (($d===false) && ($p>$i)) {
$f=strpos($str, ')', $p+1);
if ($f===false) {
return false;
}
$d=$p;
while((--$d)>=$i) {
if (($str[$d]!=' ')
&& ($str[$d]!='_')
&& (!ctype_alnum($str[$d]))
) {
break;
}
}
if ($d>=$i) {
$d++;
} else {
$d=$i;
}
$val=substr($str, $d, ($f-$d)+1);
$fn=substr($str, $d, $p-$d);
$param=$this->_getTexte(
substr($str, $p+1, ($f-$p)-1), 0, $level+1
);
if (!empty($fn)) {
$val = $this->_convertKey($fn, $param);
} else {
$val = $this->_compute($param);
}
$str = substr($str, 0, $d).$val.substr($str, $f+1);
break;
} elseif ($d===false) {
break;
}
} while (true);
} while (true);
}
public function parse($str)
{
$retour=preg_replace('/{\*[^.{]+\*}/', '', $str); //}
$retour=str_replace("\n", "", $retour);
$retour=str_replace("\r", "", $retour);
while (strpos($retour, ' ')!==false) {
$retour=str_replace(" ", " ", $retour);
}
return trim($this->_getTexte($retour, 0, 0));
}
}
$p=new Parser();
$tests = [
"a",
"a()",
"a(b)",
"(a?b|c)",
"(a()?(b()?d|e)|(c()?f|g))",
"(a()?(b()?d|e)|(c()?f()|g))",
"((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))",
"(a(d(f))?b(e(f))|c)",
'(intro(intro(type(insist(poou))))?toutou|tutu)',
'type()intro(intro(type(insist(poou))))?type()|tutu'
];
foreach ($tests as $test) {
$res=$p->parse($test);
echo $test.' = '.var_export($res,true)."\n";
}
来源:https://stackoverflow.com/questions/27462254/writing-a-simple-syntax-parser