Extract PHP Code with Regular Expressions

空扰寡人 posted on 2021-02-10 13:26:10

Question


I want to extract the entire PHP code of this section with regular expressions:

<h1>Extract the PHP Code</h1>
    <?php
        echo(date("F j, Y, g:i a") . ' and a stumbling block: ?>');
        /* Another stumbling block ?> */
        echo(' that works.');
    ?>
<p>Some HTML text ...</p>

Unfortunately, my regular expression gets stuck on the first stumbling block:

/<[?]php[^?>]*[?]>/gim

Does anyone have a hint on how to capture the full PHP code?


Answer 1:


Something like this might work:

/<\?php.+?\?>$/ms

This pattern uses two flags:

  • m for PCRE_MULTILINE

    By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless D modifier is set). This is the same as Perl. When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no "\n" characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.

  • s for PCRE_DOTALL

    If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
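
To see how the two flags interact, here is a minimal sketch that runs the pattern above against the sample from the question. The variable names ($subject, $code) and the nowdoc label SRC are only illustrative, and the sketch assumes LF line endings: with CRLF input, $ would not match directly after ?> because a \r sits in between.

```php
<?php
// The sample from the question, kept verbatim in a nowdoc so nothing is interpolated.
$subject = <<<'SRC'
<h1>Extract the PHP Code</h1>
    <?php
        echo(date("F j, Y, g:i a") . ' and a stumbling block: ?>');
        /* Another stumbling block ?> */
        echo(' that works.');
    ?>
<p>Some HTML text ...</p>
SRC;

// s: the dot also matches newlines, so .+? can span several lines.
// m: $ matches before every newline, so the lazy .+? stops at the
//    first ?> that is the last thing on its line.
$found = preg_match('/<\?php.+?\?>$/ms', $subject, $m);
$code  = ($found === 1) ? $m[0] : null;

echo $code, "\n";
```

The two ?> inside the string and the comment are skipped because more text follows them on the same line; only the final ?> is directly followed by a line break.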



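Caveat: the pattern ends the match at the first ?> that sits at the end of a line, so the real closing tag must be the last thing on its line, and no earlier ?> may end a line.

So it works when a stray ?> is followed by more text on the same line, as in

  • ?>');
  • ?> */

But it wouldn't work for the following, where a ?> inside a comment ends its line: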

<?php
  echo "actual code";
  /*
   * comment ?>
   */
?>

Long story short, if your code is that messy, you need a better solution. If your code is clean, it should work just fine.
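
The failure case can be reproduced with a short sketch like the one below. It reuses the pattern from this answer on the messy snippet; the names $subject and $truncated are only illustrative, and LF line endings are assumed.

```php
<?php
// The problematic input: a ?> inside a comment is the last thing on its line.
$subject = <<<'SRC'
<?php
  echo "actual code";
  /*
   * comment ?>
   */
?>
SRC;

// Same pattern as above. The lazy .+? stops at the first ?> that is
// followed by a line break, which here is the one inside the comment.
preg_match('/<\?php.+?\?>$/ms', $subject, $m);
$truncated = $m[0];

// The match is cut off before the comment is closed.
var_dump($truncated === $subject);
```

The match ends at "comment ?>" and never reaches the closing */ or the real closing tag, which is why this approach only suits code where ?> never ends a line inside a string or comment.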




Answer 2:


You can try with this pattern:

$pattern = <<<'LOD'
~

#definitions
(?(DEFINE)
    (?<sq> '(?>[^'\\]+|\\.)*+(?>'|\z) ) # content inside single quotes
    (?<dq> "(?>[^"\\]+|\\.)*+(?>"|\z) ) # content inside double quotes
    (?<vn> [a-zA-Z_]\w*+ ) # variable name
    (?<crlf> \r?\n ) # CRLF
    (?<hndoc> <<< (["']?) (\g<vn>) \g{-2} \g<crlf> # content inside here/nowdoc
              (?> [^\r\n]+ | \R+ (?!\g{-1}; $) )*+
              (?: \g<crlf> \g{-1}; \g<crlf> | \z )
    )
    (?<cmt> /\*                      # multiline comments
             (?> [^*]+ | \* (?!/) )*+
             \*/
    )
)

#pattern
<\?php \s+
(?> [^"'?/<]+ | \?+(?!>) | \g<sq> | \g<dq> | \g<hndoc> | \g<cmt> | [</]+ )*+
(?: \?> | \z )

~xsm
LOD;

Test:

$subject = <<<'LOD'
<h1>Extract the PHP Code</h1>
    <?php
        echo(date("F j, Y, g:i a") . ' and a stumbling block: ?>');
        /* Another stumbling block ?> */
        echo <<<'EOD'
    Youpi!!! ?>
EOD;
        echo(' that works.');
    ?>
<p>Some HTML text ...</p>
LOD;

preg_match_all($pattern, $subject, $matches);

print_r($matches);


Another way:

As mario suggests in a comment, you can use the tokenizer. It's the easiest way to do this, since you don't have to define anything. Example:

$tokens = token_get_all($subject);
$display = false;
foreach ($tokens as $token) {
    if (is_array($token)) {
        if ($token[0]==T_OPEN_TAG) $display = true;
        if ($display) echo $token[1];
        if ($token[0]==T_CLOSE_TAG) $display = false;
    } else {
        if ($display) echo $token;
    }
}
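
If you need the blocks as strings rather than echoed output, the loop above can be wrapped in a function. The following is only a sketch: extract_php_blocks() is a hypothetical helper name, and handling T_OPEN_TAG_WITH_ECHO (the <?= short echo tag) and an unclosed block at the end of the file are additions that go beyond the original loop.

```php
<?php
/**
 * Hypothetical helper: returns every PHP block found in $source,
 * including its opening and closing tags, as a list of strings.
 */
function extract_php_blocks(string $source): array
{
    $blocks  = [];
    $current = null; // null means we are outside a PHP block

    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        if ($id === T_OPEN_TAG || $id === T_OPEN_TAG_WITH_ECHO) {
            $current = ''; // a new block starts here
        }
        if ($current !== null) {
            $current .= $text;
        }
        if ($id === T_CLOSE_TAG) {
            $blocks[] = $current; // block is complete
            $current  = null;
        }
    }

    // A file may end without a closing tag; keep that trailing block too.
    if ($current !== null) {
        $blocks[] = $current;
    }
    return $blocks;
}

$subject = <<<'SRC'
<h1>Extract the PHP Code</h1>
    <?php
        echo(date("F j, Y, g:i a") . ' and a stumbling block: ?>');
        /* Another stumbling block ?> */
        echo(' that works.');
    ?>
<p>Some HTML text ...</p>
SRC;

$blocks = extract_php_blocks($subject);
print_r($blocks);
```

Because the tokenizer applies PHP's own lexing rules, a ?> inside a string literal or a /* ... */ comment is part of that token and does not end the block. Note that the T_CLOSE_TAG token can include the newline that directly follows ?>, so trim the result if that matters.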


Source: https://stackoverflow.com/questions/18431312/extract-php-code-with-regular-expressions
