PHP: Extracting a string between two tags by the child's content [duplicate]

Question

I have the following HTML markup:

<ul>
    <li>
        <strong>Online:</strong>
        2/14/2010 3:40 AM
    </li>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>

and I want to extract 3,840 from the last li, identified by its "Downloads:" label.

What do you suggest?

My attempt:

preg_match('/<li><strong>Downloads:<\/strong>(.*?)<\/li>/s', $s, $a);

Answer 1:

I suggest using an HTML parser here, in particular DOMDocument with XPath.

Example:

$markup = '<ul>
    <li>
        <strong>Online:</strong>
        2/14/2010 3:40 AM
    </li>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);
// Take the text that follows the <strong> element whose text is exactly "Downloads:"
$download = trim($xpath->evaluate("string(//strong[text()='Downloads:']/following-sibling::text())"));
echo $download; // 3,840
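
The same lookup can be wrapped in a small helper so that the label becomes a parameter. This is a minimal sketch, not a definitive implementation: the function name `valueAfterLabel` is hypothetical (it is not part of the DOM extension), and it assumes the label contains no double quote character. It uses `normalize-space(.)` rather than `text()=` so that stray whitespace inside the `<strong>` tag does not break the match, and it silences the warnings libxml may raise on sloppy markup.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the DOM extension).
// Returns the text that follows <strong>$label</strong> within the same parent,
// or null when no such label (or no following text) is found.
function valueAfterLabel(string $html, string $label): ?string
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumption: $label contains no double quote, so it can be embedded directly.
    $query = sprintf(
        '//strong[normalize-space(.)="%s"]/following-sibling::text()[1]',
        $label
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

$markup = '<ul>
    <li><strong>Online:</strong> 2/14/2010 3:40 AM</li>
    <li><strong>Downloads:</strong> 3,840</li>
</ul>';

echo valueAfterLabel($markup, 'Downloads:'); // 3,840
```

Returning null for a missing label lets the caller tell "not found" apart from an empty value.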

Answer 2:

Use an HTML parser for parsing HTML files. If you insist on a regex, you could try the following:

<li>[^<>]*<strong>Downloads:<\/strong>\s*\K.*?(?=\s*<\/li>)

Code:

$string = <<<EOT
<ul>
    <li>
        <strong>Online:</strong>
        2/14/2010 3:40 AM
    </li>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>
EOT;
$regex = '~<li>[^<>]*<strong>Downloads:<\/strong>\s*\K.*?(?=\s*<\/li>)~s';
if (preg_match($regex, $string, $m)) {
    // \K resets the start of the match, so $m[0] holds only the value
    $yourmatch = $m[0];
    echo $yourmatch; // 3,840
}
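
For comparison, the pattern from the question fails on this markup because it requires `<li>` and `<strong>` to be directly adjacent, while the markup has a newline and indentation between them. Below is a sketch of a capture-group variant that tolerates that whitespace and takes the label as a parameter. The function name `valueAfterLabelRegex` is hypothetical, and the approach assumes simple, predictable markup such as the sample above.

```php
<?php
// Hypothetical helper: regex-only extraction for simple, predictable markup.
// Returns the trimmed text between </strong> and </li>, or null on no match.
function valueAfterLabelRegex(string $html, string $label): ?string
{
    // preg_quote() escapes regex metacharacters in the label; '~' is the delimiter.
    $pattern = '~<li>\s*<strong>' . preg_quote($label, '~') . '</strong>\s*(.*?)\s*</li>~s';
    if (preg_match($pattern, $html, $m)) {
        return $m[1];
    }
    return null;
}

$html = "<ul>\n    <li>\n        <strong>Downloads:</strong>\n        3,840\n    </li>\n</ul>";

// The pattern from the question: no allowance for whitespace after <li>.
var_dump(preg_match('/<li><strong>Downloads:<\/strong>(.*?)<\/li>/s', $html)); // int(0)

echo valueAfterLabelRegex($html, 'Downloads:'); // 3,840
```

A capture group is used here instead of `\K` and a lookahead only because it is easier to read; both forms return the same value for this markup.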

Source: https://stackoverflow.com/questions/26449506/php-extracting-string-between-two-tags-by-childs-content

Tags: php, html, regex, html-parsing, domdocument