How to get span tag content using preg_match function? [closed]

谁说胖子不能爱 submitted on 2019-12-02 06:44:42
Amal Murali

The Better Solution

A regex is the wrong tool here. HTML is not a regular language, and cannot be parsed reliably using regular expressions. Use a DOM parser instead. Not only is it much easier, it's more accurate and reliable, and it is far less likely to break when the format of the markup changes in the future.

This is how you would get the contents inside a <span> tag using PHP's built-in DOMDocument class:

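$dom = new DOMDocument;
$dom->loadHTML($html); // $html holds the markup to parse
// Text content of the first <span>; item(0) is null if the markup has no <span>
$result = $dom->getElementsByTagName('span')->item(0)->nodeValue;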

If there are multiple tags, and you want to get the node values from all of them, you could simply use a foreach loop, like so:

$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('span') as $tag) {
    echo $tag->nodeValue . '<br/>';
}
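
For a self-contained run, here is a minimal sketch of the two snippets above combined. The sample markup in $html is hypothetical (the question's real HTML is not shown here), and the libxml_use_internal_errors() call is an optional addition that keeps parse warnings for sloppy markup from being printed.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<div><span class="count">414,817 views</span><span>12 likes</span></div>';

// Collect libxml parse warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Discard any collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Gather the text content of every <span>.
$values = [];
foreach ($dom->getElementsByTagName('span') as $tag) {
    $values[] = $tag->nodeValue;
}

// item(0) returns null when there is no <span>, so guard before reading nodeValue.
$first  = $dom->getElementsByTagName('span')->item(0);
$result = $first !== null ? $first->nodeValue : null;

print_r($values);
```

The null check matters mostly for markup you don't control; if a <span> is guaranteed to exist, the shorter one-liner is fine.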

And finally, to extract just the number from the node value, you have several options:

// Split on space, and get first part
echo explode(' ', $result, 2)[0]; 

// Replace everything that is not a digit or comma
echo preg_replace('/[^\d,]/', '', $result); 

// Get everything before the first space (third argument is the boolean $before_needle)
echo strstr($result, ' ', true);

// Remove everything after the first space
echo strtok($result, ' ');

For a node value that starts with 414,817 followed by a space, all these statements will output 414,817. There's a whole host of string functions available for you to use, and you can choose whichever solution suits your requirements.
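
To compare the options side by side, here is a small sketch. The value of $result is a hypothetical stand-in for the node value, assumed to start with the number followed by a space. It also shows how the options differ when there is no space at all, and one possible way to turn the string into an integer.

```php
<?php
// Hypothetical node value: the number, a space, then some label.
$result = '414,817 views';

$byExplode = explode(' ', $result, 2)[0];           // split on the first space
$byReplace = preg_replace('/[^\d,]/', '', $result); // keep only digits and commas
$byStrstr  = strstr($result, ' ', true);            // part before the first space
$byStrtok  = strtok($result, ' ');                  // first space-delimited token

// Edge case: without a space, strstr() finds no needle and returns false,
// while explode() and strtok() return the whole string.
$noSpace        = '414,817';
$strstrNoSpace  = strstr($noSpace, ' ', true);
$strtokNoSpace  = strtok($noSpace, ' ');
$explodeNoSpace = explode(' ', $noSpace, 2)[0];

// If an integer is needed, strip the thousands separators first.
$asInt = (int) str_replace(',', '', $byExplode);

var_dump($byExplode, $byReplace, $byStrstr, $byStrtok, $strstrNoSpace, $asInt);
```

Note that the preg_replace() variant keeps every digit and comma in the string, so it only gives the same answer as the others when the rest of the text contains neither.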

The Regex-Based Solution

If you absolutely must use preg_match(), then you can use the following:

// Match against the raw markup ($html), not the already-extracted node value
if (preg_match('#<span[^<>]*>([\d,]+).*?</span>#', $html, $matches)) {
    echo $matches[1];
}

[^<>]* means "match any number of characters except angle brackets", ensuring that we don't accidentally break out of the tag we're in.

.*? (note the ?) means "match any number of characters, but only as few as possible". This avoids matching from the first <span> all the way to the last </span> in the markup (if there are multiple <span>s). Note that . does not match newlines unless the s modifier is set, so the pattern as written expects the whole <span> to sit on one line.
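
The difference the ? makes is easiest to see on markup with two spans. The following is a minimal sketch using a hypothetical one-line $html string; the preg_match_all() call at the end is an addition for the case where you want the number from every <span>, not just the first.

```php
<?php
// Hypothetical markup containing two <span> elements on one line.
$html = '<span>414,817 views</span> and <span>12 likes</span>';

// Greedy .* runs on to the LAST </span>, swallowing both elements.
preg_match('#<span[^<>]*>([\d,]+).*</span>#', $html, $greedy);

// Lazy .*? stops at the FIRST </span> after the number.
preg_match('#<span[^<>]*>([\d,]+).*?</span>#', $html, $lazy);

echo $greedy[0], "\n"; // the entire string
echo $lazy[0], "\n";   // <span>414,817 views</span>

// Collect the captured number from every <span> that starts with one.
preg_match_all('#<span[^<>]*>([\d,]+).*?</span>#', $html, $all);
print_r($all[1]);
```

In both single-match cases the captured group $matches[1] is 414,817; what changes is how much markup the overall match consumes, which matters as soon as you match repeatedly.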

I make absolutely no guarantees that the regex will always work, but it should be enough for those who want to finish up a one-off job. In such cases, it's probably better to go with a regex that works on sane things than weep about things not being universally perfect :)
