Regex help with getting tag content in PHP

感情迁移 submitted on 2019-12-04 06:41:56

Question


I have the following code:

function getTagContent($string, $tagname) {

    $pattern = "/<$tagname.*?>(.*)<\/$tagname>/";
    preg_match($pattern, $string, $matches);


    print_r($matches);

}

and then I call it like this:

$url = "http://www.freakonomics.com/2008/09/24/wall-street-jokes-please/";
$html = file_get_contents($url);
getTagContent($html,"title");

but it reports no matches, even though the source of that page clearly contains a title tag.

What did I do wrong?


Answer 1:


Try DOM instead:

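$url = "http://www.freakonomics.com/2008/09/24/wall-street-jokes-please/";
$doc = new DOMDocument();
// loadHTMLFile() returns true/false, not a document; the parsed page lives in $doc.
// The @ suppresses the warnings that malformed real-world HTML would otherwise raise.
if (@$doc->loadHTMLFile($url)) {
    $items = $doc->getElementsByTagName('title');
    for ($i = 0; $i < $items->length; $i++) {
        echo $items->item($i)->nodeValue . "\n";
    }
}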
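
The same DOM approach also works on HTML that is already in a string, which makes it easy to try without a network request. The following is only a minimal sketch: the helper name getTagText and the sample markup are made up for illustration, and it assumes the DOM extension is enabled.

```php
<?php
// Hypothetical helper (not from the original answer): returns the trimmed
// text of every <$tagname> element found in an HTML string.
function getTagText(string $html, string $tagname): array
{
    // Real-world pages are rarely valid HTML; collect libxml warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName($tagname) as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

// Made-up sample input with the title spread over several lines.
$sample = "<html><head><title>\n  Wall Street Jokes, Please\n</title></head>"
        . "<body><p>Hi</p></body></html>";

print_r(getTagText($sample, 'title'));
```

Because the parser handles line breaks and attributes itself, the multi-line title needs no special treatment here.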



Answer 2:


The 'title' tag is not on the same line as its closing tag, so your preg_match doesn't find it.

In Perl, you can add the /s switch so that the dot also matches newlines and the pattern can span several lines. PHP's preg_match accepts the same s modifier (PCRE_DOTALL).

But this is just one of the reasons why parsing XML and variants with regexp is a bad idea.
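
The line-break problem described in this answer can be reproduced in a few lines. This is just an illustrative sketch with a made-up input string, not the actual page from the question:

```php
<?php
// Made-up input: the opening and closing tags are on different lines.
$html = "<title>\nWall Street Jokes\n</title>";

// Without the s modifier, "." stops at a newline, so nothing matches.
$withoutS = preg_match('/<title>(.*)<\/title>/', $html, $m1);

// With the s modifier (PCRE_DOTALL), "." also matches newlines.
$withS = preg_match('/<title>(.*)<\/title>/s', $html, $m2);

var_dump($withoutS);     // int(0)
var_dump($withS);        // int(1)
var_dump(trim($m2[1]));  // string(17) "Wall Street Jokes"
```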




Answer 3:


Probably because the title is spread over multiple lines. You need to add the s option so that the dot also matches newlines.

$pattern = "/<$tagname.*?>(.*)<\/$tagname>/s";



Answer 4:


Write your PHP function getTagContent like this:

function getTagContent($string, $tagname) {
    $pattern = '/<'.$tagname.'[^>]*>(.*?)<\/'.$tagname.'>/is';
    preg_match($pattern, $string, $matches);
    print_r($matches);
}

It is important to use the non-greedy .*? to match the text between the opening and closing tags, so that the match stops at the first closing tag rather than the last one. Equally important are the flags s (DOTALL, so the dot also matches newlines) and i (case-insensitive matching).
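
To see what the non-greedy quantifier and the i flag change in practice, here is a small sketch. The function name firstTagContent and the sample input are hypothetical, and the use of preg_quote() is an addition that the answer above does not include:

```php
<?php
// Hypothetical variant of getTagContent that returns the first match
// (or null) instead of printing it.
function firstTagContent(string $string, string $tagname): ?string
{
    // preg_quote() escapes any regex metacharacters in the tag name.
    $tag = preg_quote($tagname, '/');
    $pattern = '/<' . $tag . '[^>]*>(.*?)<\/' . $tag . '>/is';
    return preg_match($pattern, $string, $matches) === 1 ? $matches[1] : null;
}

// Made-up input: two paragraphs, the first with an upper-case closing tag.
$html = "<p class=\"a\">first</P>\n<p>second</p>";

// For comparison: a greedy (.*) runs on to the last closing tag.
preg_match('/<p[^>]*>(.*)<\/p>/is', $html, $greedy);

echo firstTagContent($html, 'p'), "\n"; // prints: first
echo $greedy[1], "\n";                  // prints everything up to the last </p>
```

The non-greedy version stops at the first closing tag, and the i flag is what lets it accept the upper-case </P>.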



Source: https://stackoverflow.com/questions/6331265/regex-help-with-getting-tag-content-in-php
