Image tag scraper regular expression

Submitted by ぃ、小莉子 on 2019-12-25 03:21:36

Question


I'm really, REALLY bad at regular expressions. It just hasn't clicked yet. I'm trying to make a small application that extracts the src, width, and height attributes from every image tag. This is what I have so far:

<?php

function print_links ($url) 
{
    $fp = fopen($url, "r") or die("Could not contact $url");
    $page_contents = "";
    while ($new_text = fread($fp, 100)) {
        $page_contents .= $new_text;
    }


    $match_result = 
    preg_match_all( '/<img.*src=[\"\'](.*)[\"\'].*width=(\d+).*height=(\d+).*/>/i',
                $page_contents,
                $match_array, 
                PREG_SET_ORDER);

  echo "number matched is: $match_result<br><br> ";

  print_r($match_array);

  foreach ($match_array as $entry) {
   $tag = $entry[0];
   $src = $entry[1];
   $width = $entry[2];
   $height = $entry[3];
   print  (" <b>src</b>: $src; 
        <b>width</b>:  $width<br />
        <b>height</b>:  $height<br />
        <b>tag</b>:  $tag<br />"
        );
    }

}

print_links ("http://www.drudgereport.com/");

?>

but I get this little error:

Warning: preg_match_all(): Unknown modifier '>' in C:\Apache2.2\htdocs\it302\regex\regex.php on line 17 number matched is:

I'm not sure where I went wrong in my regexp. I've tried multiple things but have ended up just as confused.

Any suggestions?


Answer 1:


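The final .*/> in your regex is wrong: / is your delimiter, so the unescaped / ends the pattern and PHP treats the > after it as a modifier.

Either drop the / there: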

/<img.*src=[\"\'](.*)[\"\'].*width=(\d+).*height=(\d+).*>/i

Or escape it and make it optional with \/?:

/<img.*src=[\"\'](.*)[\"\'].*width=(\d+).*height=(\d+).*\/?>/i

But this regex only works if src, width, and height appear in exactly this order within the img tag. Real pages also quote the width and height values and sometimes add units (e.g. width="0.9em"), which (\d+) will not match.
These are all reasons (and there are many more) why you should not use regex to parse HTML.
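
To see both points concretely, here is a small sketch with made-up tags. It sidesteps the delimiter clash by using # instead of / as the delimiter (an alternative to escaping, not something the question used), and it shows that the pattern stops matching once the attributes are reordered. The sample strings and variable names are assumptions for illustration only.

```php
<?php
// Hypothetical sample tags, not taken from any real page.
$ordered   = '<img src="a.png" width=10 height=20 />';
$reordered = '<img width=10 height=20 src="a.png" />';

// With '#' as the delimiter, '/' inside the pattern needs no escaping.
$pattern = '#<img.*src=["\'](.*)["\'].*width=(\d+).*height=(\d+).*/?>#i';

// preg_match() returns 1 on a match and 0 when nothing matches.
$matched_ordered   = preg_match($pattern, $ordered, $m);
$matched_reordered = preg_match($pattern, $reordered);

var_dump($matched_ordered);   // src, width, height in the expected order
var_dump($matched_reordered); // same attributes, different order
```

The second tag carries exactly the same information as the first, yet the pattern rejects it, which is the fragility described above.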




Answer 2:


Do not use regex for this. Especially if you are REALLY bad :)

http://simplehtmldom.sourceforge.net/

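// Assumes simple_html_dom.php from the library above sits next to this script.
include 'simple_html_dom.php';

// Fetch and parse the page from the question.
$html = file_get_html('http://www.drudgereport.com/');

foreach ($html->find('img') as $element) {
    $src = $element->src;
    $width = $element->width;
    $height = $element->height;
    // Escape the tag's markup so it is displayed rather than rendered.
    $tag = htmlspecialchars($element->outertext);
    print (" <b>src</b>: $src; 
        <b>width</b>:  $width<br />
        <b>height</b>:  $height<br />
        <b>tag</b>:  $tag<br />"
        );
}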
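
If pulling in a third-party library is not an option, PHP's bundled DOM extension can do a similar job. The following is a minimal sketch, assuming the DOM extension is enabled; the sample markup and the $images array are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical sample markup; attribute order and quoting vary on purpose.
$html = '<p><img height="20" src="a.png" width="0.9em">'
      . '<img src=\'b.png\'></p>';

// Collect parser warnings internally instead of printing them,
// since real-world HTML is rarely perfectly formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$images = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // getAttribute() returns an empty string when the attribute is absent.
    $images[] = [
        'src'    => $img->getAttribute('src'),
        'width'  => $img->getAttribute('width'),
        'height' => $img->getAttribute('height'),
    ];
}

print_r($images);
```

Because the parser reads attributes by name, their order, quoting style, and units no longer matter, and a missing width or height simply comes back as an empty string.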


Source: https://stackoverflow.com/questions/5561166/image-tag-scraper-regular-expression
