问题
I'm really REALLY bad at regular expressions. It just hasn't clicked yet. I'm trying to make small application that extracts all image tags of their src, width, and height attributes. This is what I have so far:
<?php
function print_links ($url)
{
$fp = fopen($url, "r") or die("Could not contact $url");
$page_contents = "";
while ($new_text = fread($fp, 100)) {
$page_contents .= $new_text;
}
$match_result =
preg_match_all( '/<img.*src=[\"\'](.*)[\"\'].*width=(\d+).*height=(\d+).*/>/i',
$page_contents,
$match_array,
PREG_SET_ORDER);
echo "number matched is: $match_result<br><br> ";
print_r($match_array);
foreach ($match_array as $entry) {
$tag = $entry[0];
$src = $entry[1];
$width = $entry[2];
$height = $entry[3];
print (" <b>src</b>: $src;
<b>width</b>: $width<br />
<b>height</b>: $height<br />
<b>tag</b>: $tag<br />"
);
}
}
print_links ("http://www.drudgereport.com/");
?>
but I get this little error:
Warning: preg_match_all(): Unknown modifier '>' in C:\Apache2.2\htdocs\it302\regex\regex.php on line 17 number matched is:
I'm not sure where I went wrong in my regexp. I've tried multiple things but have ended up just as confused.
Any suggestions?
回答1:
In your regex the last .*/>
is wrong.
no /
there...
/<img.*src=[\"\'](.*)[\"\'].*width=(\d+).*height=(\d+).*>/i
or \/?
escape and make it optional...
/<img.*src=[\"\'](.*)[\"\'].*width=(\d+).*height=(\d+).*\/?>/i
but this regex only works if src width height are in this given order within the img tag and width and height also allow quoted values and units. e.g. width="0.9em" is valid html...
this are all reasons why you should not use regex to parse html (and many more...)
回答2:
Do not use regex for this. Especially if you are REALLY bad :)
http://simplehtmldom.sourceforge.net/
foreach($html->find('img') as $element){
$src = $element->src;
$width = $element->width;
$height = $element->height;
print (" <b>src</b>: $src;
<b>width</b>: $width<br />
<b>height</b>: $height<br />
<b>tag</b>: $tag<br />"
);
}
来源:https://stackoverflow.com/questions/5561166/image-tag-scraper-regular-expression