Regex to print URLs from any webpage with a specific word in the URL

淺唱寂寞╮ submitted on 2019-12-11 05:05:47

Question


I am using the code below to extract URLs from a webpage, and it works fine, but I want to filter the results. It displays every URL on the page, but I want only those URLs that contain the word "super".

$regex = '|<a.*?href="(.*?)"|';
preg_match_all($regex, $result, $parts);
$links = $parts[1];
foreach ($links as $link) {
    echo $link . "<br>";
}

So it should echo only URLs in which the word "super" is present. For example, it should ignore the URL

       http://xyz.com/abc.html  

but it should echo

        http://abc.superpower.com/hddll.html

because the URL contains the required word "super".


Answer 1:


Put the word inside the capture group and keep the match within the quoted attribute value, and it should work:

$regex = '|<a[^>]*?href="([^"]*super[^"]*)"|i';
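To see the filtering in action, here is a minimal sketch run against a hard-coded HTML string instead of a fetched page. The sample markup and the `$html` and `$word` variables are assumptions made for illustration; they are not part of the question. The pattern limits the tag part to `[^>]*` and the captured part to `[^"]*`, so a match cannot run past the closing quote of one href into a later link.

```php
<?php
// Sample markup standing in for $result (hypothetical data, not from the question).
$html = '<a href="http://xyz.com/abc.html">one</a> '
      . '<a class="x" href="http://abc.superpower.com/hddll.html">two</a>';

$word = 'super'; // word that must appear in the URL

// [^>]* stays inside the <a ...> tag; [^"]* stays inside the quoted href value.
// preg_quote() escapes the word in case it contains regex metacharacters.
$regex = '|<a\b[^>]*\bhref="([^"]*' . preg_quote($word, '|') . '[^"]*)"|i';

preg_match_all($regex, $html, $parts);
$links = $parts[1];

foreach ($links as $link) {
    echo $link . "<br>";
}
```

With this sample input only the superpower.com link is echoed. The sketch assumes double-quoted href values; single-quoted or unquoted attributes would need a different pattern.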

However, to parse and scrape HTML it is better to use PHP's DOM parser.

Update: here is code using the DOM parser:

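$request_url = '1900girls.blogspot.in/';

// Fetch the page; CURLOPT_RETURNTRANSFER makes curl_exec() return the body as a string.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $request_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
if ($result === false) {
    die("Request failed: " . curl_error($ch) . "\n");
}

libxml_use_internal_errors(true); // suppress warnings from malformed HTML
$doc = new DOMDocument();
$doc->loadHTML($result); // loads your html
$xpath = new DOMXPath($doc);
$needle = 'blog'; // word that must appear in the href

// Select only the <a> elements whose href contains $needle.
$nodelist = $xpath->query("//a[contains(@href, '" . $needle . "')]");
foreach ($nodelist as $node) {
    echo $node->getAttribute('href') . "\n";
}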
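
XPath 1.0's contains() is case-sensitive, so a needle of "super" would miss a link to "SuperPower.com". If that matters, one option is to collect every `<a>` element and filter in PHP with stripos(). The following is a rough sketch under that assumption, using a hard-coded HTML string (hypothetical sample data) in place of the cURL response so it runs offline:

```php
<?php
// Hypothetical markup standing in for the cURL response in $result.
$html = '<html><body>'
      . '<a href="http://xyz.com/abc.html">one</a>'
      . '<a href="http://abc.SuperPower.com/hddll.html">two</a>'
      . '<a name="top">no href</a>'
      . '</body></html>';

$needle = 'super'; // word that must appear in the href, in any letter case

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$matches = [];
// getElementsByTagName() returns every <a>; the filtering happens in PHP.
foreach ($doc->getElementsByTagName('a') as $node) {
    $href = $node->getAttribute('href'); // empty string when the attribute is absent
    if ($href !== '' && stripos($href, $needle) !== false) {
        $matches[] = $href;
    }
}

echo implode("\n", $matches) . "\n";
```

Here only the SuperPower.com link is kept; the anchor without an href and the xyz.com link are skipped.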


Source: https://stackoverflow.com/questions/19614589/regex-to-print-url-from-any-webpage-with-specific-word-in-url
