How to get all URLs from a page (PHP)

逝去的感伤 2020-12-29 17:30

I have a page of URLs with descriptions, listed one under another (something like bookmarks or a list of sites). How do I use PHP to get all URLs from that page and write them t

3 answers
  • 2020-12-29 17:51

    Another way, using DOMDocument and XPath:

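    $url = "http://wwww.somewhere.com";

    // Fetch the page; file_get_contents() returns false on failure.
    $html = file_get_contents($url);
    if ($html === false) {
        die("Could not fetch $url\n");
    }

    // Real-world HTML is rarely well formed, so collect parser warnings
    // internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Select only anchors that actually carry an href attribute.
    $nodes = $xpath->query('//a[@href]');

    foreach ($nodes as $node) {
        var_dump($node->getAttribute('href'));
    }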
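
As a minimal sketch of the DOM approach above, the same steps can be wrapped in a function that works on an HTML string, so it can be tried without a network request. The name `extract_links` and the sample markup are assumptions for illustration, not part of the answer. Duplicate links are dropped and anchors without an `href` are skipped.

```php
<?php
// Hypothetical helper (not from the answer): unique href values from an HTML string.
function extract_links(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    // Collect parser warnings internally; restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $node) {
        $href = trim($node->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }

    // Keep the first occurrence of each URL and reindex from 0.
    return array_values(array_unique($links));
}

// Sample markup (an assumption): one duplicate link and one anchor without href.
$sample = '<p><a href="http://example.com/one">One</a> <a name="x">anchor</a> '
        . '<a href="http://example.com/two">Two</a> '
        . '<a href="http://example.com/one">One again</a></p>';

foreach (extract_links($sample) as $link) {
    echo $link, "\n";
}
```

Returning an array rather than printing inside the loop leaves the caller free to decide what to do with the list.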
    
  • 2020-12-29 18:02

    One way, using strip_tags() and regular expressions:

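    $url = "http://wwww.somewhere.com";
    $data = file_get_contents($url);
    // Drop every tag except <a>, then split the text at each closing </a>.
    $data = strip_tags($data, "<a>");
    $d = preg_split("/<\/a>/", $data);
    foreach ($d as $u) {
        // Only handles links written exactly as <a href="..."> (double quotes, href first).
        if (strpos($u, "<a href=") !== false) {
            $u = preg_replace("/.*<a\s+href=\"/sm", "", $u); // remove everything up to the URL
            $u = preg_replace("/\".*/s", "", $u);            // remove everything after the URL
            print $u."\n";
        }
    }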
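
The splitting approach above only picks up double-quoted `href` values that directly follow `<a`. A hedged alternative is a single `preg_match_all()` call that accepts either quote style and other attributes before `href`. The function name `extract_hrefs_regex` and the sample string are hypothetical, and a regex remains only an approximation of HTML parsing.

```php
<?php
// Hypothetical helper: href values from <a> tags, single- or double-quoted.
function extract_hrefs_regex(string $html): array
{
    // (["\']) captures the opening quote; \1 requires the same quote to close the value.
    // The "i" flag ignores case, "s" lets "." span newlines.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);

    return $matches[2]; // group 2 holds the URL itself
}

// Sample input (an assumption): mixed quote styles, mixed case, an extra attribute.
$sample = "<a href=\"a.html\">A</a> <A class='x' HREF='b.html'>B</A>";

foreach (extract_hrefs_regex($sample) as $href) {
    echo $href, "\n";
}
```

Unquoted attribute values are still missed by this pattern, which is one reason the DOM-based answer is usually the more robust choice.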
    
  • 2020-12-29 18:07

    You can use this to get all the links in a given web page.

    <?php
    
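        $url = "http://wwww.somewhere.com"; // page to scan

        $var = fread_url($url);

        // preg_match_all() already takes $matches by reference,
        // so no "&" is needed (or allowed) at the call site.
        preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+".
                        "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
                        $var, $matches);

        // Group 1 holds the href values.
        $links = $matches[1];

        foreach ($links as $link)
        {
            print(htmlspecialchars($link)."<br>");
        }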
    
        function fread_url($url, $ref = "")
        {
            $html = ""; // returned as-is if nothing could be read

            if (function_exists("curl_init")) {
                $user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; ".
                              "Windows NT 5.0)";
                $ch = curl_init();
                curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
                curl_setopt($ch, CURLOPT_HTTPGET, 1);
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
                curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
                curl_setopt($ch, CURLOPT_URL, $url);
                curl_setopt($ch, CURLOPT_REFERER, $ref);
                curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
                $html = (string) curl_exec($ch); // false on failure becomes ""
                curl_close($ch);
            }
            else {
                // Fallback without cURL: read the URL line by line.
                $hfile = fopen($url, "r");
                if ($hfile) {
                    while (!feof($hfile)) {
                        $html .= fgets($hfile, 1024);
                    }
                    fclose($hfile);
                }
            }
            return $html;
        }
    
        ?>
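
Since the page in the question lists URLs together with descriptions, it may help to see that the pattern above also captures the link text in its second group. The sketch below applies the same pattern to a hard-coded sample instead of a fetched page. `links_with_text` and the sample HTML are assumptions made for illustration.

```php
<?php
// Hypothetical helper: pair each captured URL with its link text.
function links_with_text(string $html): array
{
    // Same pattern as in the answer: group 1 is the href value, group 2 the link text.
    $pattern = '/a[\s]+[^>]*?href[\s]?=[\s"\']+(.*?)["\']+.*?>([^<]+|.*?)?<\/a>/';

    // PREG_SET_ORDER yields one array per match: [full match, group 1, group 2].
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $pairs = [];
    foreach ($matches as $m) {
        $pairs[] = ['url' => $m[1], 'text' => trim($m[2] ?? '')];
    }
    return $pairs;
}

// Sample input (an assumption): two list items, one per quote style.
$sample = '<li><a href="http://example.com/one">First site</a></li>' . "\n"
        . "<li><a href='http://example.com/two'>Second site</a></li>";

foreach (links_with_text($sample) as $pair) {
    echo $pair['url'], "\t", $pair['text'], "\n";
}
```

Keeping the URL and its text together makes it straightforward to emit both on one line, as the final loop does.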
    