Parse Website for URLs

执念已碎 2020-12-07 02:49

Can someone help me with the following? I want to extract the URLs from this website: http://www.directorycritic.com/free-directory-list.html?pg=1&so

3 Answers
  •  刺人心 (original poster)
    2020-12-07 03:20

    Use PHP Simple HTML DOM Parser, the library that provides file_get_html():

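    // Assumes simple_html_dom.php (PHP Simple HTML DOM Parser) is on the include path
    require_once 'simple_html_dom.php';

    $html = file_get_html('http://www.example.com/');

    // Collect the href attribute of every <a> element
    $links = array();
    foreach ($html->find('a') as $element) {
        $links[] = $element->href;
    }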
    

    The $links array now holds the href value of every link on the page, and you can fetch each of those URLs in turn to parse it further.
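
    If installing a third-party library is not an option, the same idea can be sketched with PHP's bundled DOM extension. This is a swapped-in technique, not the Simple HTML DOM approach shown above. The helper name extract_links() and the inline sample markup are assumptions made up for this example; in practice the HTML string would come from the page you downloaded.

```php
<?php
// Minimal sketch: collect link targets with PHP's built-in DOMDocument.
// extract_links() is a hypothetical helper name, not part of any library.

function extract_links(string $html): array
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML, so buffer libxml's
    // warnings instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // getAttribute() returns an empty string when href is missing
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }

    // Drop repeated URLs and re-index the array from 0
    return array_values(array_unique($links));
}

// Inline sample markup so the sketch runs without network access.
$sample = '<html><body>'
    . '<a href="http://www.example.com/a">A</a>'
    . '<a href="/relative/b">B</a>'
    . '<a href="http://www.example.com/a">A again</a>'
    . '<a name="no-href">anchor only</a>'
    . '</body></html>';

print_r(extract_links($sample));
```

    Relative values such as /relative/b are returned exactly as written in the markup, so they need to be resolved against the page's own URL before they can be fetched.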

    Parsing HTML with regular expressions is not a good idea. Here are some related posts:

    • Using regular expressions to parse HTML: why not?
    • RegEx match open tags except XHTML self-contained tags

    EDIT:

    Some other HTML parsing tools, as suggested by Gordon in the comments below:

    • phpQuery
    • Zend_Dom
    • QueryPath
    • FluentDOM
