Convert a (nested) HTML unordered list of links to a PHP array of links

Posted by 雨燕双飞 on 2019-12-25 03:44:31

Question


I have a regular, nested HTML unordered list of links, and I'd like to scrape it with PHP and convert it to an array.

The original list looks something like this:

<ul>
<li><a href="http://someurl.com">First item</a>
    <ul>
    <li><a href="http://someotherurl.com/">Child of First Item</a></li>
    <li><a href="http://someotherurl.com/">Second Child of First Item</a></li>
    </ul>
</li>
<li><a href="http://bogusurl.com">Second item</a></li>
<li><a href="http://bogusurl.com">Third item</a></li>
<li><a href="http://bogusurl.com">Fourth item</a></li>
</ul>

Any of the items can have children.

(The actual screen scraping is not a problem; I can do that.)

I'd like to turn this into a PHP array, of just the links, while keeping the hierarchical nature of the list. Any ideas?

I've looked at using htmlsimpledom and phpQuery, both of which use jQuery-like syntax, but I can't seem to get the syntax right. I can get all the links, but I end up losing the hierarchy and the order.

Thanks.


Answer 1:


Use DOMDocument and SimpleXMLElement along the lines of:

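$doc = new DOMDocument();
$doc->loadHTML($html);
$xmlStr = $doc->saveXML($doc->documentElement);

$xml = new SimpleXMLElement($xmlStr);

$links = array();

foreach ($xml->xpath('//a') as $a) {
    // Cast to string; otherwise a SimpleXMLElement object is stored.
    $links[] = (string) $a->attributes()->href;
}

The href attribute comes back as a SimpleXMLElement, so cast it with (string) to store a plain string; output buffering with ob_start and ob_clean is not needed for this. Note that //a returns every link in document order as a flat list, so the nesting of the original list is not preserved.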
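
To keep the hierarchy the question asks for, one option is to walk the DOM recursively instead of querying for every a element at once. The following is only a minimal sketch, not part of the original answer: the function name listToArray and the 'href' / 'children' array keys are assumptions made for illustration, and it assumes each li holds at most one direct a child and at most one nested ul.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// converts a <ul> element into a nested array, preserving document order.
function listToArray(DOMElement $ul): array
{
    $items = [];
    foreach ($ul->childNodes as $li) {
        // Skip whitespace text nodes and anything that is not an <li>.
        if (!($li instanceof DOMElement) || $li->nodeName !== 'li') {
            continue;
        }
        $item = ['href' => null, 'children' => []];
        foreach ($li->childNodes as $child) {
            if (!($child instanceof DOMElement)) {
                continue;
            }
            if ($child->nodeName === 'a' && $item['href'] === null) {
                // First direct <a> child of this <li>.
                $item['href'] = $child->getAttribute('href');
            } elseif ($child->nodeName === 'ul') {
                // Recurse into the nested list.
                $item['children'] = listToArray($child);
            }
        }
        $items[] = $item;
    }
    return $items;
}

// Shortened version of the list from the question.
$html = <<<HTML
<ul>
<li><a href="http://someurl.com">First item</a>
    <ul>
    <li><a href="http://someotherurl.com/">Child of First Item</a></li>
    <li><a href="http://someotherurl.com/">Second Child of First Item</a></li>
    </ul>
</li>
<li><a href="http://bogusurl.com">Second item</a></li>
</ul>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

// The first <ul> in document order is the outermost list.
$ul = $doc->getElementsByTagName('ul')->item(0);
$links = $ul !== null ? listToArray($ul) : [];
print_r($links);
```

Each entry holds the link's href plus a children array with the same shape, so order and nesting both survive. If only the URLs are needed at the leaves, the structure could be flattened further, but that choice depends on how the array will be consumed.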

Cheat sheet for xpath queries (pdf)



Source: https://stackoverflow.com/questions/2617487/convert-a-nestedhtml-unordered-list-of-links-to-php-array-of-links
