Parsing an HTML page that has two different formats for the same elements

折月煮酒, submitted 2019-12-13 04:22:30

Question


In the same HTML page, the same content appears in two different formats:

The first is:

<div class="gs"><h3 class="gsr"><a href="http://www.example1.com/">title1</a>

The second is:

<div class="gs"><h3 class="gsr"><span class="gsc"></span><a href="http://www.example2.com/">title2</a>

How can I get the links and titles with a single piece of code that handles both formats, using simple_html_dom? I've tried this code, but it doesn't work:

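foreach($html->find('h3[class=gsr]') as $docLink){
   // In the second format, first_child() returns the <span>, not the <a>,
   // so plaintext is empty and href is not set
   $link = $docLink->first_child();
   echo $link->plaintext;
   echo $link->href;
}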

Answer 1:


From the documentation, there seems to be a concept of descendant selectors:

// Find all <td> in <table> which class=hello 
$es = $html->find('table.hello td');

So the following

foreach($html->find('h3[class=gsr] a') as $link) {
   echo $link->plaintext;
   echo $link->href;
}

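should do the job: the descendant selector matches the <a> inside each <h3 class="gsr">, whether or not a <span> comes before it. (I'm not really familiar with simple_html_dom, by the way; this is just an attempt.)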
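For comparison, here is a minimal sketch of the same descendant-selector idea using PHP's built-in DOM extension (DOMDocument plus DOMXPath) instead of simple_html_dom, in case that library is not available. The XPath expression //h3[@class="gsr"]//a plays the role of 'h3[class=gsr] a'. The function name extractLinksXPath is hypothetical, and the sample HTML closes the <h3> and <div> tags that the question's snippets leave open.

```php
<?php
// Sketch using PHP's built-in DOM extension (not simple_html_dom).
// //h3[@class="gsr"]//a matches the <a> anywhere inside the <h3>,
// so it works whether or not a <span> comes first.

function extractLinksXPath(string $html): array
{
    $doc = new DOMDocument();
    // Keep libxml parser warnings out of the output.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];
    // Rough equivalent of simple_html_dom's 'h3[class=gsr] a'
    foreach ($xpath->query('//h3[@class="gsr"]//a') as $a) {
        $results[] = [
            'title' => trim($a->textContent),
            'href'  => $a->getAttribute('href'),
        ];
    }
    return $results;
}

// Both formats from the question, with closing tags added (assumption).
$html = '<div class="gs"><h3 class="gsr"><a href="http://www.example1.com/">title1</a></h3></div>'
      . '<div class="gs"><h3 class="gsr"><span class="gsc"></span><a href="http://www.example2.com/">title2</a></h3></div>';

foreach (extractLinksXPath($html) as $item) {
    echo $item['title'], ' ', $item['href'], "\n";
}
```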

EDIT

There are also nested selectors:

// Find first <li> in first <ul> 
$e = $html->find('ul', 0)->find('li', 0);

So

foreach($html->find('h3[class=gsr]') as $docTitle) {
   $link = $docTitle->find('a', 0); // get the first anchor tag (null if there is none)
   if ($link === null) {
      continue; // skip headings without a link
   }
   echo $link->plaintext;
   echo $link->href;
}

should also work.
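The nested approach can be sketched the same way with DOMXPath (again PHP's built-in DOM extension, not simple_html_dom): select each heading first, then run a relative query inside it and take the first match. The name extractFirstLinkPerHeading is hypothetical, and the third heading in the sample deliberately has no link to exercise the null check.

```php
<?php
// Sketch of the "nested" approach with PHP's built-in DOM extension:
// select each <h3 class="gsr">, then look for the first <a> inside it.

function extractFirstLinkPerHeading(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query('//h3[@class="gsr"]') as $h3) {
        // Relative query: './/a' searches only inside this <h3>.
        $link = $xpath->query('.//a', $h3)->item(0);
        if ($link === null) {
            continue; // heading without an anchor: skip it
        }
        $results[] = [trim($link->textContent), $link->getAttribute('href')];
    }
    return $results;
}

// Sample input (hypothetical): both formats plus a heading with no link.
$html = '<div class="gs"><h3 class="gsr"><a href="http://www.example1.com/">title1</a></h3></div>'
      . '<div class="gs"><h3 class="gsr"><span class="gsc"></span><a href="http://www.example2.com/">title2</a></h3></div>'
      . '<div class="gs"><h3 class="gsr">no link</h3></div>';

foreach (extractFirstLinkPerHeading($html) as [$title, $href]) {
    echo $title, ' ', $href, "\n";
}
```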




Answer 2:


Use getElementsByTagName($tag).

It locates all elements with the specified tag name inside the DOM.

Refer to this link: getElementsByTagName
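The answer does not say which getElementsByTagName it means; assuming PHP's DOMDocument version, a minimal sketch could look like this. getElementsByTagName() has no class filter, so the class attribute is compared by hand. The name extractWithTagNames and the sample HTML (including the extra non-matching heading) are hypothetical.

```php
<?php
// Sketch of the getElementsByTagName() approach with PHP's DOMDocument.
// getElementsByTagName() cannot filter by class, so the class is checked manually.

function extractWithTagNames(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $results = [];
    foreach ($doc->getElementsByTagName('h3') as $h3) {
        // Exact match, like simple_html_dom's [class=gsr]
        if ($h3->getAttribute('class') !== 'gsr') {
            continue;
        }
        // DOMElement::getElementsByTagName() searches all descendants,
        // so a leading <span> does not matter.
        $a = $h3->getElementsByTagName('a')->item(0);
        if ($a !== null) {
            $results[$a->getAttribute('href')] = trim($a->textContent);
        }
    }
    return $results; // href => title
}

$html = '<div class="gs"><h3 class="gsr"><a href="http://www.example1.com/">title1</a></h3></div>'
      . '<div class="gs"><h3 class="gsr"><span class="gsc"></span><a href="http://www.example2.com/">title2</a></h3></div>'
      . '<h3 class="other"><a href="http://www.example3.com/">ignored</a></h3>';

foreach (extractWithTagNames($html) as $href => $title) {
    echo $title, ' ', $href, "\n";
}
```

Because the class comparison is exact, an element with class="gsr other" would be skipped; loosen the check if the real page puts several classes on the heading.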



Source: https://stackoverflow.com/questions/11539505/parsing-html-page-that-has-two-different-format-on-the-same-elements
