Parsing specific data items from website

Submitted by 房东的猫 on 2019-12-10 11:48:59

Question


I am trying to retrieve the following data items from this webpage:

  • Address
  • City
  • State
  • Zip Code
  • Store Phone
  • Pharmacy Phone
  • Open Hours
  • Pharmacy Hours
  • Pickup Options
  • At this store/location
  • Site to Store Hours

I tried the following, but I can't separate out some of the data to store in the variables above, so I need help and suggestions from a PHP expert:

include 'simple_html_dom.php'; // provides file_get_html() (simple_html_dom library)

$html = file_get_html('http://www.walmart.com/storeLocator/ca_storefinder_results.do?serviceName=&rx_title=com.wm.www.apps.storelocator.page.serviceLink.title.default&rx_dest=%2Findex.gsp&sfrecords=50&sfsearch_single_line_address=K6T');
// Print the inner HTML of each element matched by the selector
foreach($html->find('div[class=StoreAddress] div[1]') as $name)
{
    echo $name->innertext.'<br>';
}

The HTML of this website makes it hard to identify each data item by its tag, because the tags have no proper ids assigned. Can anyone suggest an easy and scalable way to parse the data items above from this website?

Thanks


Answer 1:


The HTML isn't really that complex. PHP's iterators and DOM/regex functions are clumsy for tasks like this, but it can be done:

// Collect libxml warnings instead of printing them (real-world HTML is rarely valid)
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.walmart.com/storeLocator/ca_storefinder_details_short.do?rx_dest=/index.gsp&rx_title=com.wm.www.apps.storelocator.page.serviceLink.title.default&edit_object_id=2092&sfsearch_single_line_address=K6T');
$xpath = new DOMXPath($dom);

foreach($xpath->query('//div[@class="StoreAddress"]') as $div) {
  // title
  echo $xpath->query(".//div[1]", $div)->item(0)->nodeValue . "\n";
  // street
  echo $xpath->query(".//div[2]", $div)->item(0)->nodeValue . "\n";
  // "City, ST 12345" line: split into city, state and zip
  $line = $xpath->query(".//div[3]", $div)->item(0)->nodeValue;
  if (preg_match('/(.*), ([A-Z]{2}) (\d{5})/', $line, $m)) {
    echo $m[1] . "\n"; // city
    echo $m[2] . "\n"; // state
    echo $m[3] . "\n"; // zip
  }
}
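Because the store-locator URLs above may no longer serve this markup, here is a minimal self-contained sketch of the same DOMXPath + regex technique run against an inline HTML string. The markup (a StoreAddress div whose first three child divs hold the name, street, and "City, ST 12345" line) is an assumption modeled on the XPath in the answer, and the function name parseStoreAddresses is hypothetical. Note that the (\d{5}) part of the pattern only matches five-digit US ZIP codes.

```php
<?php
// Hypothetical helper: extract name, street, city, state and ZIP from every
// <div class="StoreAddress"> block, using the XPath + regex approach above.
// The child-div layout (name, street, city line) is an assumption.
function parseStoreAddresses(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed HTML quietly
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $stores = [];
    foreach ($xpath->query('//div[@class="StoreAddress"]') as $div) {
        // Text of the direct child divs, in document order
        $lines = [];
        foreach ($xpath->query('./div', $div) as $child) {
            $lines[] = trim($child->textContent);
        }
        if (count($lines) < 3) {
            continue; // layout differs from what we assume; skip this block
        }
        $store = ['name' => $lines[0], 'street' => $lines[1]];
        // "City, ST 12345" -> city, two-letter state, five-digit ZIP
        if (preg_match('/^(.*), ([A-Z]{2}) (\d{5})$/', $lines[2], $m)) {
            $store += ['city' => $m[1], 'state' => $m[2], 'zip' => $m[3]];
        }
        $stores[] = $store;
    }
    return $stores;
}

// Usage with hypothetical sample markup
$sample = <<<HTML
<html><body>
<div class="StoreAddress">
  <div>Store #2092</div>
  <div>123 Main St</div>
  <div>Springfield, IL 62701</div>
</div>
</body></html>
HTML;

print_r(parseStoreAddresses($sample));
```

Returning an array per store, rather than echoing each value, makes it straightforward to map the results onto the fields listed in the question.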



Answer 2:


I see that they put a convenient hr tag before the address. Explode the HTML on the hr tag and use the remaining part, which contains the address, to rebuild the HTML object. Then iterate through the divs and use preg_match to check whether each one contains any reference to the data you want.

foreach($html->find('div') as $test)
    {
     if(preg_match('/Address/', $test->innertext))
        {
        // this div mentions the address: extract it from $test->plaintext here
        }
    }
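The approach in this answer (cut the page at the hr tag, then test each div against a pattern) can be sketched without any library as below. This is a minimal sketch under assumptions: DOMDocument is swapped in for simple_html_dom, preg_split replaces explode so that variants like <hr/> also match, and the "Label: value" text format, the field patterns, and the function name findLabelledDivs are all hypothetical rather than taken from the real page.

```php
<?php
// Hypothetical helper: keep only the HTML after the first <hr>, then return the
// captured value for each field whose pattern matches an innermost <div>.
// DOMDocument is used in place of simple_html_dom so no library is needed.
function findLabelledDivs(string $html, array $patterns): array
{
    // Split on the first <hr> (any attributes, optional "/") and keep the rest
    $parts = preg_split('/<hr\b[^>]*>/i', $html, 2);
    $rest = $parts[1] ?? $parts[0]; // no <hr>: fall back to the whole page

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // the cut-off HTML has stray closing tags
    $dom->loadHTML($rest);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $found = [];
    // Innermost divs only, so a wrapper div does not match every label it contains
    foreach ($xpath->query('//div[not(.//div)]') as $div) {
        $text = trim($div->textContent);
        foreach ($patterns as $field => $regex) {
            // First match wins; group 1 of the pattern is the value
            if (!isset($found[$field]) && preg_match($regex, $text, $m)) {
                $found[$field] = trim($m[1]);
            }
        }
    }
    return $found;
}

// Usage with hypothetical sample markup and labels
$page = <<<HTML
<html><body>
<div>Header junk</div>
<hr>
<div>Store Phone: (555) 010-0100</div>
<div>Pharmacy Phone: (555) 010-0199</div>
<div>Open Hours: Mon-Sun 7am-11pm</div>
</body></html>
HTML;

$fields = findLabelledDivs($page, [
    'store_phone'    => '/^Store Phone:\s*(.+)$/',
    'pharmacy_phone' => '/^Pharmacy Phone:\s*(.+)$/',
    'open_hours'     => '/^Open Hours:\s*(.+)$/',
]);
print_r($fields);
```

Keying the patterns by field name means adding another item from the question's list is one more array entry, which is what makes label matching scale better than positional selectors when the tags carry no ids.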



Answer 3:


Try the simple_html_dom library. Its project page has straightforward examples that will get you up to speed.

I have used it successfully for exactly the kind of scraping you are trying to do.



Source: https://stackoverflow.com/questions/10762051/parsing-specific-data-items-from-website
