Question
I am trying to retrieve the following data items from this webpage:
- Address
- City
- State
- Zip Code
- Store Phone
- Pharmacy Phone
- Open Hours
- Pharmacy Hours
- Pickup Options
- At this store/location
- Site to Store Hours
I tried the approach below, but I can't separate out some of the data to store in the variables above, so I would appreciate help and suggestions from a PHP expert.
$html = file_get_html('http://www.walmart.com/storeLocator/ca_storefinder_results.do?serviceName=&rx_title=com.wm.www.apps.storelocator.page.serviceLink.title.default&rx_dest=%2Findex.gsp&sfrecords=50&sfsearch_single_line_address=K6T');
foreach($html->find('div[class=StoreAddress] div[1]') as $name)
{
echo $name->innertext.'<br>';
}
The HTML of this website makes it hard to identify each data item by its tag, because the tags have no proper ids assigned. Can anyone please suggest an easy and scalable way to parse the above data items from this website?
Thanks
Answer 1:
The HTML isn't really that complex. PHP's iterators and DOM/regex functions are clumsy for tasks like this, but it can be done:
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.walmart.com/storeLocator/ca_storefinder_details_short.do?rx_dest=/index.gsp&rx_title=com.wm.www.apps.storelocator.page.serviceLink.title.default&edit_object_id=2092&sfsearch_single_line_address=K6T');
$xpath = new DOMXPath($dom);
foreach($xpath->query('//div[@class="StoreAddress"]') as $div) {
// title
echo $xpath->query(".//div[1]", $div)->item(0)->nodeValue . "\n";
// street
echo $xpath->query(".//div[2]", $div)->item(0)->nodeValue . "\n";
// city, state and zip (this pattern assumes a 5-digit zip)
if (preg_match('/(.*), ([A-Z]{2}) (\d{5})/', $xpath->query(".//div[3]", $div)->item(0)->nodeValue, $m)) {
// city
echo $m[1] . "\n";
// state
echo $m[2] . "\n";
// zip
echo $m[3] . "\n";
}
}
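To store the values in variables rather than echoing them, the same XPath idea can be wrapped in a function that returns one array per store. This is only a sketch: the sample markup, the function name parseStoreAddresses and the array keys are made up for illustration, and the real page may be structured differently. The pattern also accepts a Canadian-style postal code next to a 5-digit zip, on the assumption that the K6T search in the question returns Canadian stores.

```php
<?php
// Hypothetical markup standing in for the real page; the actual structure may differ.
$sample = <<<HTML
<div class="StoreAddress">
  <div>Example Store #0000</div>
  <div>123 Example St</div>
  <div>Anytown, ON K6T 1A1</div>
</div>
HTML;

// Returns one associative array per StoreAddress block.
function parseStoreAddresses(string $html): array
{
    $dom = new DOMDocument();
    // Suppress warnings caused by imperfect real-world HTML.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $stores = [];
    foreach ($xpath->query('//div[@class="StoreAddress"]') as $div) {
        // Collect the text of the direct child divs in document order.
        $lines = [];
        foreach ($xpath->query('./div', $div) as $child) {
            $lines[] = trim($child->nodeValue);
        }

        $store = [
            'name'    => $lines[0] ?? null,
            'address' => $lines[1] ?? null,
            'city'    => null,
            'state'   => null,
            'zip'     => null,
        ];

        // "City, ST 12345" or "City, ON K6T 1A1" (assumed formats).
        $pattern = '/^(.*), ([A-Z]{2}) (\d{5}|[A-Z]\d[A-Z] ?\d[A-Z]\d)$/';
        if (isset($lines[2]) && preg_match($pattern, $lines[2], $m)) {
            $store['city']  = $m[1];
            $store['state'] = $m[2];
            $store['zip']   = $m[3];
        }

        $stores[] = $store;
    }
    return $stores;
}

$stores = parseStoreAddresses($sample);
print_r($stores);
```

For a live page, the result of file_get_contents() on the URL could be passed in instead of $sample. Fields that fail to match stay null, so a layout change shows up as missing data rather than as a notice.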
Answer 2:
I see that they put a convenient hr tag before the address. Explode the HTML on the hr tag and use the remaining part, the one with the address, to rebuild the HTML object. Then iterate through the divs and use preg_match to check whether each one contains a reference to the data you want.
foreach($html->find('div') as $test)
{
if(preg_match('/Address/', $test->innertext))
{
// filter out the address here
}
}
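The label-matching idea could be sketched like this with PHP's built-in DOMDocument instead of simple_html_dom. The markup and the function name extractLabeledFields are hypothetical, and the code assumes each value sits in the same div as its label ("Store Phone", "Pharmacy Phone"), which may not hold on the real page.

```php
<?php
// Hypothetical markup; on the real page the labels and values may be laid out differently.
$sample = <<<HTML
<div>Store Phone: (555) 555-0100</div>
<div>Pharmacy Phone: (555) 555-0101</div>
<div>Unrelated text</div>
HTML;

// Returns [label => value] for every div whose text starts with one of the labels.
function extractLabeledFields(string $html, array $labels): array
{
    $dom = new DOMDocument();
    // Suppress warnings caused by imperfect real-world HTML.
    @$dom->loadHTML($html);

    $found = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        $text = trim($div->textContent);
        foreach ($labels as $label) {
            // Label at the start of the text, optional colon, then the value.
            $pattern = '/^' . preg_quote($label, '/') . '\s*:?\s*(.+)$/s';
            if (!isset($found[$label]) && preg_match($pattern, $text, $m)) {
                $found[$label] = trim($m[1]);
            }
        }
    }
    return $found;
}

$fields = extractLabeledFields($sample, ['Store Phone', 'Pharmacy Phone']);
print_r($fields);
```

Anchoring the pattern at the start of the text keeps "Pharmacy Phone" from being picked up when looking for "Store Phone", and only the first match per label is kept.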
Answer 3:
Try out the simple_html_dom library. Its page has straightforward examples that will get you up to speed.
I have been using it successfully for exactly the kind of thing you are trying to do.
Source: https://stackoverflow.com/questions/10762051/parsing-specific-data-items-from-website