PHP DOM / XPath

臣服心动

Hopefully this is a simple question for someone who has done it before!

I have a list of old web documents in table format with lots of contact details in them. Wha

2 answers

栀梦

2020-12-20 06:48

I was looking for exactly this, and it worked perfectly.

I created a function that extracts the table I need and returns it as HTML:

    function clean_web_source($web_source) {
        $dom = new DOMDocument();
        // Old pages are rarely valid HTML, so the parser warnings are suppressed.
        @$dom->loadHTML($web_source);
        $xpath = new DOMXPath($dom);
        $nodes = $xpath->query('//table[@width="580"]');
        $data = array();
        foreach ($nodes as $node) {
            // Copy each matching table into its own document so it can be serialized alone.
            $tmp_dom = new DOMDocument();
            $tmp_dom->appendChild($tmp_dom->importNode($node, true));
            // Before using saveHTML() I used textContent and print_r($data)
            // to identify the array position that interested me.
            $data[] = trim($tmp_dom->saveHTML());
        }
        return $data[2]; // The table at position 2 is the one I want.
    }

    $url = "http://www.theurl.com/?param=1&lang=1";
    $web_source = file_get_contents($url);
    $target_source = clean_web_source($web_source); // What I was looking for.

Thanks.
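
The hard-coded position 2 in the function above only holds while the page layout stays the same. Here is a minimal sketch of an alternative, assuming the wanted table can be recognised by a piece of text it contains. The helper name `find_table_html` and the sample markup are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: return the HTML of the first 580px-wide table whose
// text contains $needle, or null when no such table exists.
function find_table_html(string $html, string $needle): ?string
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//table[@width="580"]') as $table) {
        if (strpos($table->textContent, $needle) !== false) {
            // saveHTML() accepts a node and serializes only that subtree.
            return trim($dom->saveHTML($table));
        }
    }
    return null;
}

// Made-up sample markup, for illustration only.
$sample = '<html><body>'
    . '<table width="580"><tr><td>Menu</td></tr></table>'
    . '<table width="580"><tr><td>Contact: Hanley</td></tr></table>'
    . '</body></html>';

$result = find_table_html($sample, 'Contact');
$missing = find_table_html($sample, 'No such text');
echo $result, "\n";
```

Passing the node to `saveHTML()` avoids the temporary document, and returning `null` makes a missing table explicit instead of raising an undefined-index notice.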

日久生厌

2020-12-20 06:55

I believe you are looking for something like this:

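    // $xpath is a DOMXPath created for the loaded document.
    // Select every cell whose align or valign attribute is "top".
    $nodes = $xpath->query('//table/tbody/tr/td[@align="top"] |
                            //table/tbody/tr/td[@valign="top"]');

    $data = array();
    foreach ($nodes as $node) {
        $data[] = $node->textContent;
    }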

This would give you:

Array
(
    [0] => Indigo Blue 123
    [1] => 123 Blue House
    [2] => 
    [3] => 
    [4] => Hanley
    [5] => 
    [6] => ST13 4SN
    [7] => Stoke on Trent
    [8] => 01875 322511
    [9] => 
    [10] => www.indigoblue123.org.uk
)
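
The blank entries in the output above come from empty cells. Here is a minimal sketch of one way to drop them, assuming cells that contain only whitespace carry no information. The `$data` values are copied from the output above, and `$clean` is a name introduced here for illustration.

```php
<?php
// Values copied from the output shown above.
$data = array(
    'Indigo Blue 123', '123 Blue House', '', '', 'Hanley', '',
    'ST13 4SN', 'Stoke on Trent', '01875 322511', '',
    'www.indigoblue123.org.uk',
);

// Trim every cell, keep only the non-empty ones, and reindex from 0.
$clean = array_values(array_filter(
    array_map('trim', $data),
    function ($cell) {
        return $cell !== '';
    }
));

print_r($clean);
```

Note that `trim()` does not strip non-breaking spaces by default, so cells holding only `&nbsp;` may need extra handling.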
