PHP DOM / XPath

臣服心动 2020-12-20 06:32

Hopefully this is a simple question for someone who has done it before!

I have a list of old web documents in table format with lots of contact details in it. Wha

2 Answers
  • 2020-12-20 06:48

    This is exactly what I was looking for, and it worked perfectly.

    I wrote a function that extracts the table I need and returns it as HTML:

        function clean_web_source($web_source) {
            $dom = new DOMDocument();
            // Old HTML is rarely well-formed; @ suppresses the parser warnings.
            @$dom->loadHTML($web_source);
            $xpath = new DOMXPath($dom);
            $nodes = $xpath->query('//table[@width="580"]');
            $data = array();
            foreach ($nodes as $node) {
                // Copy each matching table into its own document so it can be serialized alone.
                $tmp_dom = new DOMDocument();
                $tmp_dom->appendChild($tmp_dom->importNode($node, true));
                // Before switching to saveHTML() I used textContent and print_r($data)
                // to find the array position I was interested in.
                $data[] = trim($tmp_dom->saveHTML());
            }
            // Position 2 holds the table I want.
            return $data[2];
        }

        $url = "http://www.theurl.com/?param=1&lang=1";
        $web_source = file_get_contents($url);
        $target_source = clean_web_source($web_source); // The HTML I was looking for.
    

    Thanks.
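
    A variation on the function above, as a minimal sketch only: it collects parser warnings with libxml_use_internal_errors() instead of silencing them with @, and it returns null when the requested table is missing instead of reading an undefined array index. The helper name extract_table_html, its $width and $index parameters, and the sample markup are all hypothetical and not part of the original answer.

    ```php
    <?php
    // Hypothetical helper: returns the HTML of the $index-th table whose width
    // attribute equals $width, or null when there are fewer matches than that.
    // Assumes $width contains no double quotes, since it is pasted into the XPath.
    function extract_table_html(string $web_source, string $width, int $index): ?string
    {
        // Collect parser warnings internally instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        $dom->loadHTML($web_source);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($dom);
        $nodes = $xpath->query('//table[@width="' . $width . '"]');
        $node = $nodes->item($index); // null when the index is out of range
        if ($node === null) {
            return null;
        }
        // saveHTML() accepts a node, so no temporary document is needed.
        return trim($dom->saveHTML($node));
    }

    // Made-up sample markup, only for illustration.
    $sample = '<html><body>'
        . '<table width="580"><tr><td>first</td></tr></table>'
        . '<table width="580"><tr><td>second</td></tr></table>'
        . '</body></html>';

    $second = extract_table_html($sample, '580', 1);  // HTML of the second table
    $missing = extract_table_html($sample, '580', 5); // null, there is no sixth table
    ```

    Passing the index as a parameter keeps the "position 2" decision at the call site, where it is easier to adjust per document.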

  • 2020-12-20 06:55

    I believe you are looking for something like this:

    $nodes = $xpath->query('//table/tbody/tr/td[@align="top"] | 
                            //table/tbody/tr/td[@valign="top"]');
    
    $data = array();
    foreach ($nodes as $node) {
        $data[] = $node->textContent;
    }
    

    This would give you:

    Array
    (
        [0] => Indigo Blue 123
        [1] => 123 Blue House
        [2] => 
        [3] => 
        [4] => Hanley
        [5] => 
        [6] => ST13 4SN
        [7] => Stoke on Trent
        [8] => 01875 322511
        [9] => 
        [10] => www.indigoblue123.org.uk
    )
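
    Since the result above contains empty entries, one possible follow-up is to trim each cell and skip the blank ones. This is a rough sketch under stated assumptions: the sample markup is made up (the real documents are not shown in the question), and the query is shortened to //td[@valign="top"] so that it does not depend on the source containing an explicit tbody element.

    ```php
    <?php
    // Made-up sample markup; the real documents are not shown in the question.
    $html = '<table>'
        . '<tr><td valign="top">Indigo Blue 123</td></tr>'
        . '<tr><td valign="top">   </td></tr>'
        . '<tr><td valign="top"> ST13 4SN </td></tr>'
        . '<tr><td>not selected</td></tr>'
        . '</table>';

    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings about the incomplete document
    $xpath = new DOMXPath($dom);

    $data = array();
    foreach ($xpath->query('//td[@valign="top"]') as $node) {
        $text = trim($node->textContent);
        // Keep only cells that still contain something after trimming.
        if ($text !== '') {
            $data[] = $text;
        }
    }

    print_r($data);
    ```

    Note that trim() only strips ordinary whitespace, so cells that hold a non-breaking space would need extra handling.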
    