Parse HTML table using file_get_contents to PHP array

再見小時候 2020-12-05 03:28

I am trying to parse the table shown here into a multi-dimensional PHP array. I am using the following code, but for some reason it's returning an empty array. After searching

2 Answers
  • 2020-12-05 04:10

    I tried simple_html_dom, but on larger files and on repeated calls to the function I was getting zend_mm_heap_corrupted on PHP 5.3 (GAH). I also tried preg_match_all, but it kept failing on a larger file (5000 lines of HTML), which held only about 400 rows of my HTML table.

    I am now using the code below; it runs fast and does not spit out errors.

    $dom = new DOMDocument();
    
    // discard whitespace-only text nodes; this must be set before loading
    $dom->preserveWhiteSpace = false;
    
    // load the HTML (loadHTMLFile returns true/false, not the document)
    $dom->loadHTMLFile("htmltable.html");
    
    // get the tables by tag name
    $tables = $dom->getElementsByTagName('table');
    
    // get all rows from the first table
    $rows = $tables->item(0)->getElementsByTagName('tr');
    
    // read the column headers from the <th> cells of the first row
    $row_headers = array();
    foreach ($rows->item(0)->getElementsByTagName('th') as $node) {
        $row_headers[] = $node->nodeValue;
    }
    
    $table = array();
    foreach ($rows as $tr) {
        // get each data cell by tag name
        $cols = $tr->getElementsByTagName('td');
        if ($cols->length === 0) {
            continue; // header row (only <th> cells) or empty row
        }
        $row = array();
        $i = 0;
        foreach ($cols as $node) {
            // key by header name when one exists, otherwise by position
            if (isset($row_headers[$i])) {
                $row[$row_headers[$i]] = $node->nodeValue;
            } else {
                $row[] = $node->nodeValue;
            }
            $i++;
        }
        $table[] = $row;
    }
    
    var_dump($table);
    

    This code worked well for me. The original example it is based on is here:

    http://techgossipz.blogspot.co.nz/2010/02/how-to-parse-html-using-dom-with-php.html
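
    The same approach can be wrapped in a reusable function that takes an HTML string, such as the result of file_get_contents, instead of a file name. This is only a minimal sketch and not the code from the linked post: the function name html_table_to_array and the sample markup are made up for illustration, and it assumes the header cells sit in the first row of the first table.

```php
<?php
// Hypothetical helper (not from the original answer): converts the first
// <table> in an HTML string into an array of rows keyed by header text.
function html_table_to_array($html)
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;

    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tables = $dom->getElementsByTagName('table');
    if ($tables->length === 0) {
        return array(); // no table in the document
    }
    $rows = $tables->item(0)->getElementsByTagName('tr');
    if ($rows->length === 0) {
        return array(); // table without rows
    }

    // Header names come from the <th> cells of the first row, if any.
    $headers = array();
    foreach ($rows->item(0)->getElementsByTagName('th') as $th) {
        $headers[] = trim($th->nodeValue);
    }

    $table = array();
    foreach ($rows as $tr) {
        $cells = $tr->getElementsByTagName('td');
        if ($cells->length === 0) {
            continue; // header-only or empty row
        }
        $row = array();
        $i = 0;
        foreach ($cells as $td) {
            $value = trim($td->nodeValue);
            // Key by header name when one exists, otherwise by position.
            if (isset($headers[$i])) {
                $row[$headers[$i]] = $value;
            } else {
                $row[] = $value;
            }
            $i++;
        }
        $table[] = $row;
    }
    return $table;
}

// Made-up sample markup standing in for file_get_contents($url).
$sample = '<table>'
    . '<tr><th>Time</th><th>Artist</th><th>Title</th></tr>'
    . '<tr><td>10:01</td><td>Artist A</td><td>Song 1</td></tr>'
    . '<tr><td>10:05</td><td>Artist B</td><td>Song 2</td></tr>'
    . '</table>';

$result = html_table_to_array($sample);
print_r($result);
```

    With a real page, the call would look like html_table_to_array(file_get_contents($url)), after checking that file_get_contents did not return false.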

  • 2020-12-05 04:18

    Don't cripple yourself parsing HTML with regexps! Instead, let an HTML parser library worry about the structure of the markup for you.

    I suggest you check out Simple HTML DOM (http://simplehtmldom.sourceforge.net/). It is a library written specifically to help with this kind of web scraping problem in PHP. With such a library, you can write your scraper in far fewer lines of code without worrying about crafting working regexps.

    In principle, with Simple HTML DOM you just write something like:

    $html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
    foreach($html->find('tr') as $row) {
       // Parse table row here
    }
    

    This can then be extended to capture your data in some format, for instance to build an array of artists and their corresponding titles:

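    <?php
    require('simple_html_dom.php');
    
    $table = array();
    
    $html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
    if (!$html) {
        die('Could not load or parse the page');
    }
    
    foreach($html->find('tr') as $row) {
        // find() without an index returns an array of matching elements
        $cells = $row->find('td');
        // skip header rows and any row without the three expected cells
        if (count($cells) < 3) {
            continue;
        }
        $time = $cells[0]->plaintext;
        $artist = $cells[1]->plaintext;
        $title = $cells[2]->plaintext;
    
        // group titles under their artist
        $table[$artist][$title] = true;
    }
    
    echo '<pre>';
    print_r($table);
    echo '</pre>';
    
    ?>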
    

    We can see that this code can be (trivially) changed to reformat the data in any other way as well.
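
    As one example of such a change, the same three values per row can be collected as a flat list of records, or as lists of titles grouped by artist. The sketch below is an illustration under stated assumptions: it works on plain arrays of made-up values standing in for the cells read from each row, so it runs without Simple HTML DOM, and the function names rows_to_records and titles_by_artist as well as the keys time, artist and title are hypothetical.

```php
<?php
// Made-up rows standing in for the three cell values read from each <tr>
// (time, artist, title), in the order the table lists them.
$rows = array(
    array('10:01', 'Artist A', 'Song 1'),
    array('10:05', 'Artist B', 'Song 2'),
    array('10:09', 'Artist A', 'Song 3'),
);

// Hypothetical helper: one associative record per table row.
function rows_to_records(array $rows)
{
    $records = array();
    foreach ($rows as $cells) {
        if (count($cells) < 3) {
            continue; // skip rows that lack the three expected cells
        }
        $records[] = array(
            'time'   => $cells[0],
            'artist' => $cells[1],
            'title'  => $cells[2],
        );
    }
    return $records;
}

// Hypothetical helper: titles grouped under each artist, in table order.
function titles_by_artist(array $rows)
{
    $grouped = array();
    foreach ($rows as $cells) {
        if (count($cells) < 3) {
            continue;
        }
        $grouped[$cells[1]][] = $cells[2];
    }
    return $grouped;
}

$records = rows_to_records($rows);
$grouped = titles_by_artist($rows);

print_r($records);
print_r($grouped);
```

    In the scraping loop above, each $cells array would be built from the plaintext of the three td elements instead of being hard-coded.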
