Question
I'm using Symfony, Goutte, and DomCrawler to scrape a page. Unfortunately, this page has many old-fashioned tables of data with no IDs, classes, or other identifying attributes. So I'm trying to find a table by parsing the source code I get back from the request, but I can't seem to access any of the information.
I think when I try to filter it, it only filters the first node, and that's not where my desired data is, so it returns nothing.
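So I have a $crawler object, and I've tried looping through the following to get what I want:
use Symfony\Component\DomCrawler\Crawler;

// each() runs the closure once per matched node and returns an array of the results
$title = $crawler->filterXPath('//td[. = "Title"]/following-sibling::td[1]')->each(function (Crawler $node, $i) {
        return $node->text();
});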
I'm not sure what Crawler $node is; I just copied it from the example on the web page. Perhaps if I can get this working, it will loop through each node in the $crawler object and find what I'm actually looking for.
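For reference, Crawler::each() calls the closure once for every node in the set, passing that node wrapped in its own Crawler (hence the Crawler $node type hint) together with its zero-based position $i, and returns the closure's return values as an array. The sketch below mimics that loop with PHP's built-in DOMDocument and DOMXPath instead of Symfony; the function name cellTextsAfterLabel and the sample markup are made up for illustration.

```php
<?php
// Plain-PHP sketch (no Symfony): mimics Crawler::each() by visiting every
// node an XPath query matches, in document order, collecting one value per node.

function cellTextsAfterLabel(string $html, string $label): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world markup is often invalid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Same expression as in the question, with the label inlined
    // (assumes the label contains no double quotes).
    $nodes = $xpath->query('//td[. = "' . $label . '"]/following-sibling::td[1]');

    $results = [];
    for ($i = 0; $i < $nodes->length; $i++) {
        // $i corresponds to the second argument each() passes to the closure.
        $results[] = trim($nodes->item($i)->textContent);
    }
    return $results;
}

// Made-up sample: the "Title" row is in the second of two tables.
$html = '<table><tr><td>Other label</td><td>other value</td></tr></table>'
      . '<table><tr><td>Title</td><td>The Harsh Face of Mother Nature</td></tr></table>';

$titles = cellTextsAfterLabel($html, 'Title');
print_r($titles);
```

Because the expression starts with //, it searches the whole document, so the value cell is found even though it sits in the second table.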
Here's an example of the page:
<table> 
<tr>
    <td>Title</td>
    <td>The Harsh Face of Mother Nature</td>
    <td>The Harsh Face of Mother Nature</td>
</tr>
.
.
.
</table>
And this is just one table; there are many tables and a huge, sloppy mess outside this one. Any ideas?
(Note: earlier I was able to apply a filter to the $crawler object for some information I needed, then serialize() the result, and I finally had a string, which made sense. But now I can't get a string at all, and I don't know why.)
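Since the tables have no IDs or classes, one option is to anchor on the label text itself: find the cell that reads "Title", climb to its nearest enclosing <table>, and read only that table's rows. A sketch in plain PHP DOM, assuming each data row is a label cell followed by a value cell as in the sample above; the function name labelledRows and the "Other label" row are hypothetical.

```php
<?php
// Sketch: locate a table by the text of one of its cells, then read its rows
// as label => value pairs (assumes label in the first cell, value in the second).

function labelledRows(string $html, string $label): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate messy markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // ancestor::table[1] is the nearest <table> enclosing the label cell.
    $table = $xpath->query('//td[normalize-space(.) = "' . $label . '"]/ancestor::table[1]')->item(0);
    if ($table === null) {
        return [];
    }

    $rows = [];
    // Only this table's own rows, not rows of tables nested inside it.
    foreach ($xpath->query('./tr | ./tbody/tr', $table) as $tr) {
        $cells = $xpath->query('./td', $tr);
        if ($cells->length >= 2) {
            $rows[trim($cells->item(0)->textContent)] = trim($cells->item(1)->textContent);
        }
    }
    return $rows;
}

// Made-up sample: an unrelated table comes first.
$html = '<div><table><tr><td>Junk</td></tr></table>'
      . '<table><tr><td>Title</td><td>The Harsh Face of Mother Nature</td></tr>'
      . '<tr><td>Other label</td><td>other value</td></tr></table></div>';

$rows = labelledRows($html, 'Title');
print_r($rows);
```

normalize-space() is used so that stray whitespace around "Title" in the source does not break the match.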
Answer 1:
The DomCrawler html() method doesn't dump the whole HTML document. As the method description says:
http://api.symfony.com/2.6/Symfony/Component/DomCrawler/Crawler.html#method_html
it returns the HTML of the first node only, which is what happened in your case.
You may be able to use DOMDocument::saveHTML() (http://php.net/manual/en/domdocument.savehtml.php) instead, since the Crawler (an SplObjectStorage in this version) holds plain DOMNode objects, and each of them can reach its ownerDocument:
$html = $crawler->getNode(0)->ownerDocument->saveHTML();
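To see the difference, here is a small plain-PHP sketch: a DOMElement stands in for $crawler->getNode(0), and calling saveHTML() on its ownerDocument with no argument serialises the entire page, while passing the node serialises only that node. The sample markup is invented.

```php
<?php
// Made-up page: a table plus markup outside it.
$html = '<table><tr><td>Title</td><td>The Harsh Face of Mother Nature</td></tr></table>'
      . '<p>More markup outside the table</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Stand-in for $crawler->getNode(0).
$firstCell = $doc->getElementsByTagName('td')->item(0);

// No argument: the whole document, including the <p> outside the table.
$wholePage = $firstCell->ownerDocument->saveHTML();

// With a node argument: only that node's markup.
$justCell = $doc->saveHTML($firstCell);

echo $wholePage, "\n", $justCell, "\n";
```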
Answer 2:
If you view the source of Crawler::html(), you will see that it does the following:
$html = '';
foreach ($this->getNode(0)->childNodes as $child) {
    $html .= $child->ownerDocument->saveHTML($child);
}
return $html;
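That loop only ever looks at getNode(0). A sketch of the same loop pulled out into a standalone function (innerHtml is a hypothetical name) and applied to every node an XPath query matches, using plain PHP DOM and made-up markup:

```php
<?php
// Same body as the Crawler::html() loop, but usable on any DOMNode.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $child->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<table><tr><td>Title</td><td><b>First</b></td></tr></table>'
    . '<table><tr><td>Title</td><td><b>Second</b></td></tr></table>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$cells = $xpath->query('//td[. = "Title"]/following-sibling::td[1]');

// html() would stop at the first match; looping covers all of them.
$all = [];
foreach ($cells as $cell) {
    $all[] = innerHtml($cell);
}
print_r($all);
```

Within Symfony, calling $node->html() inside an each() closure should give the same per-node result, because each() wraps every node in its own single-node Crawler.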
Source: https://stackoverflow.com/questions/29267492/domcrawler-not-dumping-data-properly-for-parsing