I been trying to extract site table text along with its link from the given table to (which is in site1.com) to my php page using a web crawler.
But unfortunately,
Instead of writing your own parser solution you could use an existing one like Symfony's DomCrawler component: http://symfony.com/doc/current/components/dom_crawler.html
$crawler = new Crawler($returned_content);
$linkTexts = $crawler->filterXPath('//a')->each(function (Crawler $node, $i) {
return $node->text();
});
Or if you want to traverse the DOM tree yourself you can use DOMDocument's loadHTML
http://php.net/manual/en/domdocument.loadhtml.php
$document = new DOMDocument();
$document->loadHTML($returned_content);
foreach ($document->getElementsByTagName('a') as $link) {
$text = $link->nodeValue;
}
EDIT:
To get the links you want, the code assumes you have a $returned_content variable with the HTML you want to parse.
// creating a new instance of DOMDocument (DOM = Document Object Model)
$domDocument = new DOMDocument();
// save previous libxml error reporting and set error reporting to internal
// to be able to parse not well formed HTML doc
$previousErrorReporting = libxml_use_internal_errors(true);
$domDocument->loadHTML($returned_content);
libxml_use_internal_errors($previousErrorReporting);
$links = [];
/** @var DOMElement $node */
// getting all element from the HTML
foreach ($domDocument->getElementsByTagName('a') as $node) {
$parentNode = $node->parentNode;
// checking if the is under a that has class="FootNotes2"
$isChildOfAFootNotesTd = $parentNode->nodeName === 'td' && $parentNode->getAttribute('class') === 'FootNotes2';
// checking if the has class="Links2"
$isLinkOfLink2Class = $node->getAttribute('class') == 'Links2';
// as I assumed you wanted links from the this check makes sure that both of the above conditions are fulfilled
if ($isChildOfAFootNotesTd && $isLinkOfLink2Class) {
$links[] = [
'href' => $node->getAttribute('href'),
'text' => $parentNode->textContent,
];
}
}
print_r($links);
This will create you an array similar to:
Array
(
[0] => Array
(
[href] => /files/forum/2017/1/837242.php
[text] => Q@Q Drill Time ① - cardio69
)
[1] => Array
(
[href] => /files/forum/2017/1/837356.php
[text] => study partner in Houston - lacy
)
[2] => Array
(
[href] => /files/forum/2017/1/837110.php
[text] => Serious dedicated study partner for U World - step12013
)
...
- 热议问题