This is meant to provide a canonical Q&A for all the similar questions (each much too specific to be a close target candidate) that pop up once or twice a week.
Just came across the same problem. I almost wrote a recursive function to check whether each tbody tag exists and traverse the DOM that way, then I remembered I know regex. :)
Before parsing, get the HTML as a string. Insert the missing `<tbody>` and `</tbody>` tags with regex, then load it back into your DOMDocument object.
Jens Erat gives a good explanation, but here is how to insert the `<tbody>` tags with regex:
JavaScript
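// Sample table without <tbody> tags
var html = '<table><tr><td>foo</td><td>bar</td></tr></table>';
// Strings are immutable, so assign the result back
html = html.replace(/(<table([^>]+)?>([^<>]+)?)(?!<tbody([^>]+)?>)/g,"$1<tbody>").replace(/(<(?!(\/tbody))([^>]+)?>)(<\/table([^>]+)?>)/g,"$1</tbody>$4");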
PHP
$html = $dom->saveHTML();
$html = preg_replace(array('/(<table([^>]+)?>([^<>]+)?)(?!<tbody([^>]+)?>)/','/(<(?!(\/tbody))([^>]+)?>)(<\/table([^>]+)?>)/'),array('$1<tbody>','$1</tbody>$4'),$html);
$dom->loadHTML($html);
Just the regex:
This matches a `<table>` tag, along with any attributes inside the tag and any text between it and the next tag, if the next tag is NOT `<tbody>` (which may also carry attributes):
/(<table([^>]+)?>([^<>]+)?)(?!<tbody([^>]+)?>)/
replace with
$1<tbody>
where $1 references the captured `<table>` tag with its contents.
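To see how the opening-tag pattern behaves, here is a minimal sketch in JavaScript. The helper name `addOpeningTbody` and the two sample strings are assumptions made up for illustration; only the pattern and the replacement come from the answer.

```javascript
// Opening-tag pattern from the answer: a <table ...> tag plus any text after it,
// provided a <tbody ...> tag does not start right where the match ends.
const openTable = /(<table([^>]+)?>([^<>]+)?)(?!<tbody([^>]+)?>)/g;

// Hypothetical helper name, used only for this illustration.
function addOpeningTbody(html) {
  // $1 is the captured <table ...> tag (and any text directly after it).
  return html.replace(openTable, "$1<tbody>");
}

// Made-up sample inputs: one table without <tbody>, one that already has it.
const withoutTbody = '<table class="data"><tr><td>foo</td></tr></table>';
const withTbody = "<table><tbody><tr><td>foo</td></tr></tbody></table>";

console.log(addOpeningTbody(withoutTbody));
console.log(addOpeningTbody(withTbody));
```

The first string gains a `<tbody>` right after the `<table class="data">` tag, while the second is left unchanged. Note that the lookahead is tested wherever the match ends, so if whitespace separates `<table>` from an existing `<tbody>`, backtracking still lets the pattern match; compact markup like the samples here avoids that.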
Do the same for the closing tag like this:
/(<(?!(\/tbody))([^>]+)?>)(<\/table([^>]+)?>)/
replace with
$1</tbody>$4
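The closing-tag replacement can be sketched the same way. Again, the helper name `addClosingTbody` and the sample strings are hypothetical; the pattern and the `$1</tbody>$4` replacement are the ones given in the answer.

```javascript
// Closing-tag pattern from the answer: any tag other than </tbody> that is
// immediately followed by </table>.
const closeTable = /(<(?!(\/tbody))([^>]+)?>)(<\/table([^>]+)?>)/g;

// Hypothetical helper name, used only for this illustration.
function addClosingTbody(html) {
  // $1 is the tag just before </table>; $4 is the </table> tag itself.
  return html.replace(closeTable, "$1</tbody>$4");
}

// Made-up sample inputs: one table missing </tbody>, one already closed.
const missingClose = "<table><tbody><tr><td>foo</td></tr></table>";
const alreadyClosed = "<table><tbody><tr><td>foo</td></tr></tbody></table>";

console.log(addClosingTbody(missingClose));
console.log(addClosingTbody(alreadyClosed));
```

In the first string, `</tr></table>` becomes `</tr></tbody></table>`; the second string is returned as-is because the tag before `</table>` is already `</tbody>`. The pattern requires the preceding tag to sit directly against `</table>`, so whitespace or a newline between them keeps it from matching.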
This way the DOM will ALWAYS have the `<tbody>` tags where necessary.