I am using DOMDocument to modify HTML before it is output to the page. This is only an HTML fragment, not a complete page. My initial problem was that all fren
As others have pointed out, DOMDocument and loadHTML() default to Latin-1 (ISO-8859-1) encoding when given an HTML fragment. They also wrap your HTML in something like this:
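<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>YOUR HTML</body></html>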
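To see the default behaviour described above for yourself, here is a minimal sketch. The sample fragment and the variable names are my own, not from the question, and the exact escaping of the accented character may vary with your libxml2 version:

```php
<?php
// Illustrative UTF-8 fragment (an assumption, not taken from the question).
$fragment = '<p>café</p>';

libxml_use_internal_errors(true);   // silence parser warnings
$dom = new DOMDocument();
$dom->loadHTML($fragment);          // no charset declared anywhere
libxml_use_internal_errors(false);

// The result is wrapped in a doctype plus <html><body>...</body></html>,
// and on typical builds the "é" no longer comes out as written.
$out = $dom->saveHTML();
echo $out;
```

Running this shows both problems at once: the extra wrapper elements and the mangled non-ASCII character.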
As others have also pointed out, you can fix the encoding by inserting a HEAD element into your HTML with a META element that declares the correct encoding.
However, if you're working with an HTML fragment, you probably don't want the wrapping to happen and you also don't want to keep that HEAD element you inserted.
The following code inserts the HEAD element and, after processing, uses a regex to remove all the wrapping elements:
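<?php
// Sample fragment; the inner markup is illustrative
$html = '<article class="grid-item"><p>Hello World</p></article>
<article class="grid-item"><p>Goodbye World</p></article>
';
// HEAD element whose META tag tells the parser the input is UTF-8
$head = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>';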
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($head . $html);
$xpath = new DOMXPath($dom);
// Loop through all article.grid-item elements and add the "invisible" class to them
$nodes = $xpath->query("//article[contains(concat(' ', normalize-space(@class), ' '), ' grid-item ')]");
foreach ($nodes as $node) {
    $class = $node->getAttribute('class');
    $class .= ' invisible';
    $node->setAttribute('class', $class);
}
// Strip the wrapper tags; the \b stops "head" from also matching tags such as <header>
$content = preg_replace('/<\/?(!doctype|html|head|meta|body)\b[^>]*>/i', '', $dom->saveHTML());
libxml_use_internal_errors(false);
echo $content;
?>
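If you need this in more than one place, the same steps can be wrapped in a small helper. This is only a sketch: the function name transformHtmlFragment and its callback parameter are hypothetical, and it assumes the input fragment is UTF-8 and contains no META tags of its own that you want to keep.

```php
<?php
/**
 * Hypothetical helper (not part of DOMDocument): parses a UTF-8 HTML
 * fragment, lets a callback modify the DOM through XPath, and returns
 * the fragment without the wrapper elements added by loadHTML().
 */
function transformHtmlFragment(string $html, callable $modify): string
{
    // Temporary HEAD/META so the parser reads the input as UTF-8
    $head = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>';

    // Remember the previous error-handling setting and restore it afterwards
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->loadHTML($head . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $modify(new DOMXPath($dom));

    // Remove doctype, html, head, meta and body tags (but not e.g. <header>)
    $cleaned = preg_replace('/<\/?(!doctype|html|head|meta|body)\b[^>]*>/i', '', $dom->saveHTML());

    return trim($cleaned);
}

// Usage: add the "invisible" class to every article in the fragment
$result = transformHtmlFragment(
    '<article class="grid-item"><p>Hello World</p></article>',
    function (DOMXPath $xpath): void {
        foreach ($xpath->query('//article') as $node) {
            $node->setAttribute('class', $node->getAttribute('class') . ' invisible');
        }
    }
);
echo $result, "\n";
```

Restoring the previous libxml error setting, rather than always passing false, avoids clobbering whatever the surrounding code had configured.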