$string = file_get_contents('http://example.com');
if ('UTF-8' === mb_detect_encoding($string)) {
    $dom = new DOMDocument();
    // hack to preserve UTF-8 ch
I had a similar problem recently and eventually found this workaround: convert all non-ASCII characters to HTML entities before loading the HTML.
$string = mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8');
$dom->loadHTML($string);
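For anyone who wants to try the idea without fetching a page, here is a minimal self-contained sketch. It makes some assumptions: the sample markup in `$html` is made up, and the `mbstring` and `dom` extensions are enabled. It also swaps in `mb_encode_numericentity()` for the `'HTML-ENTITIES'` target, because that target is deprecated in `mb_convert_encoding()` as of PHP 8.2. The effect should be the same for this purpose: every character above 0x7F becomes a numeric entity, and the ASCII markup is left alone.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<p>Café naïve 日本語</p>';

// Encode every code point from 0x80 upward as a decimal numeric entity.
// Convmap layout: [first code point, last code point, offset, mask].
$encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

// The string is now pure ASCII, so DOMDocument cannot misread its encoding.
$dom = new DOMDocument();
$dom->loadHTML($encoded);

// DOMDocument works in UTF-8 internally, so the entities come back as characters.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, PHP_EOL;
```

On PHP versions before 8.2 the original `mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8')` call can be used in place of the `mb_encode_numericentity()` line without a deprecation notice.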