How to load XML when PHP can't indicate the right encoding?

梦谈多话 2020-12-12 00:08

I'm trying to load an XML source from a remote location, so I have no control over the formatting. Unfortunately, the XML file I'm trying to load does not declare an encoding:

<         


        
4 answers
  • 2020-12-12 00:43

    You can try the XMLReader class instead. It is a streaming XML parser, and both XMLReader::open() and XMLReader::XML() take an optional encoding argument (pass null to leave the encoding unspecified).
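    As a minimal sketch of that suggestion, the snippet below passes the encoding as the second argument of XMLReader::XML(). The sample document, the element name "name", and the choice of ISO-8859-1 are assumptions made for illustration; use whatever encoding the remote source actually delivers.

```php
<?php
// Hypothetical sample: ISO-8859-1 bytes (\xE9 is "é") and no XML declaration.
$xml = "<root><name>Caf\xE9</name></root>";

$reader = new XMLReader();
// The second argument tells the parser which encoding the input uses.
if (!$reader->XML($xml, 'ISO-8859-1')) {
    throw new RuntimeException('Could not open the XML source.');
}

$names = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'name') {
        // readString() returns the text content of the current element.
        $names[] = $reader->readString();
    }
}
$reader->close();
```

    For a remote file, XMLReader::open($url, 'ISO-8859-1') takes the encoding in the same position.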

  • 2020-12-12 00:52

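    You have to convert the document to UTF-8. The easiest way is utf8_encode(), which assumes the input is ISO-8859-1. Note that it is deprecated as of PHP 8.2; mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1') does the same conversion.

    DOMDocument example: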

    $doc = new DOMDocument();
    $content = utf8_encode(file_get_contents($url));
    $doc->loadXML($content);
    

    SimpleXML example:

    $xmlInput = simplexml_load_string(utf8_encode(file_get_contents($url_or_file)));
    

    If you don't know the current encoding, use mb_detect_encoding(), for example:

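    // Detect on the raw bytes; encoding them first would always yield UTF-8.
    $content = file_get_contents($url_or_file);
    // Strict detection against a short candidate list (the list is an assumption).
    $encoding = mb_detect_encoding($content, ['UTF-8', 'ISO-8859-1'], true) ?: 'ISO-8859-1';
    $doc = new DOMDocument();
    // The declaration needs a version attribute and must be closed with "?>".
    $res = $doc->loadXML("<?xml version=\"1.0\" encoding=\"$encoding\"?>" . $content);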
    

    Notes:

    • If the encoding cannot be detected (mb_detect_encoding() returns false), you may try to force the encoding via utf8_encode().
    • If you are loading HTML via $doc->loadHTML() instead, you can still prepend an XML header to declare the encoding.

    If you know the encoding, use iconv() to convert it:

    $xml = iconv('ISO-8859-1', 'UTF-8', $xmlInput);
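    The conversion steps in this answer can be combined into one helper, sketched below. The function name to_utf8, the ISO-8859-1 fallback, and the sample document are assumptions for illustration. The helper leaves input alone when it is already valid UTF-8, which avoids the double-encoding that an unconditional conversion would cause.

```php
<?php
// Hypothetical helper: return $xml as UTF-8, converting from $fallback
// only when the bytes are not already valid UTF-8.
function to_utf8(string $xml, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($xml, 'UTF-8')) {
        return $xml;
    }
    $converted = iconv($fallback, 'UTF-8', $xml);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $fallback to UTF-8.");
    }
    return $converted;
}

// Hypothetical sample: ISO-8859-1 bytes and no XML declaration.
$latin1 = "<root><name>Caf\xE9</name></root>";

$doc = new DOMDocument();
$doc->loadXML(to_utf8($latin1));
$name = $doc->getElementsByTagName('name')->item(0)->textContent;
```

    Without a declaration the parser assumes UTF-8, so the converted string loads as-is.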
    
  • 2020-12-12 00:57

    You could pre-process the document by adding an XML declaration that specifies the encoding it is delivered in. Which encoding that is, you will have to work out yourself, of course. The DOM object should then parse it.

    Example XML declaration:

    <?xml version="1.0" encoding="UTF-8" ?>
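    A minimal sketch of that pre-processing step follows. The helper name with_declaration and the sample document are assumptions for illustration, and the encoding passed in has to match the bytes actually delivered, or the parser will misread them.

```php
<?php
// Hypothetical helper: prepend an XML declaration naming $encoding,
// unless the document already starts with one.
function with_declaration(string $xml, string $encoding): string
{
    $xml = ltrim($xml);
    if (preg_match('/^<\?xml\s/', $xml) === 1) {
        return $xml; // a declaration is already present, leave it alone
    }
    return '<?xml version="1.0" encoding="' . $encoding . '"?>' . "\n" . $xml;
}

// Hypothetical sample: ISO-8859-1 bytes and no XML declaration.
$raw = "<root><name>Caf\xE9</name></root>";

$doc = new DOMDocument();
$doc->loadXML(with_declaration($raw, 'ISO-8859-1'));
// DOM strings come back as UTF-8 regardless of the source encoding.
$name = $doc->documentElement->textContent;
```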
    
  • 2020-12-12 01:09

    I ran into a similar situation. I was getting an XML file that was supposed to be UTF-8 encoded, but it included some bad ISO characters.

    I wrote the following code to encode the bad characters to UTF-8:

    <?php
    
    # The XML file with bad characters
    $filename = "sample_xml_file.xml";
    
    # Read file contents to a variable
    $contents = file_get_contents($filename);
    
    # Find runs of non-ASCII bytes (byte-wise match, no /u flag) and convert
    # only the runs that are not already valid UTF-8, assuming ISO-8859-1
    $contents = preg_replace_callback('/[\x80-\xFF]+/', function ($match) {
            # Valid UTF-8 sequences are left untouched to avoid double-encoding
            if (mb_check_encoding($match[0], 'UTF-8')) {
                    return $match[0];
            }
            # A run mixing valid UTF-8 with stray bytes is converted as a whole
            return mb_convert_encoding($match[0], 'UTF-8', 'ISO-8859-1');
    }, $contents);
    
    # Write the fixed contents back to the file
    file_put_contents($filename, $contents);
    
    # Cleanup
    unset($contents);
    
    # Now the bad characters have been encoded to UTF8
    # It will now load file with DOMDocument
    $dom = new DOMDocument();
    $dom->load($filename);
    
    ?>
    

    I posted about the solution in more detail at: http://dev.strategystar.net/2012/01/convert-bad-characters-to-utf-8-in-an-xml-file-with-php/
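    The same repair can also be done byte by byte, so that a stray byte sitting right next to a valid UTF-8 character is still handled correctly. The sketch below is one way to do it: the function name repair_utf8 is hypothetical, and it assumes every stray high byte is ISO-8859-1. It cannot tell apart ISO-8859-1 byte pairs that happen to form a valid UTF-8 sequence.

```php
<?php
// Hypothetical helper: keep well-formed UTF-8 sequences and re-encode every
// other high byte on the assumption that it is ISO-8859-1.
function repair_utf8(string $bytes): string
{
    $pattern = '/
        ( [\xC2-\xDF][\x80-\xBF]              # 2-byte sequence
        | \xE0[\xA0-\xBF][\x80-\xBF]          # 3-byte, no overlong forms
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}   # 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]          # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}       # 4-byte, no overlong forms
        | [\xF1-\xF3][\x80-\xBF]{3}           # 4-byte
        | \xF4[\x80-\x8F][\x80-\xBF]{2}       # 4-byte, up to U+10FFFF
        )
        | ([\x80-\xFF])                       # any other high byte is stray
    /x';

    return preg_replace_callback($pattern, function (array $m): string {
        // Group 2 is only set when a stray byte matched.
        if (isset($m[2]) && $m[2] !== '') {
            return mb_convert_encoding($m[2], 'UTF-8', 'ISO-8859-1');
        }
        return $m[0];
    }, $bytes);
}

// Hypothetical sample: one stray ISO-8859-1 byte next to valid UTF-8 text.
$fixed = repair_utf8("caf\xE9 na\u{EF}ve");
```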
