simplexml_load_string() fails with a parse error

眼角桃花 2020-12-03 08:53

I'm trying to load and parse a Google Weather API response (Chinese response).

Here is the API call.

// This code fails with the following error
$xml =          


        
5 Answers
  • 2020-12-03 09:19

    Just came across this. This seems to work (I found the function itself on the web and just updated it a bit):

    header('Content-Type: text/html; charset=utf-8'); 
    
    
    function getWeather() {
    
    $requestAddress = "http://www.google.com/ig/api?weather=11791&hl=zh-CN";
    // Downloads weather data based on location.
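    $xml_str = file_get_contents($requestAddress);
    // Strips namespace prefixes from tag names (e.g. <ns:tag> becomes <nstag>)
    $xml_str = preg_replace("/(<\/?)(\w+):([^>]*>)/", "$1$2$3", $xml_str);
    
    // Converts the response from GB18030 to UTF-8 so that SimpleXML can parse it
    $xml_str = iconv("GB18030", "utf-8", $xml_str);
    
    // Parses XML
    $xml = new SimpleXMLElement($xml_str);
    // Loops XML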
    echo '<div id="weather">';
    
    foreach($xml->weather as $item) {
    
        foreach($item->forecast_conditions as $new) {
    
            echo "<div class=\"weatherIcon\">\n";
             echo "<img src='http://www.google.com/" .$new->icon['data'] . "'   alt='".$new->condition['data']."'/><br>\n";
            echo "<b>".$new->day_of_week['data']."</b><br>";
            echo "Low: ".$new->low['data']." &nbsp;High: ".$new->high['data']."<br>";
            echo "\n</div>\n";
            }
    
    }
    
    echo '</div>';
    }
    
    
    getWeather();
    
  • 2020-12-03 09:22

    Update: I can reproduce the problem. Also, Firefox auto-detects the character set as "Chinese Simplified" when I output the raw XML feed. Either the Google feed is serving incorrect data (Chinese Simplified characters instead of UTF-8 ones), or it serves different data when it is not fetched in a browser; the Content-Type header in Firefox clearly says utf-8.

    Converting the incoming feed from Chinese Simplified (GB18030, this is what Firefox gave me) into UTF-8 works:

     $incoming = file_get_contents('http://www.google.com/ig/api?weather=11791&hl=zh-CN');
     $xml = iconv("GB18030", "utf-8", $incoming);
     $xml = simplexml_load_string($xml);
    

    This neither explains nor fixes the underlying problem yet, though. I don't have time to take a deep look into this right now; maybe somebody else does. To me, it looks like Google is in fact serving incorrect data (which would surprise me. I didn't know they made mistakes like us mortals. :P)
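
    A minimal sketch of a slightly more defensive version of the conversion above: it converts only when the bytes are not already valid UTF-8. The helper name toUtf8() and the sample XML are assumptions made for illustration, not part of the original answer, and the mbstring and iconv extensions are assumed to be available. The validity check is a heuristic, since a non-UTF-8 byte sequence can occasionally happen to be valid UTF-8.

    ```php
    <?php
    // Hypothetical helper (not from the original answer): returns UTF-8 text,
    // converting from $fallbackCharset only when the input is not valid UTF-8.
    function toUtf8(string $raw, string $fallbackCharset = 'GB18030'): string
    {
        if (mb_check_encoding($raw, 'UTF-8')) {
            return $raw; // already valid UTF-8, leave untouched
        }
        $converted = iconv($fallbackCharset, 'UTF-8', $raw);
        if ($converted === false) {
            throw new RuntimeException("Cannot convert from $fallbackCharset to UTF-8");
        }
        return $converted;
    }

    // Simulate a GB18030-encoded feed without touching the network.
    $utf8Xml = '<xml_api_reply><weather><city data="北京"/></weather></xml_api_reply>';
    $gbXml   = iconv('UTF-8', 'GB18030', $utf8Xml);

    $xml = simplexml_load_string(toUtf8($gbXml));
    echo $xml->weather->city['data'], "\n";
    ```

    With a real feed, the string returned by file_get_contents() would be passed to toUtf8() in place of $gbXml.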

  • 2020-12-03 09:23

    The problem here is that SimpleXML doesn't look at the HTTP header to determine the character encoding used in the document and simply assumes it's UTF-8 even though Google's server does advertise it as

    Content-Type: text/xml; charset=GB2312
    

    You can write a function that looks at that header using the super-secret magic variable $http_response_header and converts the response accordingly. Something like this:

    function sxe($url)
    {   
        $xml = file_get_contents($url);
        foreach ($http_response_header as $header)
        {   
            if (preg_match('#^Content-Type: text/xml; charset=(.*)#i', $header, $m))
            {   
                switch (strtolower($m[1]))
                {   
                    case 'utf-8':
                        // do nothing
                        break;
    
                    case 'iso-8859-1':
                        $xml = utf8_encode($xml);
                        break;
    
                    default:
                        $xml = iconv($m[1], 'utf-8', $xml);
                }
                break;
            }
        }
    
        return simplexml_load_string($xml);
    }
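
    The header inspection in the function above can also be split into small pieces that are testable without a network call. The following is only a sketch under stated assumptions: charsetFromHeaders() and bodyToUtf8() are hypothetical names, the header list and XML body are made up for illustration, and the pattern is loosened to accept any media type. It relies on iconv() for every charset, ISO-8859-1 included, because utf8_encode() is deprecated as of PHP 8.2.

    ```php
    <?php
    // Hypothetical helper: finds the charset in a list of raw HTTP header lines,
    // such as the $http_response_header array PHP fills in after file_get_contents().
    function charsetFromHeaders(array $headers, string $default = 'utf-8'): string
    {
        foreach ($headers as $header) {
            // Accepts any media type and optional quotes around the charset value.
            if (preg_match('#^Content-Type:.*?;\s*charset="?([\w.:-]+)"?#i', $header, $m)) {
                return strtolower($m[1]);
            }
        }
        return $default;
    }

    // Hypothetical helper: converts $body to UTF-8 according to the detected charset.
    function bodyToUtf8(string $body, array $headers): string
    {
        $charset = charsetFromHeaders($headers);
        if ($charset === 'utf-8') {
            return $body;
        }
        $converted = iconv($charset, 'UTF-8', $body);
        if ($converted === false) {
            throw new RuntimeException("Cannot convert from $charset to UTF-8");
        }
        return $converted;
    }

    // Simulated response, so the sketch runs without a network call.
    $headers = ['HTTP/1.0 200 OK', 'Content-Type: text/xml; charset=GB2312'];
    $body    = iconv('UTF-8', 'GB2312', '<weather><condition data="晴"/></weather>');

    $xml = simplexml_load_string(bodyToUtf8($body, $headers));
    echo $xml->condition['data'], "\n";
    ```

    In real use, $body would come from file_get_contents($url) and $headers from $http_response_header in the same scope.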
    
  • 2020-12-03 09:35

    This is the script I wrote in PHP to parse the Google Weather API.

     <?php
    
    function sxe($url)
    {
        $xml = file_get_contents($url);
        foreach ($http_response_header as $header)
        {
            if (preg_match('#^Content-Type: text/xml; charset=(.*)#i', $header, $m))
            {
                switch (strtolower($m[1]))
                {
                    case 'utf-8':
                        // do nothing
                        break;
    
                    case 'iso-8859-1':
                        $xml = utf8_encode($xml);
                        break;
    
                    default:
                        $xml = iconv($m[1], 'utf-8', $xml);
                }
                break;
            }
        }
        return simplexml_load_string($xml);
    }
    
    
    // Fetch through sxe() so the response is converted to UTF-8 before parsing
    $xml = sxe('http://www.google.com/ig/api?weather=46360&hl=en-us');
    $information = $xml->xpath("/xml_api_reply/weather/forecast_information");
    $current = $xml->xpath("/xml_api_reply/weather/current_conditions");
    $forecast = $xml->xpath("/xml_api_reply/weather/forecast_conditions");
    
    
    print "<br><br><center><div style=\"border: 1px solid; background-color: #ffffdffffd; background-image: url('http://mc-pdfd-live.dyndns.org/images/clouds.bmp'); width: 450\">";
    
    
    print "<br><h3>";
    print $information[0]->city['data'] . "&nbsp;" . $information[0]->unit_system['data'] . "&nbsp;" .     $information[0]->postal_code['data'];
    print "</h3>";
    print "<div style=\"border: 1px solid; width: 320px\">";
    print "<table cellpadding=\"5px\"><tr><td><h4>";
    print "Now";
    print "<br><br>";
    print "<img src=http://www.google.com" . $current[0]->icon['data'] . ">&nbsp;";
    print "</h4></td><td><h4>";
    print "<br><br>";
    print "&nbsp;" . $current[0]->condition['data'] . "&nbsp;";
    print "&nbsp;" . $current[0]->temp_f['data'] . "&nbsp;°F";
    print "<br>";
    print "&nbsp;" . $current[0]->wind_condition['data'];
    print "<br>";
    print "&nbsp;" . $current[0]->humidity['data'];
    print "</h4></td></tr></table></div>";
    
    
    
    
    print "<table cellpadding=\"5px\"><tr><td>";
    
    
    print "<table cellpadding=\"5px\"><tr><td><h4>";
    print "Today";
    print "<br><br>";
    print "<img src=http://www.google.com" . $forecast[0]->icon['data'] . ">&nbsp;";
    print "</h4></td><td><h4>";
    print "<br><br>";
    print  $forecast[0]->condition['data'];
    print "<br>";
    print  "High&nbsp;" . $forecast[0]->high['data'] . "&nbsp;°F";
    print "<br>";
    print  "Low&nbsp;" . $forecast[0]->low['data'] . "&nbsp;°F";
    print "</h4></td></tr></table>";
    
    print "<table cellpadding=\"5px\"><tr><td><h4>";
    print  $forecast[2]->day_of_week['data'];
    print "<br><br>";
    print "<img src=http://www.google.com" . $forecast[2]->icon['data'] . ">&nbsp;";
    print "</h4></td><td><h4>";
    print "<br><br>";
    print  "&nbsp;" . $forecast[2]->condition['data'];
    print "<br>";
    print  "&nbsp;High&nbsp;" . $forecast[2]->high['data'] . "&nbsp;°F";
    print "<br>";
    print  "&nbsp;Low&nbsp;" . $forecast[2]->low['data'] . "&nbsp;°F";
    print "</h4></td></tr></table>";
    
    
    print "</td><td>";
    
    
    print "<table cellpadding=\"5px\"><tr><td><h4>";
    print  $forecast[1]->day_of_week['data'];
    print "<br><br>";
    print "<img src=http://www.google.com" . $forecast[1]->icon['data'] . ">&nbsp;";
    print "</h4></td><td><h4>";
    print "<br><br>";
    print  "&nbsp;" . $forecast[1]->condition['data'];
    print "<br>";
    print  "&nbsp;High&nbsp;" . $forecast[1]->high['data'] . "&nbsp;°F";
    print "<br>";
    print  "&nbsp;Low&nbsp;" . $forecast[1]->low['data'] . "&nbsp;°F";
    print "</h4></td></tr></table>";
    
    print "<table cellpadding=\"5px\"><tr><td><h4>";
    print  $forecast[3]->day_of_week['data'];
    print "<br><br>";
    print "<img src=http://www.google.com" . $forecast[3]->icon['data'] . ">&nbsp;";
    print "</h4></td><td><h4>";
    print "<br><br>";
    print  "&nbsp;" . $forecast[3]->condition['data'];
    print "<br>";
    print  "&nbsp;High&nbsp;" . $forecast[3]->high['data'] . "&nbsp;°F";
    print "<br>";
    print  "&nbsp;Low&nbsp;" . $forecast[3]->low['data'] . "&nbsp;°F";
    print "</h4></td></tr></table>";
    
    
    print "</td></tr></table>";
    
    
    print "</div></center>";
    
    
    ?>
    
  • 2020-12-03 09:38

    Try adding the query parameter oe=utf-8 to the URL. The response will then be encoded exclusively in UTF-8. It helped me.

    http://www.google.com/ig/api?weather=?????&degree=??????&oe=utf-8&hl=es
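
    As a rough sketch of how such a URL could be assembled in PHP so that oe=utf-8 is always present: weatherUrl() is a hypothetical helper name, and the endpoint and the weather, hl and oe parameter names are simply the ones that appear in this thread.

    ```php
    <?php
    // Hypothetical helper: builds the request URL with oe=utf-8 always included.
    // The endpoint and the weather/hl/oe parameter names are taken from this thread.
    function weatherUrl(string $location, string $language): string
    {
        $query = http_build_query([
            'weather' => $location,
            'hl'      => $language,
            'oe'      => 'utf-8',
        ], '', '&'); // explicit separator, independent of arg_separator.output
        return 'http://www.google.com/ig/api?' . $query;
    }

    echo weatherUrl('11791', 'zh-CN'), "\n";
    ```

    The resulting string can be passed to file_get_contents() or simplexml_load_file() as in the other answers.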
    