PHP - Processing Invalid XML

长情又很酷 2020-12-01 20:17

I'm using SimpleXML to load in some XML files (which I didn't write/provide and can't really change the format of).

Occasionally (eg one or two files out of ever

2 Answers
  •  心在旅途
    2020-12-01 20:42

    What you need is something that will use libxml's internal errors to locate invalid characters and escape them accordingly. Here's a mockup of how I'd write it. Take a look at the result of libxml_get_errors() for error info.

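    function load_invalid_xml($xml)
    {
        // Collect libxml errors internally instead of emitting PHP warnings
        $use_internal_errors = libxml_use_internal_errors(true);
        libxml_clear_errors();

        $sxe = simplexml_load_string($xml);

        // Compare against false: an empty SimpleXMLElement is falsy too
        if ($sxe !== false)
        {
            libxml_use_internal_errors($use_internal_errors);
            return $sxe;
        }

        $fixed_xml = '';
        $last_pos  = 0;

        foreach (libxml_get_errors() as $error)
        {
            // $pos is the byte offset of the faulty character;
            // compute_position() is a placeholder, you have to write it yourself
            $pos = compute_position($xml, $error->line, $error->column);
            if ($pos < $last_pos || $pos >= strlen($xml))
            {
                // Skip errors pointing at an already handled or out-of-range offset
                continue;
            }
            $fixed_xml .= substr($xml, $last_pos, $pos - $last_pos) . htmlspecialchars($xml[$pos]);
            $last_pos = $pos + 1;
        }
        $fixed_xml .= substr($xml, $last_pos);

        libxml_clear_errors();
        libxml_use_internal_errors($use_internal_errors);

        return simplexml_load_string($fixed_xml);
    }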
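
The mockup leaves compute_position() to the reader. Here is a minimal sketch of what it could look like, not a definitive implementation. It takes the XML string as an extra first argument, because line and column alone are not enough to find an offset. The assumptions are mine: lines and columns are treated as 1-based, line endings are "\n", and columns are counted in bytes. libxml may report the column just past the offending character rather than on it, so check the output on one of your real files and adjust by one if needed. The sample string at the bottom is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: convert a 1-based (line, column) pair into a
 * 0-based byte offset into $xml. Assumes "\n" line endings.
 */
function compute_position(string $xml, int $line, int $column): int
{
    $offset = 0;

    // Advance $offset to the first byte of the requested line.
    for ($i = 1; $i < $line; $i++) {
        $newline = strpos($xml, "\n", $offset);
        if ($newline === false) {
            // Fewer lines than requested: clamp to the end of the string.
            return strlen($xml);
        }
        $offset = $newline + 1;
    }

    // Column 1 is the first byte of the line; never run past the end.
    return min($offset + max($column, 1) - 1, strlen($xml));
}

// Inspect what libxml reports for a deliberately broken sample document.
$sample = "<root>\n  <item>Fish & chips</item>\n</root>";

$previous = libxml_use_internal_errors(true);
libxml_clear_errors();
simplexml_load_string($sample);

foreach (libxml_get_errors() as $error) {
    $pos = compute_position($sample, $error->line, $error->column);
    printf(
        "line %d, column %d, offset %d: %s\n",
        $error->line,
        $error->column,
        $pos,
        trim($error->message)
    );
}

libxml_clear_errors();
libxml_use_internal_errors($previous);
```

Printing the line, column and computed offset side by side like this is the quickest way to see whether the reported column lands on the bad character or one past it in your data.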
    
