Is htmlentities() sufficient for creating xml-safe values?

礼貌的吻别 2020-11-29 03:40

I'm building an XML file from scratch and need to know whether htmlentities() converts every character that could potentially break an XML file (and possibly UTF-8 data).

5 answers
  •  抹茶落季
    2020-11-29 04:20

    Gordon's answer is good and explains the XML encoding problems, but it does not show a simple function (or what the black box does). Jon's answer starts well by recommending the 'htmlspecialchars' function, but he and others make some mistakes, so I will be emphatic.

    A good programmer MUST be in control of whether UTF-8 is used in strings and XML data: UTF-8 (or another non-ASCII encoding) IS SAFE when the algorithm handles it consistently.

    SAFE UTF-8 XML DOES NOT NEED FULL ENTITY ENCODING. Indiscriminate encoding produces "second-class, non-human-readable, encode/decode-demanding XML". Safe ASCII XML does not need entity encoding either, when all of your content is ASCII.

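    Only 3 or 4 characters need to be escaped in a string of XML content: >, <, &, and optionally " (inside double-quoted attribute values). Please read http://www.w3.org/TR/REC-xml/ "2.4 Character Data and Markup" and "4.6 Predefined Entities". THEN YOU can use 'htmlspecialchars' instead of 'htmlentities', which also emits named HTML entities that XML does not predefine.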
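
    To make the difference concrete, here is a minimal sketch comparing the two built-in functions on the same UTF-8 string. It assumes the script file itself is saved as UTF-8; the sample string and the variable names are only illustrative.

```php
<?php
$s = 'café <b> & "thé"';

// htmlentities() also rewrites the accented letters as named HTML entities.
// &eacute; is not one of XML's five predefined entities, so an XML parser
// without a matching DTD will reject it.
$html = htmlentities($s, ENT_COMPAT | ENT_HTML401, 'UTF-8');
echo $html, "\n"; // caf&eacute; &lt;b&gt; &amp; &quot;th&eacute;&quot;

// htmlspecialchars() touches only the markup characters and leaves the
// UTF-8 text readable.
$xml = htmlspecialchars($s, ENT_COMPAT | ENT_XML1, 'UTF-8');
echo $xml, "\n"; // café &lt;b&gt; &amp; &quot;thé&quot;
```

    Passing the flags and the encoding explicitly avoids depending on defaults, which have changed between PHP versions.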

    For illustration, the following PHP functions make XML content completely safe:

    // Didactic illustration only; in practice USE htmlspecialchars($s, $flags).
    function xmlsafe($s, $intoQuotes = 0) {
        if ($intoQuotes)
            // '&' must be replaced first. Same replacements as htmlspecialchars($s, ENT_COMPAT).
            return str_replace(array('&','>','<','"'), array('&amp;','&gt;','&lt;','&quot;'), $s);
        else
            // Same replacements as htmlspecialchars($s, ENT_NOQUOTES).
            return str_replace(array('&','>','<'), array('&amp;','&gt;','&lt;'), $s);
    }
    
    // Example of SAFE XML CONSTRUCTION
    function xmlTag($element, $attribs, $contents = NULL) {
        $out = '<' . $element;
        foreach ($attribs as $name => $val)
            $out .= ' ' . $name . '="' . xmlsafe($val, 1) . '"';
        if ($contents === '' || is_null($contents))
            $out .= '/>';
        else
            // Escape the content, then close the element.
            $out .= '>' . xmlsafe($contents) . '</' . $element . '>';
        return $out;
    }
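
    As a usage sketch, the same construction can delegate the escaping to the built-in htmlspecialchars(). The name buildTag and the sample values are hypothetical, chosen only for this illustration, and the script is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: builds one element, escaping attribute values and
// content with htmlspecialchars() and explicit flags.
function buildTag(string $element, array $attribs = [], ?string $contents = null): string
{
    $out = '<' . $element;
    foreach ($attribs as $name => $val) {
        // ENT_COMPAT escapes the double quote, which delimits the value here.
        $out .= ' ' . $name . '="'
              . htmlspecialchars((string) $val, ENT_XML1 | ENT_COMPAT, 'UTF-8') . '"';
    }
    if ($contents === null || $contents === '') {
        return $out . '/>';
    }
    // ENT_NOQUOTES: quotes are harmless inside element content.
    return $out . '>'
         . htmlspecialchars($contents, ENT_XML1 | ENT_NOQUOTES, 'UTF-8')
         . '</' . $element . '>';
}

$xml = buildTag('note', ['title' => 'Tom & "Jerry"'], 'Prix: 5 < 10 € & more');
echo $xml, "\n";
// <note title="Tom &amp; &quot;Jerry&quot;">Prix: 5 &lt; 10 € &amp; more</note>
```

    Note that the euro sign passes through untouched: only the markup characters are escaped.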
    

    Inside a CDATA block you do not need this function. But please avoid the indiscriminate use of CDATA.
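
    One reason for caution: a CDATA section still has one forbidden sequence, "]]>", which ends the section. A minimal sketch of a wrapper that handles it, where cdataWrap is a hypothetical name used only for this illustration:

```php
<?php
// Hypothetical helper: wrap a string in a CDATA section.
function cdataWrap(string $s): string
{
    // "]]>" would end the section early, so split it across two sections:
    // the first ends after "]]", the second starts with ">".
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $s) . ']]>';
}

echo cdataWrap('if (a < b && c) { x = "y"; }'), "\n";
// <![CDATA[if (a < b && c) { x = "y"; }]]>

echo cdataWrap('arr[i[0]]>1'), "\n";
// <![CDATA[arr[i[0]]]]><![CDATA[>1]]>
```

    A parser concatenates the two adjacent sections, so the original text is recovered unchanged.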
