Generating XML document in PHP (escape characters)

前端 未结 10 975
难免孤独
难免孤独 2020-11-27 04:20

I\'m generating an XML document from a PHP script and I need to escape the XML special characters. I know the list of characters that should be escaped; but what is the corr

10条回答
  •  离开以前
    2020-11-27 05:01

    In order to have a valid final XML text, you need to escape all XML entities and have the text written in the same encoding as the XML document processing-instruction states it (the "encoding" in the line). The accented characters don't need to be escaped as long as they are encoded as the document.

    However, in many situations simply escaping the input with htmlspecialchars may lead to double-encoded entities (for example é would become é), so I suggest decoding html entities first:

    function xml_escape($s)
    {
        $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
        $s = htmlspecialchars($s, ENT_QUOTES, 'UTF-8', false);
        return $s;
    }
    

    Now you need to make sure all accented characters are valid in the XML document encoding. I strongly encourage to always encode XML output in UTF-8, since not all the XML parsers respect the XML document processing-instruction encoding. If your input might come from a different charset, try using utf8_encode().

    There's a special case, which is your input may come from one of these encodings: ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R -- PHP treats them all the same, but there are some slight differences in them -- some of which even iconv() cannot handle. I could only solve this encoding issue by complementing utf8_encode() behavior:

    function encode_utf8($s)
    {
        $cp1252_map = array(
        "\xc2\x80" => "\xe2\x82\xac",
        "\xc2\x82" => "\xe2\x80\x9a",
        "\xc2\x83" => "\xc6\x92",
        "\xc2\x84" => "\xe2\x80\x9e",
        "\xc2\x85" => "\xe2\x80\xa6",
        "\xc2\x86" => "\xe2\x80\xa0",
        "\xc2\x87" => "\xe2\x80\xa1",
        "\xc2\x88" => "\xcb\x86",
        "\xc2\x89" => "\xe2\x80\xb0",
        "\xc2\x8a" => "\xc5\xa0",
        "\xc2\x8b" => "\xe2\x80\xb9",
        "\xc2\x8c" => "\xc5\x92",
        "\xc2\x8e" => "\xc5\xbd",
        "\xc2\x91" => "\xe2\x80\x98",
        "\xc2\x92" => "\xe2\x80\x99",
        "\xc2\x93" => "\xe2\x80\x9c",
        "\xc2\x94" => "\xe2\x80\x9d",
        "\xc2\x95" => "\xe2\x80\xa2",
        "\xc2\x96" => "\xe2\x80\x93",
        "\xc2\x97" => "\xe2\x80\x94",
        "\xc2\x98" => "\xcb\x9c",
        "\xc2\x99" => "\xe2\x84\xa2",
        "\xc2\x9a" => "\xc5\xa1",
        "\xc2\x9b" => "\xe2\x80\xba",
        "\xc2\x9c" => "\xc5\x93",
        "\xc2\x9e" => "\xc5\xbe",
        "\xc2\x9f" => "\xc5\xb8"
        );
        $s=strtr(utf8_encode($s), $cp1252_map);
        return $s;
    }
    

提交回复
热议问题