LibXML internal and output encodings

人走茶凉 提交于 2019-12-11 18:25:28

问题


I'm trying to write XML files with libxml2 in ISO-8859-1. But from the documentation it seems that for each text node that I create I'll have to convert to UTF-8 which is libxml's internal encoding. Then when calling xmlSaveFormatFileEnc() libxml converts to the target encoding and adds the encoding attribute to the document.

Is this assumption correct? For now my code goes roughly like this:

xmlNode *root_element = NULL, *node4 = NULL; xmlDoc *doc = NULL;

doc = xmlNewDoc(BAD_CAST XML_DEFAULT_VERSION);
root_element = xmlNewDocNode(doc, NULL, BAD_CAST("root"),
                    NULL);
char * input_str = getLatin1Data();
isolat1ToUTF8(utf8_str, &file_size, input_str, &inlen);

node4 = xmlNewCDataBlock(doc, BAD_CAST list_content, xmlStrlen(BAD_CAST utf8_str));

xmlAddChild(root_element, node4);
xmlSaveFormatFileEnc("test_file.xml", doc, "UTF-8", 1);
xmlFreeDoc(doc);


回答1:


Your assumption is right. When xmlChar is expected, like in xmlNewCDataBlock, xmlNewText, it is always UTF-8:

From include/libxml/xmlstring.h (libxml 2.8.0):

/**
 * xmlChar:
 *
 * This is a basic byte in an UTF-8 encoded string.
 * It's unsigned allowing to pinpoint case where char * are assigned
 * to xmlChar * (possibly making serialization back impossible).
 */


来源:https://stackoverflow.com/questions/3164815/libxml-internal-and-output-encodings

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!