LibXML2 Sax Parsing and ampersand

Posted by 爱⌒轻易说出口 on 2019-12-04 09:37:52

Question


I've run into what I think is strange behavior when using the SAX parser, and I'd like to know whether it's normal.

I'm sending this XML through the SAX parser:

<site url="http://example.com/?a=b&amp;b=c" />

The "&amp;" gets converted to "&#38;" when the startElement callback is called. Is it supposed to do that? If so, I would like to understand why.

I've pasted an example demonstrating the issue here:

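#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>

/* SAX1 callback: atts is a NULL-terminated array of alternating
   attribute names and values, or NULL if the element has none. */
static void start_element(void *ctx, const xmlChar *name, const xmlChar **atts)
{
  int i = 0;
  if (atts == NULL)
    return;
  while (atts[i] != NULL) {
    printf("%s\n", atts[i]);
    i++;
  }
}

int main(int argc, char *argv[]) {
  /* Zero-initialized handler: only startElement is set. */
  xmlSAXHandlerPtr handler = calloc(1, sizeof(xmlSAXHandler));
  if (handler == NULL)
    return 1;
  handler->startElement = start_element;

  const char *xml = "<site url=\"http://example.com/?a=b&amp;b=c\" />";

  xmlSAXUserParseMemory(handler, NULL, xml, (int)strlen(xml));

  free(handler);
  return 0;
}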

PS: This message was originally posted to the LibXML2 mailing list. I am not its original author, but I ran into the problem while using Nokogiri, and Aaron (the maintainer of Nokogiri) posted the original message himself.


Answer 1:


This message describes the same problem (which I had as well) and the response says to

ask the parser to replace entities values

In practice, that means setting the XML_PARSE_NOENT option when you set up your parser context:

xmlParserCtxtPtr context = xmlCreatePushParserCtxt(&yourSAXHandlerStruct, self, NULL, 0, NULL);
xmlCtxtUseOptions(context, XML_PARSE_NOENT);


Source: https://stackoverflow.com/questions/982716/libxml2-sax-parsing-and-ampersand
