XML CharacterDataHandler callback unexpectedly called multiple times

Posted by て烟熏妆下的殇ゞ on 2021-01-29 04:51:14

Question


I'm learning about libexpat. I cobbled together this example to get some basic familiarity with the API:

The Code:

#include <stdio.h>
#include <expat.h>
#include <string.h>
#include <iostream>

void start(void* userData, const char* name, const char* argv[])
{
  std::cout << "name: " << name << std::endl;

  int i = 0;

  while (argv[i])
  {
    std::cout << "argv[" << i << "] == " << argv[i++] << std::endl;
  }
}

void end(void* userData, const char* name)
{
}

void value(void* userData, const char* val, int len)
{
  char str[len+1];
  strncpy(str, val, len);
  str[len] = '\0';

  std::cout << "value: " << str << std::endl;
}

int main(int argc, char* argv[], char* envz[])
{
  XML_Parser parser = XML_ParserCreate(NULL);
  XML_SetElementHandler(parser, start, end);
  XML_SetCharacterDataHandler(parser, value);

  int bytesRead = 0;
  char val[1024] = {};
  FILE* fp = fopen("./catalog.xml", "r");
  std::cout << "fp == 0x" << (void*)fp << std::endl;

  do
  {
    bytesRead = fread(val, 1, sizeof(val), fp);
    std::cout << "In while loop bytesRead==" << bytesRead << std::endl;

    if (0 == XML_Parse(parser, val, bytesRead, (bytesRead < sizeof(val))))
    {
      break;
    }
  }
  while (1);

  XML_ParserFree(parser);
  std::cout << __FUNCTION__ << " end" << std::endl;

  return 0;
}

catalog.xml:

<CATALOG>
    <CD key1="value1" key2="value2">
        <TITLE>Empire Burlesque</TITLE>
        <ARTIST>Bob Dylan</ARTIST>
        <YEAR>1995</YEAR>
    </CD>
</CATALOG>

Makefile:

xml: xml.o
        g++ xml.o -lexpat -o xml

xml.o: main.cpp Makefile
        g++ -g -c main.cpp -o xml.o

Output:

fp == 0x0x22beb50
In while loop bytesRead==148
name: CATALOG
value: 

value:     
name: CD
argv[1] == key1
argv[2] == value1
argv[3] == key2
argv[4] == value2
value: 

value: 
name: TITLE
value: Empire Burlesque
value: 

value: 
name: ARTIST
value: Bob Dylan
value: 

value: 
name: YEAR
value: 1995
value: 

value:     
value: 

In while loop bytesRead==0
main end

Question:

From the output, it appears that the callback I installed with XML_SetCharacterDataHandler() gets called twice for the CATALOG, CD, TITLE, and ARTIST XML tags, and then multiple times for the YEAR tag. Can someone explain this behavior? Looking at catalog.xml, it's not clear to me why there are (or would ever be) multiple values associated with any XML tag.

Thank you.

Citation:

Credit to this site for the basis of the above sample code.


Answer 1:


The expat parser may split text nodes into multiple calls to the character data handler. To properly handle text nodes you must accumulate text over multiple calls and process it when receiving the "end" event for the containing tag.
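As a rough illustration of that accumulate-then-process pattern, here is a minimal sketch. The three handlers use the same signatures as the callbacks in the question, but ParseState, onStart, onText, onEnd and simulateSplitTitle are hypothetical names made up for this example, and the driver calls the handlers by hand instead of linking against expat, so the split into two text calls is simulated rather than produced by a real parse. It also assumes elements hold either text or child elements (as in catalog.xml), not mixed content.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-parse state; with expat it would be passed as userData.
struct ParseState {
    std::string text;                   // text gathered since the last start tag
    std::vector<std::string> finished;  // "NAME=text" for each closed element
};

// Same signature as an expat start-element handler.
void onStart(void* userData, const char* /*name*/, const char** /*atts*/) {
    // A new element begins: discard whatever text preceded it.
    static_cast<ParseState*>(userData)->text.clear();
}

// Same signature as an expat character-data handler.
// val is NOT null-terminated, so only the first len bytes are used.
void onText(void* userData, const char* val, int len) {
    ParseState* st = static_cast<ParseState*>(userData);
    st->text.append(val, static_cast<std::size_t>(len));
}

// Same signature as an expat end-element handler.
void onEnd(void* userData, const char* name) {
    ParseState* st = static_cast<ParseState*>(userData);
    // Only now is the element's text known to be complete.
    // Whitespace-only text (indentation between tags) is skipped.
    if (st->text.find_first_not_of(" \t\r\n") != std::string::npos) {
        st->finished.push_back(std::string(name) + "=" + st->text);
    }
    st->text.clear();
}

// Simulated event sequence for <TITLE>Empire Burlesque</TITLE>,
// with the text deliberately delivered in two separate calls.
std::vector<std::string> simulateSplitTitle() {
    ParseState st;
    const char* noAtts[] = { nullptr };
    onStart(&st, "TITLE", noAtts);
    onText(&st, "Empire ", 7);
    onText(&st, "Burlesque", 9);
    onEnd(&st, "TITLE");
    return st.finished;
}
```

With a real parser, the same handlers would be registered through XML_SetUserData(parser, &state), XML_SetElementHandler(parser, onStart, onEnd) and XML_SetCharacterDataHandler(parser, onText). The whitespace check in onEnd matters for the output in the question: the calls that print only a newline or spaces correspond to the line breaks and indentation between tags in catalog.xml, which are reported as character data too.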

This is true in general, even across different parsers and different languages -- i.e. the same thing is true in Java.

See for instance http://marcomaggi.github.io/docs/expat.html#using-comm

A common first-time mistake with any of the event-oriented interfaces to an XML parser is to expect all the text contained in an element to be reported by a single call to the character data handler. Expat, like many other XML parsers, reports such data as a sequence of calls; there's no way to know when the end of the sequence is reached until a different callback is made.

Also from the expat documentation:

A single block of contiguous text free of markup may still result in a sequence of calls to this handler. In other words, if you're searching for a pattern in the text, it may be split across calls to this handler.
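To make the pattern-search caveat concrete, here is a small sketch. Search, onChunk and foundInBuffer are hypothetical names invented for this example; onChunk has the same signature as a character data handler, and the test data assumes the text "Bob Dylan" happens to arrive as "Bob Dy" followed by "lan". Where a real parser splits the text is not something the caller controls, so this split is only an assumed worst case.

```cpp
#include <cstddef>
#include <string>

// Hypothetical search state; with expat it would be passed as userData.
struct Search {
    std::string pattern;   // text being looked for
    std::string buffer;    // everything received so far
    int hitsPerChunk = 0;  // calls whose own chunk contained the pattern
};

// Same signature as an expat character-data handler.
void onChunk(void* userData, const char* val, int len) {
    Search* s = static_cast<Search*>(userData);
    const std::string chunk(val, static_cast<std::size_t>(len));

    // Naive approach: search each chunk on its own.
    // This misses a pattern that straddles two calls.
    if (chunk.find(s->pattern) != std::string::npos) {
        ++s->hitsPerChunk;
    }

    // Robust approach: keep the text and search the accumulated buffer later.
    s->buffer += chunk;
}

// Intended to be called once the text is known to be complete,
// for example from the end-element handler.
bool foundInBuffer(const Search& s) {
    return s.buffer.find(s.pattern) != std::string::npos;
}
```

Feeding onChunk the two pieces "Bob Dy" and "lan" with the pattern "Dylan" leaves hitsPerChunk at 0, while foundInBuffer returns true, which is why the search belongs after accumulation rather than inside each call.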



Source: https://stackoverflow.com/questions/42125772/xml-characterdatahandler-callback-unpextedly-called-multiple-times
