XMLEventReader generates two EvText events for single tag

Submitted by 放肆的年华 on 2019-12-11 02:23:09

Question


I spotted some odd behavior in Scala's XML event reader. For XML like this:

  <page>
    <title>AT&amp;T Bell Labs</title>
    <ns>0</ns>
    <id>63739</id>
  </page>

It generates two EvText events for the title, since the title contains &amp;, the XML escape for &.

case EvText( text ) =>
{
  println(text)
}

With the code above, I get this output:

AT 
 T Bell Labs

instead of AT&amp;T Bell Labs.


Answer 1:


Entity reference events are represented by their own constructor, EvEntityRef (and in general you shouldn't count on consecutive characters being represented by a single EvText event, anyway, if I remember correctly).
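To see how the reader splits the title, it can help to print each event with its constructor name. The following is only a sketch: it assumes a scala-xml version that still ships the scala.xml.pull package (1.x), and the ShowEvents object and describe helper are names made up for illustration.

```scala
import scala.io.Source
import scala.xml.pull._

object ShowEvents {
  // Render one pull event with its constructor name; only the cases
  // relevant to this question are spelled out (hypothetical helper).
  def describe(event: XMLEvent): String = event match {
    case EvText(text)                => s"EvText($text)"
    case EvEntityRef(entity)         => s"EvEntityRef($entity)"
    case EvElemStart(_, label, _, _) => s"EvElemStart($label)"
    case EvElemEnd(_, label)         => s"EvElemEnd($label)"
    case other                       => other.toString
  }

  def main(args: Array[String]): Unit = {
    val xml = "<title>AT&amp;T Bell Labs</title>"
    // Consume the reader fully so its background parser thread can finish.
    new XMLEventReader(Source.fromString(xml)).foreach(e => println(describe(e)))
  }
}
```

Running this on the title element should make the entity reference show up as its own event between the two text events.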

Here's some ugly imperative code I wrote at some point in the past to handle both kinds of text events:

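import scala.xml.pull._

// Concatenates consecutive text and entity-reference events into one string.
// Assumes the reader has at least one event left, and that a non-text event
// follows the text; otherwise reader.next() throws NoSuchElementException.
def readText(reader: Iterator[XMLEvent]): String = {
  val builder = new StringBuilder
  var current = reader.next()
  while (
    // The match both appends the text and decides whether to keep looping.
    current match {
      case EvText(text)       => builder.append(text); true
      case EvEntityRef("amp") => builder.append("&"); true
      case EvEntityRef("lt")  => builder.append("<"); true
      case EvEntityRef("gt")  => builder.append(">"); true
      case _ => false
    }
  ) current = reader.next()
  builder.toString
}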

Note that this burns the first non-text event (I think? who knows—this is the kind of code you never want to have to read again), and is generally unpleasant, but it should give you some idea of how you could handle this kind of thing.
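One way to avoid consuming the first non-text event is to peek at the next event through a BufferedIterator before deciding whether to take it. This is a sketch of that variation, not the answer's original code: the TextEvents object and the entities map are assumptions introduced here, and it again assumes scala-xml 1.x with scala.xml.pull available.

```scala
import scala.xml.pull._

object TextEvents {
  // The five predefined XML entities and the characters they stand for.
  private val entities =
    Map("amp" -> "&", "lt" -> "<", "gt" -> ">", "quot" -> "\"", "apos" -> "'")

  // Collects consecutive text and known entity events into one string,
  // leaving the first other event in the iterator for the caller.
  def readText(reader: BufferedIterator[XMLEvent]): String = {
    val builder = new StringBuilder
    var more = true
    while (more && reader.hasNext) {
      reader.head match {
        case EvText(text) =>
          builder.append(text); reader.next()
        case EvEntityRef(name) if entities.contains(name) =>
          builder.append(entities(name)); reader.next()
        case _ =>
          more = false // stop without consuming the non-text event
      }
    }
    builder.toString
  }
}
```

Any Iterator[XMLEvent], including an XMLEventReader, can be wrapped with .buffered before being passed in. Unknown entity references stop the loop here rather than being dropped silently.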




Source: https://stackoverflow.com/questions/16591404/xmleventreader-generates-two-evtext-events-for-single-tag
