Question
I spotted some odd behavior in Scala's XML event reader. For XML like this:
<page>
<title>AT&amp;T Bell Labs</title>
<ns>0</ns>
<id>63739</id>
</page>
It generates two EvText events for the title, since the title contains the XML-encoded form of &.
case EvText(text) => {
  println(text)
}
As a result, the code above prints
AT
T Bell Labs
instead of AT&T Bell Labs.
Answer 1:
Entity reference events are represented by their own constructor, EvEntityRef (and in general you shouldn't count on consecutive characters being represented by a single EvText event, anyway, if I remember correctly).
Here's some ugly imperative code I wrote at some point in the past to handle both kinds of text events:
import scala.xml.pull._

// Concatenates consecutive text and entity-reference events into one string.
def readText(reader: Iterator[XMLEvent]): String = {
  val builder = new StringBuilder
  var current = reader.next()
  while (
    current match {
      case EvText(text) => builder.append(text); true
      case EvEntityRef("amp") => builder.append("&"); true
      case EvEntityRef("lt") => builder.append("<"); true
      case EvEntityRef("gt") => builder.append(">"); true
      case _ => false // any other event ends the run of text
    }
  ) current = reader.next()
  builder.toString
}
Note that this burns the first non-text event (I think? who knows—this is the kind of code you never want to have to read again), and is generally unpleasant, but it should give you some idea of how you could handle this kind of thing.
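One possible way around the consumed-event problem is to peek at the next event before taking it. The following is only a sketch of that idea, not the answer's own code: it assumes the caller wraps the reader with .buffered, and the names TextEvents, predefined and decode are made up for illustration. So that it runs without the scala-xml dependency, it declares small stand-in event classes intended to mirror the shapes of the ones in scala.xml.pull; with the real library you would drop them and import scala.xml.pull._ instead.

```scala
// Stand-ins for the scala.xml.pull event classes (an assumption made so the
// sketch is self-contained). With scala-xml on the classpath, delete these
// four definitions and use `import scala.xml.pull._`.
sealed trait XMLEvent
final case class EvText(text: String) extends XMLEvent
final case class EvEntityRef(entity: String) extends XMLEvent
final case class EvElemEnd(pre: String, label: String) extends XMLEvent

object TextEvents { // hypothetical helper name
  // The five entities that XML itself predefines.
  private val predefined = Map(
    "amp" -> "&", "lt" -> "<", "gt" -> ">", "quot" -> "\"", "apos" -> "'"
  )

  // Text carried by an event, or None if the event is not textual.
  // Entity names outside the predefined five also end the run,
  // much like the `case _` branch in the answer's version.
  private def decode(event: XMLEvent): Option[String] = event match {
    case EvText(text)      => Some(text)
    case EvEntityRef(name) => predefined.get(name)
    case _                 => None
  }

  // Joins consecutive text and entity events into one string.
  // It inspects `head` first and only calls `next()` on textual events,
  // so the first non-text event stays in the iterator for the caller.
  def readText(events: BufferedIterator[XMLEvent]): String = {
    val builder = new StringBuilder
    var more = true
    while (more && events.hasNext) {
      decode(events.head) match {
        case Some(s) =>
          builder.append(s)
          events.next() // consume only after confirming it is textual
        case None =>
          more = false
      }
    }
    builder.toString
  }
}
```

Given the buffered events EvText("AT"), EvEntityRef("amp"), EvText("T Bell Labs"), EvElemEnd("", "title"), TextEvents.readText should return AT&T Bell Labs and leave the EvElemEnd event as the next one in the iterator. It also returns an empty string instead of throwing when the iterator is already exhausted.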
Source: https://stackoverflow.com/questions/16591404/xmleventreader-generates-two-evtext-events-for-single-tag