How to read escaped characters using SAX parser in Characters method?

Submitted by 做~自己de王妃 on 2019-12-10 11:56:52

Question


I'm parsing the following XML using a SAX parser:

<Person>
<Name>Test</Name>
<Phone>111-111-2222</Phone>
<Address>lee h&amp;y</Address>
</Person>

The characters method of my SAX handler only receives the address data up to 'lee h'; it seems the parser does not treat '&' as a character. I need the complete text of the Address element. Any ideas on how to get it? This is my SAX handler (address is a flag that signals that an Address element has started):

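import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// The class name is illustrative; the question only shows the two methods.
class AddressHandler extends DefaultHandler {

    // Flag that signals that an <Address> element has started.
    boolean address = false;

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes attributes) throws SAXException {
        if (qName.equalsIgnoreCase("Address")) {
            address = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String data = new String(ch, start, length);

        // Prints only the first chunk: the flag is cleared right away,
        // so later characters() calls for the same element are ignored.
        if (address) {
            System.out.println("Address is: " + data);
            address = false;
        }
    }
}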

and the printed address is only: lee h
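To see the chunking directly, here is a small self-contained sketch (the class and method names are made up for illustration) that parses a shortened version of the XML with the JDK's built-in SAX parser and records every characters() call made inside Address. How many chunks you get depends on the parser, so the sketch only relies on the fact that joining them yields the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ChunkDemo {

    // Returns every chunk the parser passes to characters() inside <Address>.
    static List<String> addressChunks(String xml) {
        List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inAddress = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equalsIgnoreCase("Address")) {
                    inAddress = true;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equalsIgnoreCase("Address")) {
                    inAddress = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inAddress) {
                    chunks.add(new String(ch, start, length)); // record each call separately
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String xml = "<Person><Name>Test</Name><Address>lee h&amp;y</Address></Person>";
        List<String> chunks = addressChunks(xml);
        // The number of chunks is parser-dependent; the joined text is not.
        System.out.println("chunks: " + chunks);
        System.out.println("joined: " + String.join("", chunks));
    }
}
```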


Answer 1:


The characters method is called more than once here (three times in your case, e.g. 'lee h', '&' and 'y') to report the content of the Address element, because of the entity reference &amp;. You should accumulate the content of the characters calls until you receive the endElement event; only then do you have the complete content.

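Note what the documentation of the characters method says: SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks.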

You could also make use of the ignorableWhitespace method: with a validating parser and an appropriate grammar (e.g. a DTD), the parser knows which whitespace is ignorable (such as indentation between elements) and reports it through ignorableWhitespace instead of characters.
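A sketch of that idea, assuming an internal DTD subset added to the sample document (the DTD and all names here are illustrative, not from the question). With validation switched on, the indentation inside Person, which the DTD declares as element-only content, should be routed to ignorableWhitespace, so characters() sees only the real text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceDemo {

    // Records what arrives through characters() and counts ignorableWhitespace() calls.
    static final class CountingHandler extends DefaultHandler {
        int ignorableCalls = 0;
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableCalls++;
        }
    }

    static CountingHandler parse(String xml) {
        CountingHandler handler = new CountingHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // validate against the DTD in the document
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE Person [\n"
                + "  <!ELEMENT Person (Name, Address)>\n"
                + "  <!ELEMENT Name (#PCDATA)>\n"
                + "  <!ELEMENT Address (#PCDATA)>\n"
                + "]>\n"
                + "<Person>\n"
                + "  <Name>Test</Name>\n"
                + "  <Address>lee h&amp;y</Address>\n"
                + "</Person>\n";
        CountingHandler h = parse(xml);
        System.out.println("characters(): [" + h.text + "]");
        System.out.println("ignorableWhitespace() calls: " + h.ignorableCalls);
    }
}
```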

In Java, it could be:

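import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {

    // Collects every chunk passed to characters() for the current element.
    private final StringBuilder acc = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // Drop text seen before this element started (e.g. indentation).
        acc.setLength(0);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // Only now is the element's text complete.
        System.out.printf("Characters accumulated: %s%n", acc.toString());
        acc.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // Append this chunk; more may follow before endElement().
        acc.append(ch, start, length);
    }
}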
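Applied to the original question, one option is to keep the Address flag but clear it in endElement rather than in characters, buffering all chunks in between. The sketch below does that; AddressReader, AddressHandler and readAddress are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AddressReader {

    // Keeps the question's flag, but clears it in endElement instead of characters.
    static final class AddressHandler extends DefaultHandler {
        private final StringBuilder acc = new StringBuilder();
        private boolean inAddress = false;
        private String address; // full text of the last <Address> seen

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (qName.equalsIgnoreCase("Address")) {
                inAddress = true;
                acc.setLength(0); // fresh buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inAddress) {
                acc.append(ch, start, length); // one of possibly several chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("Address")) {
                address = acc.toString(); // every chunk has arrived by now
                inAddress = false;
            }
        }
    }

    // Parses xml and returns the complete text of its <Address> element (null if absent).
    static String readAddress(String xml) {
        AddressHandler handler = new AddressHandler();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.address;
    }

    public static void main(String[] args) {
        String xml = "<Person><Name>Test</Name><Phone>111-111-2222</Phone>"
                + "<Address>lee h&amp;y</Address></Person>";
        System.out.println("Address is: " + readAddress(xml));
    }
}
```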



Answer 2:


The answer depends to some extent on which parser you're using.

Here's a thorough rundown on the issue: http://www.ibm.com/developerworks/xml/library/x-tipsaxdo4/index.html

With a StAX parser you can set the property javax.xml.stream.isCoalescing (XMLInputFactory.IS_COALESCING) to true. This property specifies whether to coalesce adjacent character data.
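A minimal StAX sketch, assuming the JDK's default XMLInputFactory implementation (the class and method names are illustrative). It collects the CHARACTERS events reported inside Address; with coalescing enabled, the text around &amp; is expected to arrive as a single event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CoalescingDemo {

    // Collects the text of each CHARACTERS event reported inside <Address>.
    static List<String> addressEvents(String xml, boolean coalesce) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalesce);
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            boolean inAddress = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    inAddress = reader.getLocalName().equals("Address");
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    inAddress = false;
                } else if (event == XMLStreamConstants.CHARACTERS && inAddress) {
                    events.add(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<Person><Name>Test</Name><Address>lee h&amp;y</Address></Person>";
        System.out.println("default:    " + addressEvents(xml, false));
        System.out.println("coalescing: " + addressEvents(xml, true));
    }
}
```

If you only need the text of one element, XMLStreamReader.getElementText() also returns the whole text content in one string.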

With SAX, however, there is generally no such control.



Source: https://stackoverflow.com/questions/7798411/how-to-read-escaped-characters-using-sax-parser-in-characters-method
