Unable to parse a value containing a special character using a SAX parser

Submitted by 风格不统一 on 2019-12-04 01:38:28

Question


I am new to parsing. I'm trying to write a parser, but I can't get the value of a tag when that value contains an ampersand (&). Please help me find a solution.

My XML file looks like this:

<system>
<u_id>10145</u_id>
<serial_no>1800015</serial_no>
<branch_name>B & P Infotech Ltd.</branch_name>
</system>

I have tried the following Java code, but it does not give me the expected output.

Main class

package com.satya.xmltest;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class SaxTest {

    public static void main(String[] args) {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        SaxtestHandler handler=new SaxtestHandler();
        try {
            SAXParser parser = parserFactory.newSAXParser();
            parser.parse("C:\\Users\\abc\\Desktop\\test.xml", handler);
        } catch (Exception e) {
        }
        SystemTo systemTo=handler.systemTo;
        System.out.println("Uid :"+systemTo.getUid());
        System.out.println("serial number :"+systemTo.getSerialNumber());
        System.out.println("name :"+systemTo.getName());
    }
}

Handler class

This class does the parsing and sets the values on the data container class.

package com.satya.xmltest;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxtestHandler extends DefaultHandler {
    String content = "";
    SystemTo systemTo=new SystemTo();

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {

        switch (qName) {
            case "system":
                System.out.println("inside company");
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
        switch (qName) {
            case "u_id":
                systemTo.setUid(content);
                break;
            case "serial_no":
                systemTo.setSerialNumber(content);
                break;
            case "branch_name":
                systemTo.setName(content);
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
        content = String.copyValueOf(ch, start, length).trim();
    }
}

Data container class

package com.satya.xmltest;

public class SystemTo {

    private String uid;
    private String serialNumber;
    private String name;
    public String getUid() {
        return uid;
    }
    public void setUid(String uid) {
        this.uid = uid;
    }
    public String getSerialNumber() {
        return serialNumber;
    }
    public void setSerialNumber(String serialNumber) {
        this.serialNumber = serialNumber;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
}

My output is:

Uid: 10145
serial number: 1800015
name: null

But I need:

Uid: 10145
serial number: 1800015
name: B & P Infotech Ltd.

Thanks in advance.


Answer 1:


There are some characters in XML that must not appear in their literal form in an XML document, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section.
The characters and the entity or numeric character references that replace them are:

Original Character    XML entity replacement      XML numeric replacement

      "                     &quot;                       &#34;   
      <                     &lt;                         &#60;   
      >                     &gt;                         &#62;
      &                     &amp;                        &#38;
      '                     &apos;                       &#39;   

You must replace these characters in the XML before you parse it. In element content, & and < must always be escaped.

Alternatively, you can wrap text that should be treated as character data rather than markup in a CDATA section.
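
If the XML is generated by your own code, the replacement in the table above can be applied when the text is written. The following is only a minimal sketch: the class XmlEscape and the method escapeXml are hypothetical names that do not appear in the question, and a real project would more likely rely on an XML writer that escapes text for you.

```java
class XmlEscape {

    // Hypothetical helper: replaces the five characters from the table above
    // with their predefined entity references.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "B & P Infotech Ltd.";
        // Prints: <branch_name>B &amp; P Infotech Ltd.</branch_name>
        System.out.println("<branch_name>" + escapeXml(name) + "</branch_name>");
    }
}
```

The helper must run on raw text only. Applying it to text that is already escaped would turn &amp; into &amp;amp;.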




Answer 2:


You can escape these characters the same way HTML does:

<branch_name>B &amp; P Infotech Ltd.</branch_name>

Or you can use a CDATA section:

<branch_name><![CDATA[B & P Infotech Ltd.]]></branch_name>
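
Both forms above can be checked with a small SAX program. This is a sketch under stated assumptions, not the asker's code: BranchNameDemo, NameHandler and parseBranchName are hypothetical names, and the XML is read from a string instead of a file. The handler appends to a StringBuilder because a SAX parser is allowed to deliver the text of one element in several characters() calls, for example split around an entity reference, so keeping only the last chunk can lose part of the value.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class BranchNameDemo {

    // Hypothetical handler: collects the text of <branch_name>.
    static class NameHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        String branchName;

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) {
            text.setLength(0); // start with an empty buffer for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // append every chunk, never overwrite
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("branch_name".equals(qName)) {
                branchName = text.toString().trim();
            }
        }
    }

    // Parses the given XML string and returns the branch name.
    static String parseBranchName(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            NameHandler handler = new NameHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.branchName;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Report the failure instead of swallowing it.
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String escaped =
            "<system><branch_name>B &amp; P Infotech Ltd.</branch_name></system>";
        String cdata =
            "<system><branch_name><![CDATA[B & P Infotech Ltd.]]></branch_name></system>";
        System.out.println("name :" + parseBranchName(escaped));
        System.out.println("name :" + parseBranchName(cdata));
    }
}
```

Both calls should return B & P Infotech Ltd., because the parser decodes &amp; and passes CDATA content through unchanged.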



Answer 3:


The problem is that "&" is itself an escape character in XML.

To fix this, replace the ampersand with a numeric character reference, e.g. "&#038;".




Answer 4:


You must replace special characters with their escaped forms so that the XML file is well-formed. In your case, & should be written as &amp; in the file itself.

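@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    content = String.copyValueOf(ch, start, length).trim();
    // Runs only on text the parser has already accepted; a bare & in the
    // file stops the parse before this point, so fix the file first.
    content = content.replace("&", "&amp;");
}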
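
If the file comes from a producer you cannot change, one possible workaround is to repair the raw text before handing it to the parser. The sketch below is an assumption-laden illustration, not a general fix: AmpersandRepair and escapeBareAmpersands are hypothetical names, only the five predefined entities and numeric references are treated as valid, and an & inside a CDATA section or a DTD-defined entity would be rewritten incorrectly.

```java
import java.util.regex.Pattern;

class AmpersandRepair {

    // Matches an '&' that does not begin one of the five predefined entities
    // or a decimal/hexadecimal character reference.
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Hypothetical helper: escapes bare ampersands in raw XML text.
    static String escapeBareAmpersands(String rawXml) {
        return BARE_AMP.matcher(rawXml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String raw = "<branch_name>B & P Infotech Ltd.</branch_name>";
        // Prints: <branch_name>B &amp; P Infotech Ltd.</branch_name>
        System.out.println(escapeBareAmpersands(raw));
    }
}
```

The repaired string can then be parsed through an InputSource wrapping a StringReader. Fixing the program that writes the file remains the more reliable option.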


Source: https://stackoverflow.com/questions/21430207/unable-to-parse-value-containing-special-character-using-sax-parser
