Read escaped quote as escaped quote from xml

爷,独闯天下 submitted on 2019-12-04 19:27:12

I've taken a look at the source code of Apache Xerces and can propose a solution (though it is a monkey patch). I've written a simple class:

package a;
import java.io.IOException;
import org.apache.xerces.impl.XMLDocumentScannerImpl;
import org.apache.xerces.parsers.NonValidatingConfiguration;
import org.apache.xerces.xni.XMLString;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLComponent;

public class MyConfig extends NonValidatingConfiguration {

    private MyScanner myScanner;

    // Swap the default document scanner for MyScanner so that entity
    // references are passed through as text instead of being expanded.
    @Override
    @SuppressWarnings("unchecked")
    protected void configurePipeline() {
        if (myScanner == null) {
            myScanner = new MyScanner();
            // register the scanner as a configuration component (done once)
            addComponent((XMLComponent) myScanner);
        }
        super.fProperties.put(DOCUMENT_SCANNER, myScanner);
        super.fScanner = myScanner;
        super.fScanner.setDocumentHandler(this.fDocumentHandler);
        super.fLastComponent = fScanner;
    }

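    // Document scanner that copies entity references such as &quot; into the
    // character stream verbatim instead of expanding them.
    private static class MyScanner extends XMLDocumentScannerImpl {

        @Override
        protected void scanEntityReference() throws IOException, XNIException {
            // Scan the entity name that follows '&'
            String name = super.fEntityScanner.scanName();
            if (name == null) {
                reportFatalError("NameRequiredInReference", null);
                return;
            }

            // Pass "&name;" to the document handler as plain character data
            if (super.fDocumentHandler != null) {
                super.fDocumentHandler.characters(new XMLString(("&" + name + ";")
                    .toCharArray(), 0, name.length() + 2), null);
            }

            // Require the terminating ';'
            if (!super.fEntityScanner.skipChar(';')) {
                reportFatalError("SemicolonRequiredInReference",
                        new Object[] { name });
            }
            // Same bookkeeping as the original Xerces implementation
            fMarkupDepth--;
        }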
    }

}

You only need to add the following line to your main method before parsing starts:

System.setProperty(
            "org.apache.xerces.xni.parser.XMLParserConfiguration",
            "a.MyConfig");

You will then get the expected result:

false

ff1 "
ff1 "
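To make the effect of the override concrete without pulling in Xerces, here is a toy model written only for illustration (ToyContentScanner and its scan method are hypothetical, not Xerces code). It follows the same steps as MyScanner.scanEntityReference (scan a name, report an error if it is missing, require the ';') and then either expands the five predefined entities, as an unpatched parser would report them, or copies "&name;" through verbatim, as the patch does. Names are checked with a simplified rule, and character references such as &#34; are not modelled.

```java
import java.util.Map;

// Toy model only (not Xerces code): rebuilds element text the way a scanner
// would when it meets a named reference such as &quot;.
final class ToyContentScanner {

    // The five entities that every XML parser predefines.
    private static final Map<String, String> PREDEFINED = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "apos", "'", "quot", "\"");

    // Simplified name test; the real XML Name production is stricter.
    private static boolean isNameChar(char ch) {
        return Character.isLetterOrDigit(ch) || ch == '_' || ch == '-' || ch == '.' || ch == ':';
    }

    // keepReferences == false: expand predefined entities (default behaviour).
    // keepReferences == true:  emit "&name;" verbatim (MyScanner behaviour).
    static String scan(String content, boolean keepReferences) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < content.length()) {
            char ch = content.charAt(i);
            if (ch != '&') {
                out.append(ch);
                i++;
                continue;
            }
            // Scan the name after '&' (cf. fEntityScanner.scanName()).
            int start = i + 1;
            int end = start;
            while (end < content.length() && isNameChar(content.charAt(end))) {
                end++;
            }
            if (end == start) {
                throw new IllegalArgumentException("NameRequiredInReference at index " + i);
            }
            String name = content.substring(start, end);
            // Require the terminating ';' (cf. fEntityScanner.skipChar(';')).
            if (end >= content.length() || content.charAt(end) != ';') {
                throw new IllegalArgumentException("SemicolonRequiredInReference: " + name);
            }
            if (keepReferences) {
                out.append('&').append(name).append(';');
            } else {
                String replacement = PREDEFINED.get(name);
                if (replacement == null) {
                    throw new IllegalArgumentException("Undeclared entity: " + name);
                }
                out.append(replacement);
            }
            i = end + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(scan("ff1 &quot;", false)); // ff1 "
        System.out.println(scan("ff1 &quot;", true));  // ff1 &quot;
    }
}
```

Only the keepReferences branch corresponds to the patched scanner; the expanding branch stands in for what an unpatched parser hands to the document handler.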
Kristopher Ives

It looks like you can get the TEXT_NODE child and call getNodeValue() on it (assuming it is not null):

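import org.w3c.dom.Node;

public static String getRawContent(Node n) {
    if (n == null) {
        return null;
    }
    // First child of n that is a text node
    Node n1 = getChild(n, Node.TEXT_NODE);
    if (n1 == null) {
        return null;
    }
    return n1.getNodeValue();
}

// Helper used above (not shown in the original snippet): returns the first
// child of parent whose node type equals type, or null if there is none.
public static Node getChild(Node parent, short type) {
    for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
        if (c.getNodeType() == type) {
            return c;
        }
    }
    return null;
}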

Grabbed that from: http://www.java2s.com/Code/Java/XML/Gettherawtextcontentofanodeornullifthereisnotext.htm

There is no way to do this for internal entities, because XML does not support the concept. Internal entities are just a different way of writing the same PSVI content into the text; once parsed, they are indistinguishable from it.
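A quick way to check this with the JDK's built-in DOM parser (no Xerces patch involved) is to parse the same text written three ways and compare what the tree reports. The class name EntityCollapseDemo and the sample element are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

final class EntityCollapseDemo {

    // Parses a small document and returns the root element's text content.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String entity = rootText("<a>ff1 &quot;</a>"); // predefined entity
        String charRef = rootText("<a>ff1 &#34;</a>");  // character reference
        String literal = rootText("<a>ff1 \"</a>");     // plain quote
        System.out.println(entity);                                           // ff1 "
        System.out.println(entity.equals(charRef) && entity.equals(literal)); // true
    }
}
```

All three inputs produce the same string, so nothing in the resulting tree records which spelling was used; that is also why reading the text node's value returns a plain quote.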
