Getting the encoding of an XML file in Java

北恋 2021-01-12 04:26

I am parsing XML using DocumentBuilder in Java 1.4.
The first line of the XML is

<?xml version="1.0" encoding="GBK"?>

I want to

3 Answers
  •  無奈伤痛
    2021-01-12 04:32

    This works for various encodings, taking into account both the BOM and the XML declaration. It defaults to UTF-8 if neither is present:

    String encoding;
    FileReader reader = null;
    XMLStreamReader xmlStreamReader = null;
    try {
        InputSource is = new InputSource(file.toURI().toASCIIString());
        XMLInputSource xis = new XMLInputSource(is.getPublicId(), is.getSystemId(), null);
        xis.setByteStream(is.getByteStream());
        PropertyManager pm = new PropertyManager(PropertyManager.CONTEXT_READER);
        for (Field field : PropertyManager.class.getDeclaredFields()) {
            if (field.getName().equals("supportedProps")) {
                field.setAccessible(true);
                ((HashMap) field.get(pm)).put(
                        Constants.XERCES_PROPERTY_PREFIX + Constants.ERROR_REPORTER_PROPERTY,
                        new XMLErrorReporter());
                break;
            }
        }
        encoding = new XMLEntityManager(pm).setupCurrentEntity("[xml]".intern(), xis, false, true);
        // Compare string contents; != would only compare object references.
        if (!"UTF-8".equals(encoding)) {
            return encoding;
        }
    
        // From @matthias-heinrich’s answer:
        reader = new FileReader(file);
        xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(reader);
        encoding = xmlStreamReader.getCharacterEncodingScheme();
    
        if (encoding == null) {
            encoding = "UTF-8";
        }
    } catch (RuntimeException e) {
        throw e;
    } catch (Exception e) {
        throw new UndeclaredThrowableException(e);
    } finally {
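        if (xmlStreamReader != null) {
            try {
                xmlStreamReader.close();
            } catch (XMLStreamException e) {
                // Ignored: nothing useful can be done if close fails.
            }
        }
        if (reader != null) {
            try {
                reader.close();
            } catch (IOException e) {
                // Ignored: nothing useful can be done if close fails.
            }
        }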
    }
    return encoding;
    

    Tested on Java 6 with:

    • UTF-8 XML file with BOM, with XML declaration ✓
    • UTF-8 XML file without BOM, with XML declaration ✓
    • UTF-8 XML file with BOM, without XML declaration ✓
    • UTF-8 XML file without BOM, without XML declaration ✓
    • ISO-8859-1 XML file (no BOM), with XML declaration ✓
    • UTF-16LE XML file with BOM, without XML declaration ✓
    • UTF-16BE XML file with BOM, without XML declaration ✓
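
The detection order described above (BOM first, then the XML declaration, then a UTF-8 default) can also be sketched without the internal Xerces classes. The following is a minimal illustration, not the answer's implementation: the class name `XmlEncodingSniffer` and the method `sniff` are hypothetical, it assumes Java 7 or later, and it only inspects the first bytes of the file. It does not handle UTF-32 or UTF-16 files that lack a BOM.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
public class XmlEncodingSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the text.
    private static final Pattern DECLARED = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the encoding implied by a BOM, else the declared encoding, else "UTF-8". */
    public static String sniff(byte[] head) {
        // 1. A byte order mark takes precedence.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. No BOM: assume the declaration is ASCII-compatible and decode the
        //    first bytes as ISO-8859-1, which maps every byte to one character.
        int length = Math.min(head.length, 200);
        String text = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED.matcher(text);

        // 3. Fall back to UTF-8 when there is no declared encoding.
        return m.find() ? m.group(1) : "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

To use it, read the first couple of hundred bytes of the file into a byte array and pass them to `sniff`; for the GBK declaration in the question it would return the string "GBK".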

    Standing on the shoulders of these giants. Imports used by the code above:

    import java.io.*;
    import java.lang.reflect.*;
    import java.util.*;
    import javax.xml.stream.*;
    import org.xml.sax.*;
    import com.sun.org.apache.xerces.internal.impl.*;
    import com.sun.org.apache.xerces.internal.xni.parser.*;
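
Because the `com.sun.org.apache.xerces.internal` packages are JDK-internal and may not be accessible on every runtime, a variant that relies only on the public StAX API may be worth considering. The sketch below is an assumption-laden illustration rather than a replacement for the code above: the class name `StaxEncodingProbe` and the method `declaredEncoding` are hypothetical. It hands the parser an `InputStream` instead of a `Reader`, so the parser sees the raw bytes, and then reads the encoding from the XML declaration with `getCharacterEncodingScheme()`.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original answer.
public class StaxEncodingProbe {

    /**
     * Returns the encoding named in the XML declaration, as reported by the
     * StAX parser. The caller remains responsible for closing the stream.
     */
    public static String declaredEncoding(InputStream in) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            return reader.getCharacterEncodingScheme();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML declaration", e);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful can be done if close fails.
                }
            }
        }
    }
}
```

`XMLStreamReader` also offers `getEncoding()`, which reports the input encoding if the parser knows it; what it returns for files without a declaration can differ between StAX implementations, so it is worth testing against the files you expect.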
    
