How to find and replace an attribute value in an XML document

盖世英雄少女心 2021-01-02 09:01

I am building an "XML scanner" in Java that finds attribute values starting with "!Here:". The attribute value contains instructions to replace later. For example, I hav…

3 Answers
  •  温柔的废话
    2021-01-02 09:20

    There are several alternatives for this in Java.

    • First, JAXP (it has been bundled with Java since version 1.4).

    Let's assume we need to change the attribute customer to false in this XML:

    
    
       john@email.com
       mary@email.com
    
    

    With JAXP (this implementation is based on @t-gounelle's sample), we could do this:

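    import java.io.StringWriter;
    import java.io.Writer;
    import java.util.stream.IntStream;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    // resourcePath, attribute, oldValue and newValue are supplied by the caller
    // Load the document (hardened against XXE)
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    Document input = factory.newDocumentBuilder().parse(resourcePath);
    // Select the node(s) with XPath
    XPath xpath = XPathFactory.newInstance().newXPath();
    NodeList nodes = (NodeList) xpath.evaluate(String.format("//*[contains(@%s, '%s')]", attribute, oldValue), input, XPathConstants.NODESET);
    // Update the selected nodes (here we use the Stream API, but a for loop works too)
    IntStream
        .range(0, nodes.getLength())
        .mapToObj(i -> (Element) nodes.item(i))
        .forEach(value -> value.setAttribute(attribute, newValue));
    // Get the result as a String (a separate variable name avoids redeclaring "factory")
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    transformerFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    Transformer xformer = transformerFactory.newTransformer();
    xformer.setOutputProperty(OutputKeys.INDENT, "yes");
    Writer output = new StringWriter();
    xformer.transform(new DOMSource(input), new StreamResult(output));
    String result = output.toString();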
    

    Note that to protect the DocumentBuilderFactory against external entity processing (XXE), we enable XMLConstants.FEATURE_SECURE_PROCESSING and the disallow-doctype-decl feature; the latter rejects any DOCTYPE declaration, so no external entity can be declared at all. It's good practice to configure this whenever we parse untrusted XML files. Check the OWASP guide on XXE prevention for additional information.
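Putting the JAXP pieces together for the question's actual case (attribute values that start with "!Here:"), here is a minimal, JDK-only sketch. How the "!Here:" instructions should be interpreted is not shown in the question, so this sketch simply assumes each matching value is overwritten with a fixed replacement. It parses with the two hardening features described above, selects the attribute nodes themselves with `//@*[starts-with(., $prefix)]` so that any attribute name matches, and binds the prefix through an XPathVariableResolver instead of String.format, so a quote in the search string cannot break the expression. The class and method names (PrefixAttributeReplacer, replacePrefixed, isRejected) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: replaces attribute values that start with a given prefix.
final class PrefixAttributeReplacer {

    // Parses XML with secure processing on and DOCTYPEs refused outright,
    // so no (external) entity can be declared.
    static Document parseHardened(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Overwrites every attribute (on any element) whose value starts with `prefix`
    // with `newValue` (an assumed replacement rule) and returns the serialized document.
    static String replacePrefixed(String xml, String prefix, String newValue) {
        try {
            Document doc = parseHardened(xml);
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $prefix as a variable rather than formatting it into the expression.
            xpath.setXPathVariableResolver(
                    name -> "prefix".equals(name.getLocalPart()) ? prefix : null);
            NodeList attrs = (NodeList) xpath.evaluate(
                    "//@*[starts-with(., $prefix)]", doc, XPathConstants.NODESET);
            for (int i = 0; i < attrs.getLength(); i++) {
                ((Attr) attrs.item(i)).setValue(newValue);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the hardened parser refuses the document (e.g. it contains a DOCTYPE).
    static boolean isRejected(String xml) {
        try {
            parseHardened(xml);
            return false;
        } catch (SAXParseException e) {
            return true;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Selecting Attr nodes rather than elements means the scanner does not need to know the attribute name in advance, which matches the question's setup; a document containing a DOCTYPE (and therefore any XXE payload) is rejected before an entity can be resolved.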

    • Another alternative is dom4j, an open-source framework for processing XML that integrates with XPath and fully supports DOM, SAX, JAXP, and Java platform APIs such as the Java Collections.

    We need to add the following dependencies to our pom.xml to use it:

    
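    <dependency>
        <groupId>org.dom4j</groupId>
        <artifactId>dom4j</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>jaxen</groupId>
        <artifactId>jaxen</artifactId>
        <version>1.2.0</version>
    </dependency>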
    
    

    The implementation is very similar to the JAXP equivalent:

    import java.util.List;
    import java.util.stream.IntStream;
    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.Node;
    import org.dom4j.XPath;
    import org.dom4j.io.SAXReader;

    // Create the reader and enable the XXE protections *before* reading
    SAXReader xmlReader = new SAXReader();
    xmlReader.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    xmlReader.setFeature("http://xml.org/sax/features/external-general-entities", false);
    xmlReader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    // Load the document
    Document input = xmlReader.read(resourcePath);
    // Select the nodes
    String expr = String.format("//*[contains(@%s, '%s')]", attribute, oldValue);
    XPath xpath = DocumentHelper.createXPath(expr);
    List<Node> nodes = xpath.selectNodes(input);
    // Update the selected nodes
    IntStream
        .range(0, nodes.size())
        .mapToObj(i -> (Element) nodes.get(i))
        .forEach(value -> value.addAttribute(attribute, newValue));
    // Get the String representation with asXML(), or with the JAXP Transformer from the
    // previous snippet by wrapping input in org.dom4j.io.DocumentSource instead of DOMSource.
    String result = input.asXML();
    

    Note that despite its name, dom4j's addAttribute replaces the attribute if one with the given name already exists, and adds it otherwise. See the javadoc of org.dom4j.Element.addAttribute for details.
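The same replace-or-add behavior applies to DOM's Element.setAttribute, which the JAXP snippet uses: per the W3C DOM specification, it changes the value if the attribute already exists and adds a new attribute otherwise. A quick JDK-only check follows; the class name SetAttributeSemantics and the extra vip attribute are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo of DOM setAttribute's replace-or-add semantics.
final class SetAttributeSemantics {

    // Parses a small XML string (with the usual hardening) and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element to = parseRoot("<to customer=\"true\">john@email.com</to>");
        to.setAttribute("customer", "false"); // name exists: value is replaced
        to.setAttribute("vip", "yes");        // name is new: attribute is added
        System.out.println(to.getAttribute("customer"));    // false
        System.out.println(to.getAttributes().getLength()); // 2
    }
}
```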

    • Another nice alternative is jOOX, a library whose API is inspired by jQuery.

    To use jOOX, we need to add one of the following dependencies to our pom.xml, depending on the Java version:

    For use with Java 9+:

    
    <dependency>
        <groupId>org.jooq</groupId>
        <artifactId>joox</artifactId>
        <version>1.6.2</version>
    </dependency>
    
    

    For use with Java 6+:

    
    <dependency>
        <groupId>org.jooq</groupId>
        <artifactId>joox-java-6</artifactId>
        <version>1.6.2</version>
    </dependency>
    
    

    We can implement our attribute changer like this:

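    import static org.joox.JOOX.$;

    import javax.xml.parsers.DocumentBuilder;
    import org.joox.JOOX;
    import org.joox.Match;
    import org.w3c.dom.Document;

    // Load the document
    DocumentBuilder builder = JOOX.builder();
    Document input = builder.parse(resourcePath);
    Match $ = $(input);
    // Select the nodes by element name (an XPath expression also works, via xpath())
    $
        .find("to")
        .get()
        .forEach(e -> e.setAttribute(attribute, newValue));
    // Get the String representation
    String result = $.toString();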
    

    As this sample shows, the syntax is less verbose than in the JAXP and dom4j samples.
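For comparison with jOOX's find("to"), selecting by element name in plain JDK DOM is done with Document.getElementsByTagName. A minimal sketch with the same parser hardening as the JAXP snippet; the PlainDomUpdate class and setOnTag method are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the plain-DOM counterpart of $.find(tag) + setAttribute.
final class PlainDomUpdate {

    // Sets attribute `name` to `value` on every element whose tag is `tag`
    // and returns the serialized document.
    static String setOnTag(String xml, String tag, String name, String value) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns every matching element in document order.
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                ((Element) nodes.item(i)).setAttribute(name, value);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This avoids XPath entirely when matching by element name is enough; the price is the Transformer boilerplate that jOOX hides behind $.toString().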

    I compared the three implementations with JMH (average time per operation, so lower is better) and got the following results:

    | Benchmark                         | Mode | Cnt | Score | Error   | Units |
    |-----------------------------------|------|-----|-------|---------|-------|
    | AttributeBenchMark.dom4jBenchmark | avgt | 5   | 0.167 | ± 0.050 | ms/op |
    | AttributeBenchMark.jaxpBenchmark  | avgt | 5   | 0.185 | ± 0.047 | ms/op |
    | AttributeBenchMark.jooxBenchmark  | avgt | 5   | 0.307 | ± 0.110 | ms/op |
    

    I put the examples here if you need to take a look.
