Parsing XML file containing HTML entities in Java without changing the XML

一个人的身影 2020-12-05 18:53

I have to parse a bunch of XML files in Java that sometimes (and invalidly) contain HTML entities such as &gt; and so forth.
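
To reproduce the problem with nothing but the JDK, here is a minimal sketch using the built-in DocumentBuilder. It uses &mdash; as an example HTML entity, and the class and method names are made up for illustration. A non-validating parser still treats a reference to an undeclared entity as a well-formedness error, whereas XML's predefined entities (such as &amp;) parse fine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StrictXmlCheck {
    // Returns true if the JDK's built-in (strict) XML parser accepts the input.
    static boolean parsesStrictly(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but prints nothing to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // e.g. an entity that was referenced but never declared
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // &amp; is one of XML's predefined entities, so this parses
        System.out.println(parsesStrictly("<bar>Fish &amp; chips</bar>"));
        // &mdash; is an HTML entity with no declaration in this document
        System.out.println(parsesStrictly("<bar>Some text &mdash; invalid!</bar>"));
    }
}
```

Run as-is, it prints true and then false.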

6 Answers
  •  难免孤独
    2020-12-05 19:28

    I would use a library like Jsoup for this. I tested the code below and it works; hopefully it helps. Jsoup can be downloaded here: http://jsoup.org/download

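    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.parser.Parser;

    public class Main {
        public static void main(String[] args) {
            // XML containing an undeclared HTML entity (&mdash;)
            String xml = "<bar>Some text &mdash; invalid!</bar>";
            // Parse with jsoup's lenient XML parser instead of its default HTML parser
            Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

            // Print each <bar> element (toString() returns its outer markup)
            for (Element e : doc.select("bar")) {
                System.out.println(e);
            }
        }
    }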
    

    Result:

    <bar>Some text — invalid!</bar>
    

    An example of loading a document from a file is in the jsoup cookbook:

    http://jsoup.org/cookbook/input/load-document-from-file
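
    If adding jsoup as a dependency is not an option, a different, JDK-only technique (not the one this answer uses) is to leave the file on disk untouched and prepend a DOCTYPE declaring the missing entities to the input stream only. This sketch assumes the files are UTF-8 and have no XML declaration or DOCTYPE of their own (an XML declaration must come first, so a prefix in front of one would be rejected). The entity list and the class name are illustrative.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.SequenceInputStream;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.xml.sax.SAXException;

    public class EntityTolerantParser {
        // Illustrative subset of HTML entity declarations; add whatever the files use.
        // The name "root" need not match the real root: a non-validating parser ignores it.
        private static final String DOCTYPE_PREFIX =
                "<!DOCTYPE root [\n"
                + "  <!ENTITY mdash \"&#8212;\">\n"
                + "  <!ENTITY nbsp \"&#160;\">\n"
                + "]>\n";

        // Parses the file as if DOCTYPE_PREFIX were at its start; the file itself is not modified.
        // Assumes UTF-8 content with no XML declaration or DOCTYPE of its own.
        static Document parse(Path file) {
            InputStream prefix = new ByteArrayInputStream(DOCTYPE_PREFIX.getBytes(StandardCharsets.UTF_8));
            try (InputStream in = new SequenceInputStream(prefix, Files.newInputStream(file))) {
                return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            } catch (ParserConfigurationException | SAXException e) {
                throw new IllegalArgumentException("Could not parse " + file, e);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public static void main(String[] args) {
            Document doc = parse(Path.of(args[0]));
            System.out.println(doc.getDocumentElement().getTextContent());
        }
    }
    ```

    Unlike the jsoup approach, any entity missing from the prefix still causes a parse error, so the declaration list has to cover every entity the files actually contain.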
