Using an XML schema to fix an XML document in Java

Submitted by 落花浮王杯 on 2019-12-12 02:13:44

Question


Does anyone know of a tool that would allow me to take an XML string in Java, check it against a schema, and fix it if it is malformed?
For example, given the following schema and XML document:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <xs:element name="tag">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="subtag" type="xs:token" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>


<tag>
<subtag>content
</tag>

I am looking for a tool that can read the schema, parse the XML, notice the missing closing tag, and add it. For this particular program, I don't need any correction other than adding missing tags. (A tool that can locate and add missing tags without using the schema would also be fine.)
Any suggestions?


Answer 1:


The trouble is, of course, that for any instance that doesn't conform to the schema, there are an infinite number of "similar" instances that do conform to the schema, and your challenge is to choose the one that is "most similar" on some measure.

HTML5 tries to do this, with an elaborate set of rules. These rules encode a lot of knowledge of the specific schema: for example, if a tr is found as a child of a table, the tr is wrapped in a tbody. You could try to do the same for your schema/vocabulary, but be prepared for a lot of work.

Doing the same thing for an arbitrary schema sounds like an interesting PhD project. Doing it successfully would probably require some research into the causes of deviations from the schema (just as spelling correction should take into account whether the input was typed by the user, obtained by voice recognition, or obtained using OCR scanning; each introduces different kinds of errors).
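
For the narrow case in the question, where the only damage is missing end tags, one possible "most similar" rule is: when an end tag does not match the innermost open element, close the open elements until it does. The sketch below is only an illustration of that rule, not a general repair tool. The class name TagBalancer and its two methods are hypothetical, the regex ignores comments, CDATA sections, and attribute values containing '>', and the schema embedded in main is an assumed valid variant of the question's schema (subtag wrapped in xs:complexType/xs:sequence). The schema is not used to guide the repair; it is only used afterwards, via the standard javax.xml.validation API, to check the result.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper: heuristic repair of missing end tags only.
final class TagBalancer {
    // Matches start, end, and self-closing tags. Comments, CDATA, and
    // processing instructions are not recognised and pass through as text.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z_][\\w.:-]*)([^<>]*)>");

    /** Returns a copy of xml with missing end tags inserted. */
    static String closeMissingTags(String xml) {
        Deque<String> open = new ArrayDeque<>();   // names of open elements
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(xml);
        int last = 0;
        while (m.find()) {
            out.append(xml, last, m.start());      // copy text before the tag
            last = m.end();
            String name = m.group(2);
            boolean isEnd = !m.group(1).isEmpty();
            boolean selfClosing = m.group(3).endsWith("/");
            if (!isEnd) {
                if (!selfClosing) {
                    open.push(name);
                }
                out.append(m.group());
            } else if (open.contains(name)) {
                // Close every element opened after the matching start tag.
                while (!open.peek().equals(name)) {
                    out.append("</").append(open.pop()).append('>');
                }
                open.pop();
                out.append(m.group());
            }
            // An end tag with no matching start tag is silently dropped.
        }
        out.append(xml.substring(last));           // trailing text, if any
        while (!open.isEmpty()) {                  // close whatever is still open
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    /** Returns true if xml is well-formed and valid against the schema text xsd. */
    static boolean isValid(String xml, String xsd) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {                    // SAXException or IOException
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed valid form of the schema from the question.
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
                + " elementFormDefault=\"qualified\">"
                + "<xs:element name=\"tag\"><xs:complexType><xs:sequence>"
                + "<xs:element name=\"subtag\" type=\"xs:token\"/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        String broken = "<tag>\n<subtag>content\n</tag>";
        String fixed = closeMissingTags(broken);
        System.out.println(fixed);
        System.out.println("valid before: " + isValid(broken, xsd)
                + ", valid after: " + isValid(fixed, xsd));
    }
}
```

This rule always picks the latest possible position for the missing end tag, which happens to be right for the sample document but is only one of the many candidate repairs described above. Validating the output against the schema at least tells you when the guess was wrong.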




Answer 2:


Try JTidy; it will fix up malformed XML as well as HTML.



Source: https://stackoverflow.com/questions/8968701/using-a-xml-schema-to-fix-an-xml-in-java
