Suggested method for dealing with invalid XML

Posted by 最后都变了 on 2019-12-10 22:46:40

Question


I'm trying to integrate a program with a third-party service using Delphi XE2. The problem I'm running into is that the service isn't escaping any of the values in the XML documents it sends me.

This is one of their "sample" XML documents:

<plans type="array">
  <plan>
    <id type="integer">1</id>
    <series-title>A New Plan</series-title>
    <dates>January 16 & 17, 2010</dates>
    <plan-title>A New Plan For Your Family</plan-title>
  </plan>
  ...
</plans>

My original plan was just to wrap all the data in CDATA tags, but that doesn't seem like an ideal solution.

I also thought about searching for the & character and replacing it with &amp;, but the service doesn't escape ANY of the user input, including < and >, and doing a search and replace for every invalid XML character sounds like a bad idea as well.
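For illustration, the ampersand-only replacement described above could look like the following Python sketch (Python is used here only for brevity; the same regular expression should carry over to Delphi's regex support). The function name `escape_bare_ampersands` is hypothetical. It escapes an & only when it does not already begin something that looks like an entity or character reference, and it does nothing about stray < or > characters, which is exactly why this approach is incomplete.

```python
import re

# Matches an "&" that does NOT start something shaped like a reference,
# e.g. &amp; (named), &#38; (decimal) or &#x26; (hexadecimal).
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace every bare '&' with '&amp;', leaving existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", text)


sample = "<dates>January 16 & 17, 2010</dates>"
print(escape_bare_ampersands(sample))
# prints: <dates>January 16 &amp; 17, 2010</dates>
```

Note that user text which happens to look like a reference (for example a literal "&foo;") is left untouched and would still fail to parse, so this is a partial workaround at best.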

Any suggestions on how I should go about dealing with the invalid xml documents?


Answer 1:


Start by refusing to refer to these documents as "XML" - they aren't XML.

Persuade your supplier that many people have adopted XML and are getting benefits from it, and it would be a good idea if they did so too.

If your supplier is under the impression that they are sending you XML, put them right. Being almost XML doesn't help. It's like sending you Java code that won't compile.




Answer 2:


Invalid XML documents are invalid. The correct way to deal with them is to display a clear error message, and have the people sending you the "XML" fix their bug.
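As a rough sketch of what "display a clear error message" could look like, the following Python snippet tries to parse the document and, on failure, reports the line and column where the parser gave up. The function name `check_well_formed` is made up for this example; `xml.etree.ElementTree` is Python's standard-library parser, and a Delphi program would use its own XML parser's error information instead.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str):
    """Return None if xml_text parses, otherwise a readable error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return (
            f"Document is not well-formed XML "
            f"(line {line}, column {column}): {err}"
        )
    return None


message = check_well_formed("<dates>January 16 & 17, 2010</dates>")
if message is not None:
    print(message)
```

A message like this, with the position included, is something you can forward to the supplier as a concrete bug report.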




Answer 3:


If getting truly valid XML from the service is out of the question, it may help to run the XML through something like Tidy and parse the resulting document:

tidy --input-xml true --output-xml true input.xml

It's also possible to give tidy a document via stdin and receive the result with stdout and stderr, if you don't want to write a file to disk just to run it through something like this.
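A minimal sketch of the stdin/stdout approach in Python, assuming a `tidy` executable is on the PATH. The helper name `run_filter` and its default arguments are assumptions for this example; the option names follow Tidy's documented `input-xml` and `output-xml` settings, so adjust them if your build differs.

```python
import subprocess

# Assumed default command line; "tidy" must be installed for this to work.
DEFAULT_COMMAND = (
    "tidy", "-quiet",
    "--input-xml", "yes",
    "--output-xml", "yes",
)


def run_filter(xml_text: str, command=DEFAULT_COMMAND):
    """Pipe xml_text through an external command via stdin.

    Returns (stdout, stderr, return_code). With Tidy, the cleaned
    document arrives on stdout and its diagnostics on stderr.
    """
    proc = subprocess.run(
        list(command),
        input=xml_text,
        capture_output=True,
        text=True,
    )
    return proc.stdout, proc.stderr, proc.returncode
```

Calling `cleaned, diagnostics, code = run_filter(raw_document)` avoids the temporary file. Tidy uses a nonzero exit status to signal warnings or errors, so it is worth inspecting `code` and `diagnostics` before trusting `cleaned`.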

There's always a risk in doing this, though: depending on how the source document is malformed, the "cleaned" document may be missing attributes, elements, etc.



Source: https://stackoverflow.com/questions/12043065/suggested-method-for-dealing-with-invalid-xml
