Error-tolerant XML reader

自闭症网瘾萝莉.ら submitted on 2019-11-29 08:05:37

Look into HTML parsers, since HTML is almost XML and HTML parsers are built to cope with broken markup.
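As a rough illustration of that idea, here is a minimal Python sketch using the standard library's `html.parser.HTMLParser`, which reports tags and text as events and does not raise on unclosed or mismatched tags. The class name `TolerantCollector` and the helper `scan` are hypothetical names chosen for this example. By contrast, `xml.etree.ElementTree.fromstring` rejects the same sample input with a `ParseError`.

```python
from html.parser import HTMLParser


class TolerantCollector(HTMLParser):
    """Records (event, value) tuples; mismatched or unclosed tags do not raise."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Note: HTMLParser lowercases tag names, so XML case is not preserved.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            self.events.append(("text", data.strip()))


def scan(markup):
    """Feed markup through the tolerant parser and return the event list."""
    parser = TolerantCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Unclosed <item> tags and a bare '&' would both be fatal to a strict XML parser.
print(scan("<feed><item>one<item>two & more</feed>"))
```

The trade-off is that an HTML parser knows nothing about XML rules: tag names are case-folded, namespaces are not understood, and no tree is built, so you get a stream of events you still have to interpret yourself.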

It's precisely because the real world is imperfect that XML is so widely used. What would be the functional specification for an error-tolerant XML parser? It's an open-ended problem. It's hard enough to parse all variations of well-formed XML without trying to second-guess all possible errors.

[... Waits for downvote.]

Run the XML through Beautiful Soup first. That will clean up errors in your XML so that it parses correctly.

For the specific case of an RSS feed and the specific case of individual corrupt item entries, you can use XmlTextReader to manually read in each item separately, handling the XmlException for invalid items. When an Exception occurs, you'll need to use a new Reader instance, as the original Reader is hosed. You'll still have to have valid <item> and </item> tags to identify each item, but you'll be able to recover from corrupt data within each item.
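The same recover-per-item strategy can be sketched outside .NET. The following is a Python analog, not the `XmlTextReader` code itself: it locates each `<item>...</item>` span with a regular expression and parses each one with a fresh `xml.etree.ElementTree.fromstring` call, so a `ParseError` in one item does not affect the others. It assumes, as the answer does, that the `<item>` and `</item>` tags themselves are intact, and additionally that items are not nested. The helper name `read_items` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: <item> elements are not nested and their start/end tags are intact;
# only the content between them may be corrupt.
ITEM_RE = re.compile(r"<item\b[^>]*>.*?</item>", re.DOTALL)


def read_items(feed_text):
    """Parse each <item> on its own; return (parsed_items, number_of_bad_items)."""
    good, bad = [], 0
    for match in ITEM_RE.finditer(feed_text):
        try:
            # A fresh parse per item, analogous to creating a new reader
            # after an XmlException, so one bad item cannot poison the rest.
            good.append(ET.fromstring(match.group(0)))
        except ET.ParseError:
            bad += 1
    return good, bad


feed = (
    "<rss><channel>"
    "<item><title>ok</title></item>"
    "<item><title>bad & broken</title></item>"  # bare '&' is not well-formed
    "<item><title>also ok</title></item>"
    "</channel></rss>"
)
items, bad_count = read_items(feed)
print([item.findtext("title") for item in items], bad_count)
```

Because the boundaries are found with a regex rather than an XML parser, this only works when the item delimiters are reliable; if the tags themselves are damaged, the approach breaks down just as the `XmlTextReader` version does.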

Yes, I know it's an old question, but I was recently looking for a tolerant XML parser and found this one: XmlParser.

A Roslyn-inspired full-fidelity XML parser with no dependencies and a simple Visual Studio XML language service.

The parser produces a full-fidelity syntax tree, meaning every character of the source text is represented in the tree. The tree covers the entire source text. The parser has no dependencies and can easily be made portable.
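To make the "every character is represented" property concrete, here is a minimal Python sketch of full fidelity at the token level. It is only an illustration of the idea: the regex, the token kinds (`tag`, `broken_tag`, `text`) and the `tokenize` helper are hypothetical and are not XmlParser's API (XmlParser itself is a .NET library distributed via NuGet and builds an actual tree). The key invariant is that concatenating all tokens reproduces the input exactly, even when the markup is malformed.

```python
import re

# Hypothetical tokenizer for illustration; not XmlParser's API.
# Alternatives: a complete <...> tag, a '<' that never gets its '>', or a text run.
TOKEN_RE = re.compile(r"<[^<>]*>|<[^<>]*|[^<]+")


def tokenize(source):
    """Split source into (kind, text) tokens without discarding any character."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        text = m.group(0)
        if text.startswith("<") and text.endswith(">"):
            kind = "tag"
        elif text.startswith("<"):
            kind = "broken_tag"  # '<' with no '>' before the next '<' or end of input
        else:
            kind = "text"
        tokens.append((kind, text))
    return tokens


src = "<a>hi <b>oops</a> x < y"
toks = tokenize(src)
# Full fidelity: joining the token texts gives back the original source.
print("".join(text for _, text in toks) == src)
print(toks)
```

Every position in the input is covered because a `<` always matches one of the first two alternatives and any other character matches the third, so malformed input produces `broken_tag` tokens instead of an exception.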

You can add it to your project as a NuGet package. I tried this parser and it was able to read every XML file I gave it.
