How to ignore unclosed tags in XML or HTML?

Posted by 北城余情 on 2019-12-02 07:11:13

Question


I'm writing a parser in Haskell for a website, using the Text.XML and Text.XML.Cursor modules (from the xml-conduit package).

The page contains unclosed tags, so parsing fails with this error:

Main.hs: Error parsing XML file dat.html: 29:1-29:8: Expected end element for: Name {nameLocalName = "br", nameNamespace = Nothing, namePrefix = Nothing}, but received: EventEndElement (Name {nameLocalName = "body", nameNamespace = Nothing, namePrefix = Nothing})

What should I do? How can I ignore such tags?


Answer 1:


A text object with unclosed tags is not well-formed and is therefore not XML.

So, forget about using any XML libraries, parsers, or tools. They are, by definition and design, not able to help you.

You have two options:

  1. Repair the text so that it is well-formed by closing the unclosed tags. You might do this manually or with a tool such as HTML Tidy.
  2. Define a new data format that allows unclosed tags, and write a parser for it from the ground up.
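As a rough illustration of the first option, here is a minimal sketch that rewrites unclosed void tags such as `<br>` into the self-closing form `<br/>` before the text is handed to an XML parser. The names `voidTags`, `closeVoidTags`, and `needsClosing` are hypothetical, the tag list is an assumption you would adapt to your page, and the scan is deliberately naive: it does not understand comments, scripts, or a `>` inside an attribute value, and it will not help if the page also contains stray `</br>` end tags.

```haskell
module CloseVoidTags (voidTags, closeVoidTags) where

import Data.Char (isAlphaNum, isSpace, toLower)
import Data.List (isSuffixOf)

-- Hypothetical list of tags to repair; extend it to match your input.
voidTags :: [String]
voidTags = ["br", "hr", "img", "input", "meta", "link"]

-- Rewrite unclosed void tags (e.g. <br>) as self-closing tags (<br/>).
-- Everything else is copied through unchanged.
closeVoidTags :: String -> String
closeVoidTags [] = []
closeVoidTags ('<' : rest) =
  case break (== '>') rest of
    (inside, '>' : after)
      | needsClosing inside -> '<' : inside ++ "/>" ++ closeVoidTags after
      | otherwise           -> '<' : inside ++ ">" ++ closeVoidTags after
    -- No closing '>' was found: leave the remaining text untouched.
    (inside, _) -> '<' : inside
closeVoidTags (c : cs) = c : closeVoidTags cs

-- True when the tag body names a void element and is not already
-- self-closed. End tags such as "/br" yield an empty name and are skipped.
needsClosing :: String -> Bool
needsClosing inside =
  map toLower name `elem` voidTags && not ("/" `isSuffixOf` trimmed)
  where
    name    = takeWhile isAlphaNum inside
    trimmed = reverse (dropWhile isSpace (reverse inside))
```

For example, `closeVoidTags "<body>a<br>b</body>"` evaluates to `"<body>a<br/>b</body>"`. The repaired string could then be passed to the XML parser in place of the original file contents. For anything beyond a handful of known tags, a dedicated repair tool is likely the more robust choice.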


Source: https://stackoverflow.com/questions/34577021/how-to-ignore-unclosed-tags-in-xml-or-html
