Error Tolerant HTML/XML/SGML parsing in PHP

难免孤独 2020-12-06 13:48

I have a bunch of legacy documents that are HTML-like: they look like HTML, but contain additional made-up tags that aren't part of HTML.

  •  刺人心 (OP)
    2020-12-06 14:36

    I wonder if passing the "bad" HTML through HTML Tidy as a first pass might help? It might be worth a look: if you can get the document to be well-formed, you could then load it as a regular XML file with DOMDocument.
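
    A minimal sketch of that two-step idea, assuming the tidy and dom extensions are installed. The tag names "note" and "ref" are hypothetical stand-ins for whatever made-up tags the legacy documents use. Tidy normally discards tags it doesn't recognise, so they are declared through its new-blocklevel-tags / new-inline-tags options. numeric-entities turns named entities like &nbsp; into numeric ones, so loadXML doesn't need a DTD.

```php
<?php
// Sketch only: assumes ext-tidy and ext-dom are available.
// "note" and "ref" are hypothetical placeholders for the custom legacy tags.

function repairAndLoad(string $legacyHtml): DOMDocument
{
    $config = [
        'output-xhtml'        => true,   // ask Tidy for well-formed XHTML output
        'numeric-entities'    => true,   // &nbsp; -> &#160;, so loadXML needs no DTD
        'new-blocklevel-tags' => 'note', // declare custom tags so Tidy keeps them
        'new-inline-tags'     => 'ref',
        'show-warnings'       => false,
    ];

    // First pass: let Tidy close tags, quote attributes, etc.
    $xhtml = tidy_repair_string($legacyHtml, $config, 'utf8');
    if ($xhtml === false) {
        throw new RuntimeException('Tidy could not repair the input');
    }

    $doc = new DOMDocument();
    // Collect libxml errors instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $ok = $doc->loadXML($xhtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($ok === false) {
        throw new RuntimeException('Tidy output is still not well-formed XML');
    }
    return $doc;
}

// Unclosed tags and a named entity: not loadable with loadXML as-is.
$doc = repairAndLoad('<p>Hello <ref>see</ref><note>Body&nbsp;text');
foreach ($doc->getElementsByTagName('note') as $note) {
    echo $note->textContent, "\n";
}
```

    Tidy's XHTML output puts elements in the XHTML namespace. getElementsByTagName still matches them by name, but an XPath query would need that namespace registered with DOMXPath::registerNamespace.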
