Error Tolerant HTML/XML/SGML parsing in PHP

难免孤独 2020-12-06 13:48

I have a bunch of legacy documents that are HTML-like: they look like HTML, but contain additional made-up tags that aren't part of HTML.



        
6 answers
  •  温柔的废话
    2020-12-06 14:29

    My quick and dirty solution to this problem was to run a loop that matches my list of custom tags with a regular expression. The regexp deliberately matches only "childless" custom tags, that is, tags with no other custom tag nested inside them.

    When there is a match, a function to process that tag is called and returns the "processed HTML". If that custom tag was nested inside another custom tag, then the parent becomes childless, because actual HTML has been inserted in place of the child. The parent will therefore be matched by the regexp and processed on the next iteration of the loop.

    The loop ends when there are no more childless custom tags left to match. Overall, it's iterative (a while loop), not recursive.
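
    A minimal sketch of the loop described above, under some assumptions: the tag names (`note`, `hl`), the `$handlers` array, and the `expandCustomTags()` function are hypothetical names made up for illustration, and the regexp is only one possible way to match "childless" custom tags. It also assumes the handlers never emit custom tags themselves, otherwise the loop would not terminate.

    ```php
    <?php
    // Hypothetical custom tags mapped to handlers that return plain HTML.
    // Each handler receives the raw attribute string and the inner content.
    $handlers = [
        'note' => fn(string $attrs, string $inner): string => '<div class="note">' . $inner . '</div>',
        'hl'   => fn(string $attrs, string $inner): string => '<em>' . $inner . '</em>',
    ];

    function expandCustomTags(string $html, array $handlers): string
    {
        $names = implode('|', array_map(fn($n) => preg_quote($n, '~'), array_keys($handlers)));

        // Match <tag ...>inner</tag> only when "inner" contains no opening
        // custom tag, i.e. only "childless" custom tags.
        $pattern = '~<(' . $names . ')\b([^>]*)>((?:(?!<(?:' . $names . ')\b).)*?)</\1>~is';

        do {
            $result = preg_replace_callback(
                $pattern,
                fn(array $m): string => $handlers[strtolower($m[1])]($m[2], $m[3]),
                $html,
                -1,
                $count
            );
            if ($result === null) {
                throw new RuntimeException('Regex error: ' . preg_last_error_msg());
            }
            $html = $result;
            // Replacing a child with real HTML makes its parent childless,
            // so the parent is picked up on the next pass.
        } while ($count > 0);

        // Unclosed or otherwise malformed custom tags never match and are
        // left in the output untouched.
        return $html;
    }

    echo expandCustomTags('<p><note>a <hl>b</hl> c</note></p>', $handlers), "\n";
    ```

    Here the first pass only replaces the inner `hl` tag; the outer `note` tag is matched on the second pass, and a third pass with zero matches ends the loop.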
