How can I clean HTML tags out of a ColdFusion string?

南方客 2021-02-14 00:16

I am looking for a quick way to parse HTML tags out of a ColdFusion string. We are pulling in an RSS feed that could potentially have anything in it. We are then doing some man

6 answers
  •  没有蜡笔的小新
    2021-02-14 00:36

    HTML is not a regular language, so using regular expressions on (uncontrolled) HTML is something that should be done with great care (if at all).

    Consider, for example, the following valid segment of HTML:

    a boat
    

    You'll note how the syntax highlighter is choking on that - as will the existing regex that has been offered.

    Unless you can be certain that the string you are processing will never contain HTML similar to the above, you should avoid the assumptions and compromises that a pure-regex approach would force on you.

    (Note: The same problem applies to the suggested char-by-char method too.)
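
    To make the failure mode concrete, here is a minimal Java sketch. The input string, the `<[^>]*>` pattern, and the `NaiveTagStripper` name are all assumptions chosen for illustration; they are not the regex or the markup from this thread. The point is only that a `>` inside a quoted attribute value is legal HTML, and a "delete everything between `<` and `>`" pattern stops at it too early.

    ```java
    import java.util.regex.Pattern;

    final class NaiveTagStripper {
        // Hypothetical "strip anything between < and >" pattern.
        private static final Pattern TAG = Pattern.compile("<[^>]*>");

        static String strip(String html) {
            return TAG.matcher(html).replaceAll("");
        }

        public static void main(String[] args) {
            // Illustrative input: the attribute value contains a literal '>'.
            String html = "<a title=\"x > y\">a boat</a>";
            // The first match ends at the '>' inside the attribute,
            // so the rest of the attribute leaks into the output.
            System.out.println(strip(html)); // prints: [space]y">a boat
        }
    }
    ```

    On simple input such as `<b>a boat</b>` the same pattern returns `a boat`, which is why the problem tends to go unnoticed until the feed contains something unusual.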


    To solve your problem, you should use a DOM parser to parse your string into an HTML object, looping through each element and converting to text.
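
    The parse-then-walk idea could be sketched like this in Java (ColdFusion runs on the JVM, so these classes are reachable from CF). This assumes the input is well-formed XML; the `DomTextExtractor` name and the synthetic `<root>` wrapper are hypothetical, added so that a fragment with several top-level nodes still parses. Treat it as a sketch, not a finished implementation.

    ```java
    import java.io.StringReader;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    final class DomTextExtractor {
        // Parses a well-formed fragment and returns only its text content.
        static String textOf(String xhtmlFragment) {
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                // Feed content is untrusted, so turn on secure processing.
                factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
                // Hypothetical wrapper element so multiple top-level nodes parse.
                String wrapped = "<root>" + xhtmlFragment + "</root>";
                Document doc = factory.newDocumentBuilder()
                        .parse(new InputSource(new StringReader(wrapped)));
                StringBuilder out = new StringBuilder();
                collect(doc.getDocumentElement(), out);
                return out.toString();
            } catch (Exception e) {
                throw new IllegalArgumentException("input is not well-formed XML", e);
            }
        }

        // Depth-first walk: keep text and CDATA, skip comments and attributes.
        private static void collect(Node node, StringBuilder out) {
            short type = node.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(node.getNodeValue());
                return;
            }
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                collect(children.item(i), out);
            }
        }
    }
    ```

    Because the parser understands quoting and comments, `textOf("<p title=\"x > y\">a <b>boat</b></p><!-- <i>hidden</i> -->")` yields `a boat`. Input that is not well-formed XML is rejected with an exception rather than silently mangled; that includes HTML named entities such as `&nbsp;`, which this XML parser does not know.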

    If you have valid XHTML then you can use CF's XmlParse() to produce the object which you can then loop through. If it might be non-XML HTML then there's no built-in option with CF8, so you'll have to investigate options in Java/etc.
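
    As one possible starting point on the Java side, the JDK itself ships a lenient HTML parser in `javax.swing.text.html.parser`. The sketch below assumes that parser is acceptable for your feed content; it dates from the HTML 3.2 era, may normalise whitespace, and a dedicated third-party HTML parser may well cope better with modern markup. The `LenientHtmlText` name is hypothetical.

    ```java
    import java.io.IOException;
    import java.io.StringReader;
    import java.io.UncheckedIOException;
    import javax.swing.text.html.HTMLEditorKit;
    import javax.swing.text.html.parser.ParserDelegator;

    final class LenientHtmlText {
        // Returns the text the parser reports, ignoring tags, attributes and comments.
        static String textOf(String html) {
            final StringBuilder out = new StringBuilder();
            HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
                @Override
                public void handleText(char[] data, int pos) {
                    // Called for runs of character data only.
                    out.append(data);
                }
            };
            try {
                // Third argument: ignore any charset declared inside the document.
                new ParserDelegator().parse(new StringReader(html), callback, true);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return out.toString();
        }
    }
    ```

    Unlike the XML route, this does not require the input to be well formed, so unclosed tags do not cause an exception. Check the whitespace it produces against real feed samples before relying on it.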
