What is the correct way to detect whether string inputs contain HTML or not?

旧时难觅i 2020-12-23 15:08

When receiving user input on forms I want to check that fields like "username" or "address" do not contain markup that has a special meaning in XML (RSS feeds) or

13 answers
  •  不知归路
    2020-12-23 15:35

    In a comment above, you wrote:

    Just stop the browser from treating the string as markup.

    This is an entirely different problem to the one in the title. The approach in the title is usually wrong. Stripping out tags just mangles input and can lead to data loss. Ever tried to talk about HTML on a blog that strips tags? Frustrating.
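
    To illustrate the data loss, here is a minimal sketch that compares stripping with encoding. The sample sentence and variable names are made up for illustration; strip_tags() and htmlspecialchars() are the standard PHP functions.

```php
<?php
// Hypothetical user input: someone writing *about* HTML.
$input = 'Use the <em> tag for emphasis.';

// Stripping removes the tag, so the sentence loses its meaning.
$stripped = strip_tags($input);

// Encoding keeps every character and makes it safe to place in HTML.
$encoded = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // Use the  tag for emphasis.
echo $encoded, "\n";  // Use the &lt;em&gt; tag for emphasis.
```

    The stripped version no longer says which tag the author meant, while the encoded version displays in a browser exactly as it was typed.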

    The solution that is usually the correct one is to do as you said in your comment - to stop the browser from treating the string as markup. This - literally taken - is not possible. What you do instead is encode the content as HTML.

    Consider the following data:

    <strong>Test</strong>

    Now, you can look at this in one of two ways. You can look at it as literal data: a sequence of characters. Or you can look at it as HTML: markup that includes strongly emphasised text.

    If you just dump that out into an HTML document, you are treating it as HTML. You can't treat it as literal data in that context. What you need is HTML that will output the literal data. You need to encode it as HTML.

    Your problem is not that you have too much HTML - it's that you have too little. When you output <, you are outputting raw data in an HTML context. You need to convert it to &lt;, which is the HTML representation of that data, before outputting it.

    PHP offers a few different options for doing this. The most direct is to use htmlspecialchars() to convert it into HTML, and then nl2br() to convert the line breaks into <br> elements.
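
    As a rough sketch of that approach: the helper name render_user_text() and the sample input below are assumptions for illustration, not part of the original answer. The flags shown are one reasonable choice; adjust the charset to match your page.

```php
<?php
// Hypothetical helper: encode user text for display inside an HTML page.
function render_user_text(string $raw): string
{
    // Turn <, >, &, and both quote styles into HTML entities.
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of
    // returning an empty string.
    $encoded = htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Insert <br /> before each newline. This runs after encoding so the
    // inserted tags are not themselves escaped.
    return nl2br($encoded);
}

echo render_user_text("<strong>Test</strong>\nline 2");
```

    The order matters: calling nl2br() first would produce <br /> tags that htmlspecialchars() then encodes, so they would show up as visible text instead of line breaks.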
