How to find if a String contains HTML data?

遇见更好的自我 2020-12-15 18:13

How do I find out whether a string contains HTML data? The user provides input via a web interface, and it's quite possible they entered either plain text or HTML.

7 Answers
  • 2020-12-15 18:58

    If you don't want the user to have HTML in their input, you can replace every '<' character with its HTML entity equivalent, '&lt;', and every '>' with '&gt;'.
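
    A minimal sketch of that replacement, assuming plain Java with no libraries. The class and method names (HtmlEscaper, escape) are made up for illustration. It also escapes '&' and '"', which the answer above does not mention, so that text which already contains something like &lt; is not silently turned into markup.

```java
/** Hypothetical helper, not from the thread: escapes characters that are special in HTML. */
final class HtmlEscaper {
    private HtmlEscaper() {}

    /** Returns a copy of input with &, <, > and " replaced by entities; null stays null. */
    static String escape(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;   // assumption: also escape '&'
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;  // assumption: also escape quotes
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```

    With this, HtmlEscaper.escape("<b>hi</b>") gives "&lt;b&gt;hi&lt;/b&gt;", which a browser renders as literal text rather than bold markup.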

  • 2020-12-15 19:05

    Regular expressions are the main tool here: they can find potential HTML tags. You can then check whether the name inside each tag is a known HTML keyword. If one is found, show an alert telling the user not to use HTML, or simply strip the tag if you prefer.
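
    One way this could look, as a rough sketch rather than a complete solution. The names TagFilter, containsTag and stripTags are hypothetical, and the pattern is an assumption: it treats '<', an optional '/', a letter, and everything up to the next '>' as a tag. It does not check the name against real HTML keywords, so it still flags made-up tags.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not from the thread: flags or strips tag-like markup. */
final class TagFilter {
    // Assumption: a "tag" is '<', optional '/', a letter, then any characters up to the next '>'.
    private static final Pattern TAG = Pattern.compile("</?[a-zA-Z][^<>]*>");

    private TagFilter() {}

    /** True if the input contains at least one tag-like sequence. */
    static boolean containsTag(String input) {
        return input != null && TAG.matcher(input).find();
    }

    /** Removes every tag-like sequence and keeps the text between them. */
    static String stripTags(String input) {
        return input == null ? null : TAG.matcher(input).replaceAll("");
    }
}
```

    Call containsTag to decide whether to show the alert, or stripTags to delete the markup silently. Because a letter must follow the '<', input such as "1 < 2 and 3 > 2" is not flagged.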

  • 2020-12-15 19:06

    The pattern below matches a paired opening and closing tag. The capture groups let you extract the tag name, the attributes, and the inner value.

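        // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
        Pattern pattern = Pattern.compile("<(\\w+)( +.+)*>((.*))</\\1>");
        Matcher matcher = pattern.matcher("<as testAttr='5'> TEST</as>");
        if (matcher.find()) {
            // group(0) is the whole match; groups 1..groupCount() are the capture groups
            for (int i = 0; i <= matcher.groupCount(); i++) {
                System.out.println(i + ":" + matcher.group(i));
            }
        }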
    
  • 2020-12-15 19:09

    You can use regular expressions to search for HTML tags.

  • 2020-12-15 19:10

    In your backing bean, you can look for HTML tags such as <b> or <i>. You can use regular expressions (slower) or simply check for the '<' and '>' characters. Which one you choose depends on how sure you need to be that the user entered HTML.

    Keep in mind that the user could write something like <asdf>, which looks like a tag but is not valid HTML. If you want to be 100% sure that the HTML is valid, you will need a full HTML parser from a library (TidyHTML, maybe?).
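
    For the cheap end of that trade-off, a sketch of the character check without any regex. The names QuickHtmlCheck and mightContainHtml are hypothetical. It only tests whether a '<' is followed somewhere later by a '>', so it is a coarse first filter and says nothing about validity.

```java
/** Hypothetical helper, not from the thread: a cheap, regex-free first filter. */
final class QuickHtmlCheck {
    private QuickHtmlCheck() {}

    /** True if a '<' appears and a '>' appears somewhere after it. */
    static boolean mightContainHtml(String input) {
        if (input == null) {
            return false;
        }
        int open = input.indexOf('<');
        // Search for '>' only after the first '<', so "a > b < c" is not flagged.
        return open >= 0 && input.indexOf('>', open + 1) > open;
    }
}
```

    This accepts <asdf> and also plain text like "3 < 4 > 1", so treat a true result as "might be HTML" and hand the string to a stricter check if that matters.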

  • 2020-12-15 19:11

    I know this is an old question but I ran into it and was looking for something more comprehensive that could detect things like HTML entities and would ignore other uses of < and > symbols. I came up with the following class that works well.

    You can play with it live at http://ideone.com/HakdHo

    I also uploaded this to GitHub with a bunch of JUnit tests.

    package org.github;
    
    import java.util.regex.Pattern;
    
    /**
     * Detect HTML markup in a string.
     * This will detect tags or entities.
     *
     * @author dbennett455@gmail.com - David H. Bennett
     */
    public class DetectHtml
    {
        // adapted from post by Phil Haack and modified to match better
        public final static String tagStart=
            "\\<\\w+((\\s+\\w+(\\s*\\=\\s*(?:\".*?\"|'.*?'|[^'\"\\>\\s]+))?)+\\s*|\\s*)\\>";
        public final static String tagEnd=
            "\\</\\w+\\>";
        public final static String tagSelfClosing=
            "\\<\\w+((\\s+\\w+(\\s*\\=\\s*(?:\".*?\"|'.*?'|[^'\"\\>\\s]+))?)+\\s*|\\s*)/\\>";
        public final static String htmlEntity=
            "&[a-zA-Z][a-zA-Z0-9]+;";
        public final static Pattern htmlPattern=Pattern.compile(
          "("+tagStart+".*"+tagEnd+")|("+tagSelfClosing+")|("+htmlEntity+")",
          Pattern.DOTALL
        );
    
        /**
         * Will return true if s contains HTML markup tags or entities.
         *
         * @param s String to test
         * @return true if string contains HTML
         */
        public static boolean isHtml(String s) {
            boolean ret=false;
            if (s != null) {
                ret=htmlPattern.matcher(s).find();
            }
            return ret;
        }
    
    }
    