Detect HTML tags in a string

渐次进展 2020-12-04 12:59

I need to detect whether a string contains HTML tags.

if(!preg_match('(?<=<)\w+(?=[^<]*?>)', $string)){ 
    return $string;
}
8 Answers
  • 2020-12-04 13:31

    Parsing HTML in general is a hard problem; there is some good material here:

    • Parsing Html The Cthulhu Way
    • Parsing: Beyond Regex

    But regarding your question (a 'better' solution): can you be more specific about what you are trying to achieve, and what tools are available to you?

  • 2020-12-04 13:32

    I would use strlen() because if you don't, then a character-by-character comparison is done and that can be slow, though I would expect the comparison to quit as soon as it found a difference.

  • 2020-12-04 13:34

    If you just want to detect or replace certain tags: this function searches for specific HTML tags and wraps them in square brackets. The wrapping itself is pretty pointless; just modify the callback to do whatever you need with the tags.

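    $html = preg_replace_callback(
        '|\</?([a-zA-Z]+[1-6]?)(\s[^>]*)?(\s?/)?\>|',
        function ($found) {
            // $found[0] is the whole tag, $found[1] is the tag name.
            if (isset($found[1]) && in_array(
                $found[1],
                array('div','p','span','b','a','strong','center','br','h1','h2','h3','h4','h5','h6','hr'))
            ) {
                return '[' . $found[0] . ']';
            }
            // Leave every other tag untouched; returning nothing would delete it.
            return $found[0];
        },
        $html
    );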
    

    Explanation of the regex:

    \< ... \>   //start and ends with tag brackets
    \</?        //can start with a slash for closing tags
    ([a-zA-Z]+[1-6]?)    //the tag itself (for example "h1")
    (\s[^>]*)? //anything such as class=... style=... etc.
    (\s?/)?     //allow self-closing tags such as <br />
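
As a rough illustration of how the pattern above behaves on concrete input, here is a self-contained sketch. The function name `bracket_known_tags` and the sample markup are made up for this example. The callback returns unlisted tags unchanged, because a `preg_replace_callback` callback that returns nothing replaces the match with an empty string.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer above):
// wraps a fixed set of tags in square brackets and leaves all other tags alone.
function bracket_known_tags(string $html): string
{
    $known = ['div', 'p', 'span', 'b', 'a', 'strong', 'center', 'br',
              'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr'];

    return preg_replace_callback(
        '|\</?([a-zA-Z]+[1-6]?)(\s[^>]*)?(\s?/)?\>|',
        function (array $found) use ($known): string {
            // $found[0] is the whole tag, $found[1] is the tag name.
            if (in_array(strtolower($found[1]), $known, true)) {
                return '[' . $found[0] . ']';
            }
            // Return the tag as-is so it is not removed from the output.
            return $found[0];
        },
        $html
    );
}

echo bracket_known_tags('<p class="x">Hi<br /><em>there</em></p>'), "\n";
// prints: [<p class="x">]Hi[<br />]<em>there</em>[</p>]
```

In `<br />` the slash is consumed by the attribute group `(\s[^>]*)?` rather than by `(\s?/)?`, which does not change the result here.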
    
  • 2020-12-04 13:36

    You need to delimit the regex with some character. Try this:

    if(!preg_match('#(?<=<)\w+(?=[^<]*?>)#', $string)){ 
        return $string;
    }
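
To make the effect of the delimiters concrete, the check can be wrapped in a small function and tried on a few inputs. This is only a sketch: the name `contains_tag` and the sample strings are assumptions for illustration. Without delimiters, PHP takes the leading `(` as the opening delimiter, rejects the pattern with a warning, and `preg_match` returns `false`, so `!preg_match(...)` is always true.

```php
<?php
// Hypothetical wrapper (the name is an assumption): true if the string
// contains something that looks like "<name ...>".
function contains_tag(string $string): bool
{
    // '#' is the delimiter; any non-alphanumeric, non-backslash,
    // non-whitespace character works as long as it is used on both ends.
    return preg_match('#(?<=<)\w+(?=[^<]*?>)#', $string) === 1;
}

var_dump(contains_tag('<b>bold</b>'));  // bool(true)
var_dump(contains_tag('plain text'));   // bool(false)
var_dump(contains_tag('1 < 2'));        // bool(false)
```

Comparing against `=== 1` keeps the three possible results of `preg_match` (1, 0, or `false` on error) from being conflated.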
    
  • 2020-12-04 13:42

    A simple solution is:

    if($string != strip_tags($string)) {
        // contains HTML
    }
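
The comparison can be packaged as a reusable check. A minimal sketch, assuming the helper name `has_html` (not part of the answer above): it treats any text that `strip_tags` would alter as containing HTML, which also covers HTML comments and PHP tags, since `strip_tags` removes those too.

```php
<?php
// Hypothetical helper: the string "contains HTML" if strip_tags() changes it.
function has_html(string $string): bool
{
    // Strict comparison, so no type juggling is involved.
    return $string !== strip_tags($string);
}

var_dump(has_html('<p>Hello</p>'));  // bool(true)
var_dump(has_html('Hello'));         // bool(false)
```

Because the test relies entirely on `strip_tags`, it may also flag a stray `<` that is directly followed by text, so inputs such as comparisons written without spaces are worth testing separately.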
    

    The benefit of this over a regex is that it's easier to understand; however, I can't comment on the execution speed of either solution.

  • 2020-12-04 13:45

    If the purpose is just to check whether a string contains an HTML tag, regardless of whether the tag is valid, you can try this:

    function is_html($string) {
      // Check if string contains any html tags.
      return preg_match('/<\s?[^\>]*\/?\s?>/i', $string);
    }
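
Since this pattern accepts anything between a `<` and a later `>`, it can report matches for text that is not markup at all. The sketch below reuses the same pattern in a function with a different name (`looks_like_html`, an assumption to avoid redeclaring `is_html`) and a few made-up inputs to show both cases.

```php
<?php
// Hypothetical variant of is_html() that returns a strict boolean.
function looks_like_html(string $string): bool
{
    return preg_match('/<\s?[^\>]*\/?\s?>/i', $string) === 1;
}

var_dump(looks_like_html('<div class="a">x</div>'));  // bool(true)
var_dump(looks_like_html('no markup here'));          // bool(false)
var_dump(looks_like_html('if a < b and b > c'));      // bool(true), a false positive
```

Whether the false positive matters depends on the input; for free-form text with comparison operators a stricter pattern is probably the safer choice.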
    

    This works for all valid or invalid HTML tags. You can confirm it here: https://regex101.com/r/2g7Fx4/3
