Removing string inside brackets

渐次进展 2021-01-24 12:42

Good day!

I would like some help removing the text inside square brackets, along with the square brackets themselves.

The string looks like this:

$str

5 Answers
  •  难免孤独
    2021-01-24 13:02

    Note: The OP has dramatically changed the question. This solution was designed to handle the question in its original (more difficult) form, before the "www.example.com" constraint was added. The solution below has been modified to handle this additional constraint, but a simpler solution (e.g. anubhava's answer) would now probably suffice.
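For comparison, here is a minimal sketch of what such a simpler approach might look like. It is an illustration only, not necessarily the code from anubhava's answer, and the function name `strip_bracketed_simple` is hypothetical. It assumes the text contains no markup that needs protecting.

```php
<?php
// Hypothetical helper (not from the original answer): remove every [...]
// group that contains www.example.com, plus any whitespace directly
// before it. It knows nothing about HTML, so it treats all text alike.
function strip_bracketed_simple(string $text): string
{
    $re = '/\s*\[[^\]]*www\.example\.com[^\]]*\]/i';
    // preg_replace() returns null on a PCRE error; fall back to the input.
    return preg_replace($re, '', $text) ?? $text;
}

echo strip_bracketed_simple('Keep [this note] but drop [see www.example.com] here.');
// prints: Keep [this note] but drop here.
```

Because this version ignores markup entirely, it would also strip matching bracketed text from comments, scripts, and attribute values.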

    Here is my tested solution:

    function strip_bracketed_special($text) {
        $re = '% # Remove bracketed text having "www.example.com" within markup.
              # Skip comments, CDATA, SCRIPT & STYLE elements, and HTML tags.
              (                      # $1: HTML stuff to be left alone.
                <!--.*?-->           # HTML comments (non-SGML compliant).
              | <!\[CDATA\[.*?\]\]>  # CDATA sections
              | <script.*?</script>  # SCRIPT elements.
              | <style.*?</style>    # STYLE elements.
              | <\w+                 # HTML element start tags.
                (?:                  # Group optional attributes.
                  \s+                # Attributes separated by whitespace.
                  [\w:.-]+           # Attribute name is required
                  (?:                # Group for optional attribute value.
                    \s*=\s*          # Name and value separated by "="
                    (?:              # Group for value alternatives.
                      "[^"]*"        # Either double quoted string,
                    | \'[^\']*\'     # or single quoted string,
                    | [\w:.-]+       # or un-quoted string (limited chars).
                    )                # End group of value alternatives.
                  )?                 # Attribute values are optional.
                )*                   # Zero or more start tag attributes.
                \s*/?>               # End of start tag (optional self-close).
              | </\w+>               # HTML element end tags.
              )                      # End #1: HTML Stuff to be left alone.
            | # Or... Bracketed structures containing www.example.com
              \s*\[                  # (optional ws), Opening bracket.
              [^\]]*?                # Match up to required content.
              www\.example\.com      # Required bracketed content.
              [^\]]*                 # Match up to closing bracket.
              \]\s*                  # Closing bracket, (optional ws).
            %six';
        return preg_replace($re, '$1', $text);
    }
    

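    Note that the regex does not remove bracketed material found inside HTML comments, CDATA sections, SCRIPT and STYLE elements, or HTML tag attribute values. Those constructs are matched by the first alternative, captured in $1, and written back unchanged; only bracketed text in element content reaches the second alternative, where $1 is empty, so it is deleted. Given the following XHTML markup (which tests these scenarios), the function above removes only the bracketed text within HTML element contents: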

    
    
    
        Test special removal. [Remove this www.example.com]
        
        
        
    
    
    
    

    Test special removal. [Remove this www.example.com]

    Test special removal. [Remove this www.example.com]

    Test special removal. [Do not remove this] Test special removal. [Remove this www.example.com]

    Here is the same markup after being run through the PHP function above:

    
    
    
        Test special removal.
        
        
        
    
    
    
    

    Test special removal.

    Test special removal.

    Test special removal. [Do not remove this] Test special removal.

    This solution should work quite well for just about any valid (X)HTML you can throw at it. (But please, no funky shorttags or SGML comments!)
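The skip-and-restore idea behind the function can be seen in isolation in the following reduced sketch. It is a simplification for illustration, not the full pattern: the function name `strip_bracketed_outside_tags` is hypothetical, and the naive `<[^>]*>` tag matcher assumes no `>` appears inside attribute values and does not protect comments, CDATA, SCRIPT, or STYLE content.

```php
<?php
// Hypothetical, reduced version of the technique. Alternative 1 captures a
// whole tag (attribute values included) so it can be written back via $1.
// Alternative 2 matches the bracketed text to drop and captures nothing,
// so $1 expands to an empty string and the match disappears.
function strip_bracketed_outside_tags(string $text): string
{
    $re = '%
          ( <[^>]*> )            # $1: any tag, left alone (naive matcher).
        |                        # Or... bracketed text with the required host.
          \s*\[ [^\]]*? www\.example\.com [^\]]* \]
        %ix';
    // preg_replace() returns null on a PCRE error; fall back to the input.
    return preg_replace($re, '$1', $text) ?? $text;
}

$html = '<p title="[www.example.com]">Hi [see www.example.com]</p>';
echo strip_bracketed_outside_tags($html);
// prints: <p title="[www.example.com]">Hi</p>
```

The bracketed text inside the `title` attribute survives because the tag is consumed by the first alternative before the second one gets a chance to match inside it.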
