How to replace text over multiple lines using preg_replace

前端未结

关注

 4  508

Hi have the following content within an html page that stretches multiple lines


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  别跟我提以往        
                
              
                            
                2020-12-09 18:23
              
            
            
                                                                       
you can also use [\s\S] instead of . combined with the DOTALL flag s for matching everyting because [\s\S] means exactly the same: match everything; \s matches all space-characters (including newline) and \S machtes everything that is not a space-character (i.e. everything else). in some cases/implementations of regular expressions, this works better than enabling DOTALL

caution: .* with the flag for DOTALL as well as [\s\S] are both "hungry" and won't stop reading the string. if you want them to stop at a certain position, (e.g. the first </div>), use the non-greedy operator ? behind your quantifier, e.g. .*?
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  误落风尘        
                
              
                            
                2020-12-09 18:26
              
            
            
                                                                       
It is possible to use regex to strip out chunks of html data, but you need to wrap the html with custom html tags which get ignored by browsers. For example:

<?php
$html='
<div>This will be shown</div>
<custom650 rel="nofollow">
  <p class="subformedit">
    <a href="#" class="mylink">Link</a>
    <div class="morestuff">
      ... more html in here ...
    </div>
  </p>
</custom650>
<div>This will also be shown</div>
';


To strip the tags with the rel="nofollow" attributes, you can use the following regex:

$newhtml = preg_replace('/<([^\s]+)[^>]*rel="nofollow"[^>]*>.*?<\/\1>/si', '', $html);


From experience, start the custom tags on a new line. Undoubtedly a hack, but might help someone.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  自闭症患者        
                
              
                            
                2020-12-09 18:38
              
            
            
                                                                       
it is the "s" flag, it enables . to capture newlines
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  走了就别回头了        
                
              
                            
                2020-12-09 18:43
              
            
            
                                                                       
If this weren't HTML, I'd tell you to use the DOTALL modifier to change the meaning of . from 'match everything except new line' to 'match everything':

preg_replace('/(.*)<\/div>/s','abc',$body);


But this is HTML, so use an HTML parser instead.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复