Using PhantomJS to extract HTML and text

Asked by 眼角桃花, 2020-12-21 23:11
I'm trying to extract all the text content of a page (because it doesn't work with Simpledomparser).
I'm trying to modify this simple example from the manual.
Answer by 刺人心 (original poster), 2020-12-21 23:42

There are multiple ways to retrieve the page content as a string:


- page.content gives the complete source, including the markup (<html>) and the doctype (<!DOCTYPE html>).
- document.documentElement.outerHTML (via page.evaluate) gives the complete source, including the markup (<html>), but without the doctype.
- document.documentElement.textContent (via page.evaluate) gives the cumulative text content of the whole document, including inline CSS and JavaScript, but without markup.
- document.documentElement.innerText (via page.evaluate) gives the cumulative text content of the whole document, excluding inline CSS and JavaScript, and without markup.
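
A minimal PhantomJS sketch that collects all four representations from one loaded page. It assumes PhantomJS 2.x and takes the URL as the first command-line argument, falling back to the placeholder http://example.com/. The helper name extractPageStrings is hypothetical, not part of the PhantomJS API. The code sticks to ES5 because PhantomJS's JavaScript engine predates most ES2015 syntax, and the bootstrap only runs when the PhantomJS global phantom is defined.

```javascript
// Collect the four string forms of an already-loaded PhantomJS page.
// extractPageStrings is a hypothetical helper name used for illustration.
function extractPageStrings(page) {
  return {
    // Full source, including the doctype.
    content: page.content,
    // Callbacks passed to page.evaluate run inside the page, so they can
    // use the page's own globals (document) but not this script's variables.
    outerHTML: page.evaluate(function () {
      return document.documentElement.outerHTML;
    }),
    textContent: page.evaluate(function () {
      return document.documentElement.textContent;
    }),
    innerText: page.evaluate(function () {
      return document.documentElement.innerText;
    })
  };
}

// Only run the actual script under PhantomJS, where `phantom` exists.
if (typeof phantom !== "undefined") {
  var system = require("system");
  var page = require("webpage").create();
  // system.args[0] is the script name; the URL is assumed to be args[1].
  var url = system.args[1] || "http://example.com/";

  page.open(url, function (status) {
    if (status !== "success") {
      console.log("Failed to load " + url);
      phantom.exit(1);
      return;
    }
    var result = extractPageStrings(page);
    var labels = ["content", "outerHTML", "textContent", "innerText"];
    for (var i = 0; i < labels.length; i++) {
      console.log("----- " + labels[i] + " -----");
      console.log(result[labels[i]]);
    }
    phantom.exit();
  });
}
```

Under PhantomJS this might be run as phantomjs extract.js http://example.com/, where extract.js is whatever file name you saved the script under.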


document.documentElement can be replaced with any element or query of your choice (for example, document.body or the result of document.querySelector).
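
As a sketch of that substitution, the selector can be handed to page.evaluate as an extra argument, because the callback runs inside the page and cannot read the script's own variables. The #content selector, the example.com URL, and the helper name extractTextBySelector are all hypothetical placeholders; the helper returns null when nothing matches.

```javascript
// Return the innerText of the first element matching `selector`, or null.
// extractTextBySelector is a hypothetical helper name for illustration.
function extractTextBySelector(page, selector) {
  // Extra arguments to page.evaluate are serialized and passed to the
  // callback, which executes in the page's context.
  return page.evaluate(function (sel) {
    var el = document.querySelector(sel);
    return el ? el.innerText : null;
  }, selector);
}

// Only run the actual script under PhantomJS, where `phantom` exists.
if (typeof phantom !== "undefined") {
  var page = require("webpage").create();
  // Hypothetical URL and selector; substitute your own.
  page.open("http://example.com/", function (status) {
    if (status === "success") {
      console.log(extractTextBySelector(page, "#content"));
    } else {
      console.log("Failed to load page");
    }
    phantom.exit();
  });
}
```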