Use PhantomJS to extract HTML and text

眼角桃花 2020-12-21 23:11
I am trying to extract all the text content of a page (because it doesn't work with Simpledomparser).
I am trying to modify this simple example from the manual:

      
      
        
4 answers

甜味超标 (OP) 2020-12-21 23:32

Having encountered this question while trying to solve a similar problem, I ended up adapting a solution from this question like so:

var fs = require('fs');

// Open the file for reading and accumulate its contents line by line.
var file_h = fs.open('header.html', 'r');
var header = "";

while (!file_h.atEnd()) {
    // Append every line, including the first one.
    header += file_h.readLine();
}
console.log(header);

file_h.close();
phantom.exit();


This gave me a string containing the contents of the HTML file, which was sufficient for my purposes and will hopefully help others who come across this.

The question seemed ambiguous (is the entire content of the file required, or just the "text", i.e. the strings?), so this is one possible solution.
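
For the other reading of the question (only the text, without markup), a crude approach is to strip the tags from the string that was read in. The sketch below is an assumption rather than part of the original answer: `htmlToText` is a hypothetical helper name, it relies on regular expressions instead of a real HTML parser, and it decodes only a few common entities, so it is suitable only for simple, well-formed HTML.

```javascript
// Hypothetical helper: a crude tag stripper for simple HTML, not a real parser.
function htmlToText(html) {
    return html
        // drop script and style elements together with their contents
        .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, ' ')
        // drop HTML comments
        .replace(/<!--[\s\S]*?-->/g, ' ')
        // replace every remaining tag with a space
        .replace(/<[^>]+>/g, ' ')
        // decode a handful of common entities (assumed sufficient here);
        // &amp; goes last so that "&amp;lt;" is not decoded twice
        .replace(/&nbsp;/g, ' ')
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&quot;/g, '"')
        .replace(/&amp;/g, '&')
        // collapse runs of whitespace and trim both ends
        .replace(/\s+/g, ' ')
        .replace(/^ | $/g, '');
}

console.log(htmlToText('<h1>Title</h1><p>Hello &amp; welcome</p>'));
// prints "Title Hello & welcome"
```

In the script from the answer this would be called as `htmlToText(header)` before `phantom.exit()`. If the page is loaded through PhantomJS's `webpage` module instead of being read from disk, the page object's `content` and `plainText` properties are probably a better fit than regex stripping.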
    
             
                                                        
            
            
              
                