How to scrape specific data from scrape with simple html dom parser

前端未结

关注

 6  1032

I am trying to scrape the datas from a webpage, but I get need to get all the data in this link.

include \'simple_html_dom.php\';
$html1 = file_get_html(\'ht


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  我寻月下人不归        
                
              
                            
                2020-12-18 01:50
              
            
            
                                                                       
From what i can quickly glance you need to loop through the <dl> tags in #content, then the dt and dd.

foreach ($html->find('#content dl') as $item) {
     $info = $item->find('dd');
     foreach ($info as $info_item) {..}
}


Using the simple_html_dom library
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  既然无缘        
                
              
                            
                2020-12-18 01:57
              
            
            
                                                                       
@zero: there is good site to try out scrapping a site using both php and python...pretty helpful site atleast to me:- 
http://scraperwiki.com/
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  余生分开走        
                
              
                            
                2020-12-18 02:03
              
            
            
                                                                       
Seems to be written in the documentation:

$html1->find('b[class=info]',0)->innertext;

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉话见心        
                
              
                            
                2020-12-18 02:08
              
            
            
                                                                       
XPath makes scraping ridiculously easy, and allows for some changes in the HTML document to not affect you. For example, to pull out the names, you'd use a query that looks like:

//div[id='content']/d1/dt


A simple Google search will give you plenty of tutorials
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  无人共我        
                
              
                            
                2020-12-18 02:09
              
            
            
                                                                       
I'd use WWW:Mechanize

http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  谎友^        
                
              
                            
                2020-12-18 02:17
              
            
            
                                                                       
Your provided links are down,
I will suggest you to use the native PHP "DOM" Extension instead of "simple html parser", it will be much faster and easier ;)
I had a look at the page using googlecache, you can use something like:-

$doc = new DOMDocument;
@$doc->loadHTMLFile('...URL....'); // Using the @ operator to hide parse errors
$contents = $doc->getElementById('content')->nodeValue; // Text contents of #content

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复