How to get the html source of a specific element with selenium?


The page I'm looking at contains:

  text 1 
 text 2 
 text 3  text 4 
  

        
                      
4 Answers

        
                         				            
            
           
            
                              
                
              
              
                
                  星月不相逢        
                
              
                            
                2020-12-29 15:08
              
            
            
                                                                       
What about using jQuery?

Edit:

First, the page has to load the jQuery library; the required .js files are available from www.jQuery.com.

Then all you need to do is call a simple jQuery selector:

alert($("div#1").html());

                                                                        
                                                        
            
            
              
                
                  攒了一身酷        
                
              
                            
                2020-12-29 15:10
              
            
            
                                                                       
The following code will give you the HTML in the div element:

sel = selenium('localhost', 4444, browser, my_url)
html = sel.get_eval("this.browserbot.getCurrentWindow().document.getElementById('1').innerHTML")


Then you can use BeautifulSoup to parse it and extract what you really want.
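
As a rough sketch of that parsing step: the example below pulls the paragraph text out of an HTML fragment. To stay runnable without third-party packages it uses the standard library's html.parser as a stand-in for BeautifulSoup, so it is not the BeautifulSoup API. The names ParagraphText, paragraph_texts and sample_html are hypothetical, and the sample markup is an assumption about what the div might contain.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of every <p> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraph strings
        self._buffer = None   # text pieces of the <p> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears while a <p> is open.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None


def paragraph_texts(fragment):
    """Return the stripped text of each <p> in the fragment, in order."""
    parser = ParagraphText()
    parser.feed(fragment)
    parser.close()
    return parser.paragraphs


# Hypothetical fragment standing in for the string returned by get_eval().
sample_html = "<h1>title</h1><p> text 1 </p><p> text 2 </p>"
print(paragraph_texts(sample_html))
```

In practice the html string obtained from Selenium would be passed in place of sample_html.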

I hope it helps.
                                                                        
                                                        
            
            
              
                
                  梦毁少年i        
                
              
                            
                2020-12-29 15:14
              
            
            
                                                                       
The selected answer does not work in Python 3 at the time of writing. Use this instead:

from selenium import webdriver

wd = webdriver.Firefox()
wd.get(url)  # url: address of the page to inspect
# Double quotes outside so the JavaScript can keep its single-quoted id.
html = wd.execute_script("return window.document.getElementById('1').innerHTML")
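
A possible variation on this idea is to pass the element id as a script argument instead of splicing it into the JavaScript string, which avoids nested-quote problems. The helper name element_inner_html is hypothetical; the only thing it assumes about the driver is a WebDriver-style execute_script(script, *args) method.

```python
def element_inner_html(driver, element_id):
    """Return the innerHTML of the element with the given id, or None if absent."""
    script = (
        "var el = document.getElementById(arguments[0]);"
        "return el ? el.innerHTML : null;"
    )
    # Extra positional arguments are exposed to the script as `arguments`.
    return driver.execute_script(script, element_id)
```

With a live session this would be called as element_inner_html(wd, "1") after wd.get(url). Reading outerHTML instead of innerHTML in the script would include the element's own tag as well.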

                                                                        
                                                        
            
            
              
                
                  眼角桃花        
                
              
                            
                2020-12-29 15:23
              
            
            
                                                                       
Use XPath. From selenium.py:

Without an explicit locator prefix, Selenium uses the following default strategies:

dom, for locators starting with "document."
xpath, for locators starting with "//"
identifier, otherwise
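
The three rules quoted above can be restated as a small function. This is only an illustration of the quoted docstring, not code from selenium.py; the name default_locator_strategy is hypothetical, and locators with an explicit prefix are outside its scope.

```python
def default_locator_strategy(locator):
    """Name the default strategy the quoted rules give an unprefixed locator."""
    if locator.startswith("document."):
        return "dom"
    if locator.startswith("//"):
        return "xpath"
    return "identifier"


for locator in ("document.forms[0]", "//div[@id='1']", "1"):
    print(locator, "->", default_locator_strategy(locator))
```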


In your case, you could try
selenium.get_text("//div[@id='1']/descendant::*[not(self::h1)]")

You can learn more about xpath here.
P.S. I haven't found any good HTML documentation for python-selenium. On the other hand, the docstrings in the selenium.py file seem to constitute comprehensive documentation, so I'd suggest reading the source to get a better understanding of how it works.
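
To see which elements the XPath expression above is meant to match (every descendant of the div except h1 elements), here is a minimal sketch using the standard library's xml.etree.ElementTree. ElementTree supports only a subset of XPath, so the descendant filter is written as a Python loop rather than as the expression itself. The sample_page markup and the helper name descendants_except_h1 are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page in the question.
sample_page = """
<html><body>
  <div id="1">
    <h1>heading</h1>
    <p>text 1</p>
    <p>text 2 <b>bold</b></p>
  </div>
  <p>outside</p>
</body></html>
"""


def descendants_except_h1(markup, element_id):
    """Return all descendant elements of div#element_id that are not <h1>."""
    root = ET.fromstring(markup)
    container = root.find(f".//div[@id='{element_id}']")
    if container is None:
        return []
    # iter() yields the container itself first, so it is skipped explicitly.
    return [el for el in container.iter()
            if el is not container and el.tag != "h1"]


matches = descendants_except_h1(sample_page, "1")
print([el.tag for el in matches])
```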
                                                                        
                                                        
            
            
              
                