I wanted to write a piece of code like the following:
from bs4 import BeautifulSoup
import urllib2
url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url)
In urllib3 there's no .urlopen; instead, try this:
import requests
html = requests.get(url).text  # .text holds the decoded HTML body
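To turn that into a soup object you can hand the text straight to BeautifulSoup (a minimal sketch, reusing the URL from the question):

from bs4 import BeautifulSoup
import requests

url = 'http://www.thefamouspeople.com/singers.php'
html = requests.get(url).text  # decoded HTML as a str
soup = BeautifulSoup(html, 'html.parser')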
The new urllib3 library has nice documentation here.
In order to get your desired result, you should do the following:

import urllib3
from bs4 import BeautifulSoup

url = 'http://www.thefamouspeople.com/singers.php'
http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'), 'html.parser')
The "decode utf-8" part is optional. It worked without it when i tried, but i posted the option anyway.
Source: User Guide
With gazpacho you can pipe the page straight into a parseable soup object:
from gazpacho import Soup
url = "http://www.thefamouspeople.com/singers.php"
soup = Soup.get(url)
And run finds on top of it:
soup.find("div")
urllib3 is a different library from urllib and urllib2. It has lots of features on top of the standard library's urllib modules, if you need them, such as re-using connections. The documentation is here: https://urllib3.readthedocs.org/
If you'd like to use urllib3, you'll need to pip install urllib3. A basic example looks like this:
from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()
url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')  # response.data is the raw bytes
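Connection re-use is the feature worth noting: one PoolManager holds the pool, so repeated requests to the same host share a connection. A sketch of that, with retry and timeout settings that are my own picks rather than part of the basic example:

import urllib3
from urllib3.util.retry import Retry

http = urllib3.PoolManager(
    retries=Retry(total=3, backoff_factor=0.5),
    timeout=urllib3.Timeout(connect=2.0, read=10.0),
)

url = 'http://www.thefamouspeople.com/singers.php'
for _ in range(2):
    # both requests travel over the same pooled connection
    response = http.request('GET', url)
    print(response.status)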
You do not have to install urllib3. You can choose any HTTP-request-making library that fits your needs and feed the response to BeautifulSoup. The usual choice, though, is requests, because of its rich feature set and convenient API. You can install requests by entering pip install requests in the command line. Here is a basic example:
from bs4 import BeautifulSoup
import requests
url = "url"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
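From there it's ordinary BeautifulSoup traversal; for instance, printing every link's text (again assuming the singers are listed as <a> tags, which I haven't checked):

from bs4 import BeautifulSoup
import requests

url = "http://www.thefamouspeople.com/singers.php"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# find_all() returns every matching tag; get_text() strips the markup
for link in soup.find_all("a"):
    print(link.get_text(strip=True))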