urllib2

urllib2.urlopen: “Name or service not known” persists when starting script without internet connection

浪尽此生 Submitted on 2019-12-17 19:13:33
Question: I have this simple, minimal 'working' example below that opens a connection to Google every two seconds. When I run this script with a working internet connection, I get the Success message; when I then disconnect, I get the Fail message; and when I reconnect, I get Success again. So far, so good. However, when I start the script while the internet is disconnected, I get the Fail messages, and when I connect later, I never get the Success message. I keep getting the error:
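
The minimal example itself is cut off in this excerpt; a rough reconstruction of the loop being described (Python 2, with the URL, timeout, and messages assumed rather than taken from the original post) would be:

    import time
    import urllib2

    # Poll google.com every two seconds and report whether the connection
    # succeeded, as described in the question.
    while True:
        try:
            urllib2.urlopen('http://www.google.com', timeout=5)
            print 'Success'
        except urllib2.URLError as e:
            print 'Fail:', e.reason
        time.sleep(2)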

Downloading a LOT of files using python

筅森魡賤 Submitted on 2019-12-17 18:42:13
Question: Is there a good way to download a lot of files en masse using Python? This code is speedy enough for downloading about 100 or so files, but I need to download 300,000 files. Obviously they are all very small files (or I wouldn't be downloading 300,000 of them :) ), so the real bottleneck seems to be this loop. Does anyone have any thoughts? Maybe using MPI or threading? Do I just have to live with the bottleneck? Or is there a faster way, maybe not even using Python? (I included the full
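
The loop itself is not visible in this excerpt, but a common way to speed up many small downloads is to push them through a thread pool instead of fetching serially. A hedged sketch (the URL list, pool size, and filenames below are placeholders, not from the original code):

    import urllib2
    from multiprocessing.dummy import Pool  # thread pool, not separate processes

    def fetch(url):
        # Download one URL; return None on failure so a single bad URL
        # does not abort the whole batch.
        try:
            return url, urllib2.urlopen(url, timeout=30).read()
        except urllib2.URLError:
            return url, None

    urls = ['http://example.com/file%d.txt' % i for i in xrange(300000)]  # placeholder list
    pool = Pool(20)  # 20 concurrent downloads; tune for your bandwidth and the server
    for url, body in pool.imap_unordered(fetch, urls):
        if body is not None:
            with open(url.rsplit('/', 1)[-1], 'wb') as f:
                f.write(body)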

Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

不羁岁月 Submitted on 2019-12-17 17:24:41
Question: I have a strange bug when trying to urlopen a certain page from Wikipedia. This is the page: http://en.wikipedia.org/wiki/OpenCola_(drink) This is the shell session: >>> f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)') Traceback (most recent call last): File "C:\Program Files\Wing IDE 4.0\src\debug\tserver\_sandbox.py", line 1, in <module> # Used internally for debug sandbox under external interpreter File "c:\Python26\Lib\urllib2.py", line 126, in urlopen return _opener
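
The excerpt cuts off before any answer, but a 403 from Wikipedia with urllib2 is commonly caused by the default "Python-urllib/x.y" User-Agent being rejected; sending a descriptive User-Agent header usually resolves it. A small sketch (the agent string is just an example):

    import urllib2

    url = 'http://en.wikipedia.org/wiki/OpenCola_(drink)'
    # Identify the client explicitly instead of using urllib2's default agent.
    req = urllib2.Request(url, headers={'User-Agent': 'MyScript/0.1 (contact@example.com)'})
    html = urllib2.urlopen(req).read()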

How to save “complete webpage” not just basic html using Python

半腔热情 Submitted on 2019-12-17 09:33:42
Question: I am using the following code to save a webpage using Python: import urllib import sys from bs4 import BeautifulSoup url = 'http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html' f = urllib.urlretrieve(url,'test.html') Problem: this code saves the HTML as basic HTML without JavaScript, images, etc. I want to save the webpage as complete (like the option in a browser). Update: I am now using the following code to save all the JS/images/CSS files of the webpage so that it can be saved as complete
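
The updated code is cut off, but the approach it describes, fetching the page and then downloading each referenced script, stylesheet, and image, might look roughly like this (tag/attribute selection and the 'assets' directory are illustrative choices, not the asker's code):

    import os
    import urllib
    import urlparse
    from bs4 import BeautifulSoup

    url = 'http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html'
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)

    if not os.path.isdir('assets'):
        os.makedirs('assets')

    # Walk the tags that reference external resources and fetch each one.
    for tag, attr in (('script', 'src'), ('link', 'href'), ('img', 'src')):
        for node in soup.find_all(tag):
            if not node.get(attr):
                continue
            resource = urlparse.urljoin(url, node[attr])
            name = os.path.basename(urlparse.urlparse(resource).path) or 'index'
            urllib.urlretrieve(resource, os.path.join('assets', name))

Note that this only saves the assets next to the page; it does not rewrite the links inside test.html to point at the local copies, which is the extra step a browser's "save complete" option (or wget --page-requisites) performs.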

Python: download files from google drive using url

断了今生、忘了曾经 Submitted on 2019-12-17 08:22:13
Question: I am trying to download files from Google Drive, and all I have is the drive's URL. I have read about the Google API that talks about some drive_service and MedioIO, which also requires credentials (mainly a JSON file / OAuth), but I am unable to get any idea of how it works. Also tried urllib2's urlretrieve, but my case is to get files from Drive. Tried 'wget' too, but no use. Tried the pydrive library; it has good upload functions to Drive but no download options. Any help will be appreciated.
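
The answer is not shown here, but one widely used trick for files shared by link is to request Drive's direct-download endpoint with the file ID extracted from the share URL. This is an assumption about the eventual solution rather than something stated in the excerpt; for small public files it can look like this:

    import urllib2

    # Hypothetical share link: the long token after /d/ is the file ID.
    share_url = 'https://drive.google.com/file/d/FILE_ID/view?usp=sharing'
    file_id = share_url.split('/d/')[1].split('/')[0]

    # Direct-download endpoint; large files need an extra confirmation
    # round-trip (cookie/token) that this sketch does not handle.
    download_url = 'https://docs.google.com/uc?export=download&id=' + file_id
    data = urllib2.urlopen(download_url).read()
    with open('downloaded_file', 'wb') as f:
        f.write(data)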

how to follow meta refreshes in Python

跟風遠走 Submitted on 2019-12-17 07:37:18
Question: Python's urllib2 follows 3xx redirects to get the final content. Is there a way to make urllib2 (or some other library such as httplib2) also follow meta refreshes? Or do I need to parse the HTML manually for the refresh meta tags? Answer 1: Here is a solution using BeautifulSoup and httplib2 (and certificate-based authentication): import BeautifulSoup import httplib2 def meta_redirect(content): soup = BeautifulSoup.BeautifulSoup(content) result=soup.find("meta",attrs={"http-equiv":"Refresh"}) if
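
The answer's code is truncated above; a plausible completion (reconstructed, so treat the parsing details as an assumption) extracts the target URL from the meta tag's content attribute and loops until no refresh remains:

    import BeautifulSoup  # BeautifulSoup 3, as in the truncated answer
    import httplib2

    def meta_redirect(content):
        # Return the URL named in a <meta http-equiv="Refresh"> tag, or None.
        soup = BeautifulSoup.BeautifulSoup(content)
        result = soup.find("meta", attrs={"http-equiv": "Refresh"})
        if result:
            wait, text = result["content"].split(";")
            if text.strip().lower().startswith("url="):
                return text.strip()[4:]
        return None

    def get_content(http, url, headers=None):
        # Fetch a page, then keep following meta refreshes until none remain.
        resp, content = http.request(url, "GET", headers=headers)
        while meta_redirect(content):
            resp, content = http.request(meta_redirect(content), "GET", headers=headers)
        return resp, content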

urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>

纵然是瞬间 Submitted on 2019-12-17 07:30:53
Question: If I run: urllib2.urlopen('http://google.com') even if I use another URL, I get the same error. I'm pretty sure there is no firewall running on my computer or router, and the internet (from a browser) works fine. Answer 1: The problem, in my case, was that some install at some point defined an environment variable http_proxy on my machine when I had no proxy. Removing the http_proxy environment variable fixed the problem. Answer 2: The site's DNS record is such that Python fails the DNS lookup in a
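
To check whether a stray http_proxy variable is the culprit (per Answer 1) without editing your environment, you can build an opener that ignores proxy settings entirely; if this succeeds where plain urlopen fails, the proxy variable is the problem. A small sketch:

    import os
    import urllib2

    print 'http_proxy =', os.environ.get('http_proxy')  # see what, if anything, is set

    # An empty ProxyHandler tells urllib2 to ignore proxy environment variables.
    opener = urllib2.build_opener(urllib2.ProxyHandler({}))
    print opener.open('http://google.com').getcode()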

urllib2 HTTP Error 400: Bad Request

廉价感情. Submitted on 2019-12-17 07:24:38
Question: I have a piece of code like this: host = 'http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s' % (query, page) req = urllib2.Request(host) req.add_header('User-Agent', User_Agent) response = urllib2.urlopen(req) and when I input a query greater than one word, like "the dog", I get the following error: response = urllib2.urlopen(req) File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/usr/lib/python2.7/urllib2.py", line 400, in open
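
The excerpt ends before the answer, but a 400 on a multi-word query is typically caused by the literal space in the URL; percent-encoding the query before interpolating it is the usual fix. A hedged sketch (the User-Agent string stands in for the original User_Agent variable):

    import urllib
    import urllib2

    query, page = 'the dog', 1
    # Encode the query so spaces become '+', leaving the rest of the URL intact.
    host = ('http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s'
            % (urllib.quote_plus(query), page))
    req = urllib2.Request(host)
    req.add_header('User-Agent', 'Mozilla/5.0')  # placeholder for User_Agent
    response = urllib2.urlopen(req)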

Tell urllib2 to use custom DNS

六月ゝ 毕业季﹏ Submitted on 2019-12-17 07:15:18
Question: I'd like to tell urllib2.urlopen (or a custom opener) to use 127.0.0.1 (or ::1) to resolve addresses. I wouldn't change my /etc/resolv.conf, however. One possible solution is to use a tool like dnspython to query addresses and httplib to build a custom URL opener. I'd prefer telling urlopen to use a custom nameserver, though. Any suggestions? Answer 1: Looks like name resolution is ultimately handled by socket.create_connection. -> urllib2.urlopen -> httplib.HTTPConnection -> socket.create
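
Following the call chain the answer traces (urlopen -> HTTPConnection -> socket.create_connection), one hedged approach is to wrap socket.create_connection so it swaps the hostname for an address you resolved yourself, e.g. via dnspython against 127.0.0.1. The resolver and mapping below are stand-ins; only the wrapping mechanism is the point:

    import socket
    import urllib2

    def custom_resolve(host):
        # Stand-in resolver: in practice this could use dnspython to query the
        # nameserver at 127.0.0.1. Here it just consults a static mapping.
        overrides = {'example.com': '93.184.216.34'}  # illustrative entry
        return overrides.get(host, host)

    _real_create_connection = socket.create_connection

    def create_connection(address, *args, **kwargs):
        host, port = address
        return _real_create_connection((custom_resolve(host), port), *args, **kwargs)

    # Monkey-patch so httplib (and therefore urllib2) goes through our resolver.
    # HTTPS verification and SNI still see the original hostname, so TLS setups
    # may need extra care.
    socket.create_connection = create_connection

    print urllib2.urlopen('http://example.com/').getcode()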
