urllib2

urllib2.urlopen: “Name or service not known” persists when starting script without internet connection

浪尽此生 Submitted on 2019-12-17 19:13:33
Question: I have this simple, minimal 'working' example below that opens a connection to Google every two seconds. When I run this script with a working internet connection, I get the Success message; when I then disconnect, I get the Fail message; and when I reconnect, I get Success again. So far, so good. However, when I start the script while the internet is disconnected, I get the Fail messages, and when I connect later, I never get the Success message. I keep getting the error:
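
The minimal example itself is cut off in this excerpt; a rough reconstruction of the loop being described (Python 2, with the URL, timeout, and messages assumed rather than taken from the original post) would be:

    import time
    import urllib2

    # Poll google.com every two seconds and report whether the connection
    # succeeded, as described in the question.
    while True:
        try:
            urllib2.urlopen('http://www.google.com', timeout=5)
            print 'Success'
        except urllib2.URLError as e:
            print 'Fail:', e.reason
        time.sleep(2)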

Downloading a LOT of files using python

筅森魡賤 Submitted on 2019-12-17 18:42:13
Question: Is there a good way to download a lot of files en masse using Python? This code is speedy enough for downloading about 100 or so files, but I need to download 300,000 files. Obviously they are all very small files (or I wouldn't be downloading 300,000 of them :) ), so the real bottleneck seems to be this loop. Does anyone have any thoughts? Maybe using MPI or threading? Do I just have to live with the bottleneck? Or is there a faster way, maybe not even using Python? (I included the full
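
The loop itself is not visible in this excerpt, but a common way to speed up many small downloads is to push them through a thread pool instead of fetching serially. A hedged sketch (the URL list, pool size, and filenames below are placeholders, not from the original code):

    import urllib2
    from multiprocessing.dummy import Pool  # thread pool, not separate processes

    def fetch(url):
        # Download one URL; return None on failure so a single bad URL
        # does not abort the whole batch.
        try:
            return url, urllib2.urlopen(url, timeout=30).read()
        except urllib2.URLError:
            return url, None

    urls = ['http://example.com/file%d.txt' % i for i in xrange(300000)]  # placeholder list
    pool = Pool(20)  # 20 concurrent downloads; tune for your bandwidth and the server
    for url, body in pool.imap_unordered(fetch, urls):
        if body is not None:
            with open(url.rsplit('/', 1)[-1], 'wb') as f:
                f.write(body)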

Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

不羁岁月 Submitted on 2019-12-17 17:24:41
Question: I have a strange bug when trying to urlopen a certain page from Wikipedia. This is the page: http://en.wikipedia.org/wiki/OpenCola_(drink) This is the shell session: >>> f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)') Traceback (most recent call last): File "C:\Program Files\Wing IDE 4.0\src\debug\tserver\_sandbox.py", line 1, in <module> # Used internally for debug sandbox under external interpreter File "c:\Python26\Lib\urllib2.py", line 126, in urlopen return _opener
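
The excerpt cuts off before any answer, but a 403 from Wikipedia with urllib2 is commonly caused by the default "Python-urllib/x.y" User-Agent being rejected; sending a descriptive User-Agent header usually resolves it. A small sketch (the agent string is just an example):

    import urllib2

    url = 'http://en.wikipedia.org/wiki/OpenCola_(drink)'
    # Identify the client explicitly instead of using urllib2's default agent.
    req = urllib2.Request(url, headers={'User-Agent': 'MyScript/0.1 (contact@example.com)'})
    html = urllib2.urlopen(req).read()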

How to save “complete webpage” not just basic html using Python

半腔热情 Submitted on 2019-12-17 09:33:42
Question: I am using the following code to save a webpage using Python: import urllib import sys from bs4 import BeautifulSoup url = 'http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html' f = urllib.urlretrieve(url,'test.html') Problem: this code saves the HTML as basic HTML without JavaScript, images, etc. I want to save the webpage as complete (like the option in a browser). Update: I am now using the following code to save all the JS/images/CSS files of the webpage so that it can be saved as complete
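
The updated code is cut off, but the approach it describes, fetching the page and then downloading each referenced script, stylesheet, and image, might look roughly like this (tag/attribute selection and the 'assets' directory are illustrative choices, not the asker's code):

    import os
    import urllib
    import urlparse
    from bs4 import BeautifulSoup

    url = 'http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html'
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)

    if not os.path.isdir('assets'):
        os.makedirs('assets')

    # Walk the tags that reference external resources and fetch each one.
    for tag, attr in (('script', 'src'), ('link', 'href'), ('img', 'src')):
        for node in soup.find_all(tag):
            if not node.get(attr):
                continue
            resource = urlparse.urljoin(url, node[attr])
            name = os.path.basename(urlparse.urlparse(resource).path) or 'index'
            urllib.urlretrieve(resource, os.path.join('assets', name))

Note that this only saves the assets next to the page; it does not rewrite the links inside test.html to point at the local copies, which is the extra step a browser's "save complete" option (or wget --page-requisites) performs.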

Python: download files from google drive using url

断了今生、忘了曾经 Submitted on 2019-12-17 08:22:13
Question: I am trying to download files from Google Drive, and all I have is the drive's URL. I have read about the Google API that talks about some drive_service and MedioIO, which also requires credentials (mainly a JSON file / OAuth), but I am unable to get any idea of how it works. Also tried urllib2's urlretrieve, but my case is to get files from Drive. Tried 'wget' too, but no use. Tried the pydrive library; it has good upload functions to Drive but no download options. Any help will be appreciated.
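
The answer is not shown here, but one widely used trick for files shared by link is to request Drive's direct-download endpoint with the file ID extracted from the share URL. This is an assumption about the eventual solution rather than something stated in the excerpt; for small public files it can look like this:

    import urllib2

    # Hypothetical share link: the long token after /d/ is the file ID.
    share_url = 'https://drive.google.com/file/d/FILE_ID/view?usp=sharing'
    file_id = share_url.split('/d/')[1].split('/')[0]

    # Direct-download endpoint; large files need an extra confirmation
    # round-trip (cookie/token) that this sketch does not handle.
    download_url = 'https://docs.google.com/uc?export=download&id=' + file_id
    data = urllib2.urlopen(download_url).read()
    with open('downloaded_file', 'wb') as f:
        f.write(data)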

how to follow meta refreshes in Python

跟風遠走 Submitted on 2019-12-17 07:37:18
Question: Python's urllib2 follows 3xx redirects to get the final content. Is there a way to make urllib2 (or some other library such as httplib2) also follow meta refreshes? Or do I need to parse the HTML manually for the refresh meta tags? Answer 1: Here is a solution using BeautifulSoup and httplib2 (and certificate-based authentication): import BeautifulSoup import httplib2 def meta_redirect(content): soup = BeautifulSoup.BeautifulSoup(content) result=soup.find("meta",attrs={"http-equiv":"Refresh"}) if
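
The answer's code is truncated above; a plausible completion (reconstructed, so treat the parsing details as an assumption) extracts the target URL from the meta tag's content attribute and loops until no refresh remains:

    import BeautifulSoup  # BeautifulSoup 3, as in the truncated answer
    import httplib2

    def meta_redirect(content):
        # Return the URL named in a <meta http-equiv="Refresh"> tag, or None.
        soup = BeautifulSoup.BeautifulSoup(content)
        result = soup.find("meta", attrs={"http-equiv": "Refresh"})
        if result:
            wait, text = result["content"].split(";")
            if text.strip().lower().startswith("url="):
                return text.strip()[4:]
        return None

    def get_content(http, url, headers=None):
        # Fetch a page, then keep following meta refreshes until none remain.
        resp, content = http.request(url, "GET", headers=headers)
        while meta_redirect(content):
            resp, content = http.request(meta_redirect(content), "GET", headers=headers)
        return resp, content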

urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>

纵然是瞬间 Submitted on 2019-12-17 07:30:53
Question: If I run: urllib2.urlopen('http://google.com') even if I use another URL, I get the same error. I'm pretty sure there is no firewall running on my computer or router, and the internet (from a browser) works fine. Answer 1: The problem, in my case, was that some install at some point defined an environment variable http_proxy on my machine when I had no proxy. Removing the http_proxy environment variable fixed the problem. Answer 2: The site's DNS record is such that Python fails the DNS lookup in a
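
To check whether a stray http_proxy variable is the culprit (per Answer 1) without editing your environment, you can build an opener that ignores proxy settings entirely; if this succeeds where plain urlopen fails, the proxy variable is the problem. A small sketch:

    import os
    import urllib2

    print 'http_proxy =', os.environ.get('http_proxy')  # see what, if anything, is set

    # An empty ProxyHandler tells urllib2 to ignore proxy environment variables.
    opener = urllib2.build_opener(urllib2.ProxyHandler({}))
    print opener.open('http://google.com').getcode()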

urllib2 HTTP Error 400: Bad Request

廉价感情. Submitted on 2019-12-17 07:24:38
Question: I have a piece of code like this: host = 'http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s' % (query, page) req = urllib2.Request(host) req.add_header('User-Agent', User_Agent) response = urllib2.urlopen(req) and when I input a query greater than one word, like "the dog", I get the following error: response = urllib2.urlopen(req) File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/usr/lib/python2.7/urllib2.py", line 400, in open
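
The excerpt ends before the answer, but a 400 on a multi-word query is typically caused by the literal space in the URL; percent-encoding the query before interpolating it is the usual fix. A hedged sketch (the User-Agent string stands in for the original User_Agent variable):

    import urllib
    import urllib2

    query, page = 'the dog', 1
    # Encode the query so spaces become '+', leaving the rest of the URL intact.
    host = ('http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s'
            % (urllib.quote_plus(query), page))
    req = urllib2.Request(host)
    req.add_header('User-Agent', 'Mozilla/5.0')  # placeholder for User_Agent
    response = urllib2.urlopen(req)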

Tell urllib2 to use custom DNS

六月ゝ 毕业季﹏ Submitted on 2019-12-17 07:15:18
Question: I'd like to tell urllib2.urlopen (or a custom opener) to use 127.0.0.1 (or ::1) to resolve addresses. I wouldn't change my /etc/resolv.conf, however. One possible solution is to use a tool like dnspython to query addresses and httplib to build a custom URL opener. I'd prefer telling urlopen to use a custom nameserver, though. Any suggestions? Answer 1: Looks like name resolution is ultimately handled by socket.create_connection. -> urllib2.urlopen -> httplib.HTTPConnection -> socket.create
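
Following the call chain the answer traces (urlopen -> HTTPConnection -> socket.create_connection), one hedged approach is to wrap socket.create_connection so it swaps the hostname for an address you resolved yourself, e.g. via dnspython against 127.0.0.1. The resolver and mapping below are stand-ins; only the wrapping mechanism is the point:

    import socket
    import urllib2

    def custom_resolve(host):
        # Stand-in resolver: in practice this could use dnspython to query the
        # nameserver at 127.0.0.1. Here it just consults a static mapping.
        overrides = {'example.com': '93.184.216.34'}  # illustrative entry
        return overrides.get(host, host)

    _real_create_connection = socket.create_connection

    def create_connection(address, *args, **kwargs):
        host, port = address
        return _real_create_connection((custom_resolve(host), port), *args, **kwargs)

    # Monkey-patch so httplib (and therefore urllib2) goes through our resolver.
    # HTTPS verification and SNI still see the original hostname, so TLS setups
    # may need extra care.
    socket.create_connection = create_connection

    print urllib2.urlopen('http://example.com/').getcode()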
