urllib2

How to check if a URL redirects to another URL using Python

淺唱寂寞╮ submitted on 2019-12-11 06:14:01

Question: I want to check whether the target URL will be redirected after visiting it. I thought I could do something like this:

    req = urllib2.Request(url=url, headers=headers)
    resp = urllib2.urlopen(req, timeout=3)
    code = resp.code
    if code == '200':
        # valid
    else:
        # not valid

But it does not work: even if the URL redirects, I still get 200. Can anyone help me with this, please?

Answer 1: Just to elaborate on my comment:

    req = urllib2.Request(url=url, headers=headers)
    resp = urllib2.urlopen(req, timeout=3)
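The status code alone cannot reveal a redirect, because urlopen follows redirects transparently and hands back the final response. Comparing the requested URL with the response's geturl() does work. A minimal sketch, written against Python 3's urllib.request (the same idea applies to urllib2 in Python 2); the fetch helper and its defaults are illustrative, not the answerer's exact code:

```python
from urllib.request import Request, urlopen

def was_redirected(requested_url, final_url):
    # urlopen follows redirects automatically; the response's
    # geturl() reports the URL actually retrieved, so any
    # mismatch means at least one redirect happened.
    return requested_url != final_url

def fetch(url, headers=None, timeout=3):
    req = Request(url, headers=headers or {})
    resp = urlopen(req, timeout=timeout)
    # Note: getcode() returns the integer 200, never the
    # string '200', so the comparison in the question can
    # never match.
    return resp.getcode(), was_redirected(url, resp.geturl())
```

Note also that resp.code in the question is an integer, so `code == '200'` is always false regardless of redirects.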

Threading HTTP requests (with proxies)

断了今生、忘了曾经 submitted on 2019-12-11 05:48:45

Question: I've looked at similar questions, but there always seems to be a lot of disagreement over the best way to handle threading with HTTP. What I specifically want to do: I'm using Python 2.7, and I want to thread HTTP requests (specifically, POSTing something), with a SOCKS5 proxy for each. The code I have already works, but it is rather slow, since it waits for each request (to the proxy server, then the web server) to finish before starting another. Each thread would most likely
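Because each request spends most of its time blocked on network I/O, a thread pool is the usual way to overlap that waiting. A sketch using concurrent.futures (standard in Python 3; available to Python 2.7 via the `futures` backport). The post_via_proxy name in the usage line is a hypothetical stand-in for the question's SOCKS5-proxied POST, which is not shown:

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrently(func, items, max_workers=8):
    # Threads here spend most of their time blocked on network
    # I/O, so the GIL is not a bottleneck for this workload.
    # pool.map preserves the order of `items` in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))
```

Usage would look like `results = run_concurrently(post_via_proxy, proxy_configs)`, with each call waiting on its own proxied connection concurrently.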

Urlopen [Errno -2] Python

假如想象 submitted on 2019-12-11 05:48:19

Question: I have developed a piece of code which I use for web scraping:

    link = 'http://www.cmegroup.com' + div.findAll('a')[3]['href']
    user_agent = 'Mozilla/5.0'
    headers = {'User-Agent': user_agent}
    req = urllib2.Request(link, headers=headers)
    page = urllib2.urlopen(req).read()

What I don't understand is that I sometimes get an error when requesting the link, and sometimes I don't. For example, the error

    urllib2.URLError: <urlopen error [Errno -2] Name or service not known>

came out for this link: http
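Errno -2, "Name or service not known", is a DNS resolution failure; when it appears only sometimes for the same link, the lookup is failing transiently. One pragmatic workaround is a bounded retry around the fetch. A sketch, assuming the transient failure surfaces as URLError (as in the question's traceback); the attempt count and delay are arbitrary choices:

```python
import time
from urllib.error import URLError

def with_retries(func, attempts=3, delay=1.0):
    # Retry transient failures (such as intermittent DNS
    # errors) a few times before giving up for good.
    last_err = None
    for i in range(attempts):
        try:
            return func()
        except URLError as err:
            last_err = err
            time.sleep(delay * (i + 1))  # simple linear backoff
    raise last_err
```

Usage would be something like `page = with_retries(lambda: urlopen(req).read())`; if all attempts fail, the last URLError is re-raised.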

urllib2 raises a 404 error while the URL exists

半城伤御伤魂 submitted on 2019-12-11 05:42:37

Question: I am facing a strange bug: urllib2 reports a 404 error while opening a valid URL. I tried it in a browser, and the URL can be opened. I also pass a User-Agent.

    import urllib.request as urllib2

    uri = 'https://i.ytimg.com/vi/8Sii8G5CNvY/hqdefault.jpg?custom=true&w=196&h=110&stc=true&jpg444=true&jpgq=90&sp=68&sigh=OIIIAPOKNtx1OiZbAqdORlzl92g'
    try:
        req = urllib2.Request(uri, headers={ 'User-Agent': 'Mozilla/5.0' })
        file = urllib2.urlopen(req)
    except urllib2.HTTPError as err:
        if err.code == 404:
            return
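When debugging a case like this, it helps to separate "the server answered with an error status" (HTTPError, which carries the integer code) from "the request never completed" (a plain URLError or socket error). A 404 from a URL that opens in a browser usually means the server is answering the script differently, often based on request headers. A small classification sketch in Python 3 terms; the helper name is illustrative:

```python
from urllib.error import HTTPError, URLError

def classify(err):
    # HTTPError means the server did respond, just with an
    # error status; err.code holds the integer status code.
    # Anything else (URLError, socket errors) means no HTTP
    # response ever came back.
    if isinstance(err, HTTPError):
        return 'http', err.code
    return 'network', None
```

Logging the classification (and err.headers, for the 'http' case) next to the working browser request is often enough to spot which header the server is keying on.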

Website is up and running but parsing it results in HTTP Error 503

狂风中的少年 submitted on 2019-12-11 05:38:06

Question: I want to crawl a webpage using the urllib2 library and extract some information according to my need. I am able to navigate the site freely (going from one link to another and so on), but when I try to parse it I get the error:

    HTTP Error 503 : Service Temporarily Unavailable

I searched about it on the net and found out that this error occurs when the "web site's server is not available at that time". I am confused after reading this: if the website's server is down, then how come it is up and running (since I
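A 503 from a site that is clearly up is frequently not an outage at all: many servers return 503 (or 403) to requests that look automated, and urllib2's default User-Agent is "Python-urllib/x.y". Sending browser-like headers sometimes resolves it. A sketch in Python 3's urllib.request; the exact header values are examples, not a guaranteed fix:

```python
from urllib.request import Request

def browser_request(url):
    # Mimic an ordinary browser; some servers answer the
    # default 'Python-urllib' User-Agent with 503/403 even
    # though the site itself is healthy.
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
        'Accept': 'text/html,application/xhtml+xml',
        'Accept-Language': 'en-US,en;q=0.9',
    }
    return Request(url, headers=headers)
```

If the 503 persists with browser headers, the server may be rate-limiting by request frequency instead, in which case adding a delay between requests is the next thing to try.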

How do I insert a row into my Google Fusion Table using Python

帅比萌擦擦* submitted on 2019-12-11 05:23:58

Question: I am working on a project, and part of it involves inserting rows into a Google Fusion Table from a Python script. I have spent the last couple of days trying to figure out how to do that, and I am officially confused. My research indicates that I need to use OAuth 2.0 to access the API. I can successfully get an access token, but I can't seem to get a refresh token. I'm not sure if this is going to hamper my ability to successfully integrate
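Getting an access token but no refresh token usually means the authorization request never asked for offline access: Google's OAuth 2.0 endpoint only issues a refresh token when the request includes access_type=offline (and, on repeat grants for the same user, prompt=consent to force re-issuing one). A sketch of building such an authorization URL; the client_id, redirect_uri, and scope values are placeholders:

```python
from urllib.parse import urlencode

def build_auth_url(client_id, redirect_uri, scope):
    # access_type=offline is what makes Google return a
    # refresh token along with the access token; without it
    # only a short-lived access token is issued.
    params = {
        'client_id': client_id,
        'redirect_uri': redirect_uri,
        'scope': scope,
        'response_type': 'code',
        'access_type': 'offline',
        'prompt': 'consent',  # re-issue refresh token on repeat grants
    }
    return 'https://accounts.google.com/o/oauth2/v2/auth?' + urlencode(params)
```

The refresh token arrives once, in the token-exchange response for the authorization code, and must be stored; subsequent exchanges for the same grant will not repeat it unless consent is forced again.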

urllib2.URLError: urlopen error no host given

徘徊边缘 submitted on 2019-12-11 04:55:28

Question: In the code below, I saved the pull request numbers in a text file, and I want to upload them to the URL in my code, but I got the error mentioned in the title.

    import urllib2
    import json
    import httplib

    def event_spider(org, repo):
        try:
            nbPrequest_reopened = 0  # number of pull requests reopened
            pages = 1
            while pages <= 3:
                headers = {'User-Agent': 'Mozilla/5.0(X11;Linux i686)',
                           'Authorization': 'token 516ed78e0521c6b25d9726ad51fa63841d019936'}
                read_file = open('C:\Python27\pullRequest
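"urlopen error no host given" means the URL string handed to urlopen is malformed, and a classic cause when URLs are assembled from a text file is the trailing newline each line keeps when read: embedding it corrupts the URL. Stripping each line before concatenating avoids it. A sketch under that assumption; the org/repo names are placeholders, and the GitHub pulls endpoint is used since the question is about pull request numbers:

```python
def build_pr_urls(lines, org='someorg', repo='somerepo'):
    # Lines read from a file keep their trailing '\n';
    # whitespace embedded in a URL can trigger
    # "urlopen error no host given", so strip first.
    urls = []
    for line in lines:
        number = line.strip()
        if number:  # skip blank lines
            urls.append('https://api.github.com/repos/%s/%s/pulls/%s'
                        % (org, repo, number))
    return urls
```

Printing repr(url) just before urlopen makes this kind of invisible-whitespace bug immediately visible.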

Python urllib2 can't get Google URL

China☆狼群 submitted on 2019-12-11 04:41:23

Question: I'm having a really tough time getting the results page of this URL with Python's urllib2:

    http://www.google.com/search?tbs=sbi:AMhZZitAaz7goe6AsfVSmFw1sbwsmX0uIjeVnzKHjEXMck70H3j32Q-6FApxrhxdSyMo0OedyWkxk3-qYbyf0q1OqNspjLu8DlyNnWVbNjiKGo87QUjQHf2_1idZ1q_1vvm5gzOCMpChYiKsKYdMywOLjJzqmzYoJNOU2UsTs_1zZGWjU-LsjdFXt_1D5bDkuyRK0YbsaLVcx4eEk_1KMkcJpWlfFEfPMutxTLGf1zxD-9DFZDzNOODs0oj2j_1KG8FRCaMFnTzAfTdl7JfgaDf_1t5Vti8FnbeG9i7qt9wF6P-QK9mdvC15hZ5UR29eQdYbcD1e4woaOQCmg8Q1VLVPf4-kf8dAI7p3jM
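Two things typically bite here: Google refuses urllib2's default User-Agent, and a long parameter like tbs=sbi:... is safer built with urlencode so its special characters are escaped consistently. A sketch using Python 3's urllib.parse and urllib.request; the token value in the usage is a shortened placeholder, not the question's real one:

```python
from urllib.parse import urlencode
from urllib.request import Request

def google_search_request(tbs_token):
    # urlencode percent-escapes characters such as ':' so the
    # query string survives intact; a browser-like User-Agent
    # avoids Google's rejection of the default 'Python-urllib'.
    qs = urlencode({'tbs': 'sbi:' + tbs_token})
    return Request('http://www.google.com/search?' + qs,
                   headers={'User-Agent': 'Mozilla/5.0'})
```

Even with this, scraping Google search results is against its terms of service and may still be blocked; the supported route is the Custom Search API.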

Does urllib or urllib2 in Python 2.5 support https?

久未见 submitted on 2019-12-11 03:56:14

Question: Thanks in advance for the help. I am puzzled that the same code works on Python 2.6 but not 2.5. Here is the code:

    import cgi, urllib, urlparse, urllib2
    url = 'https://graph.facebook.com'
    req = urllib2.Request(url=url)
    p = urllib2.urlopen(req)
    response = cgi.parse_qs(p.read())

And here is the exception I got:

    Traceback (most recent call last):
      File "t2.py", line 6, in <module>
        p=urllib2.urlopen(req)
      File "/home/userx/lib/python2.5/urllib2.py", line 124, in urlopen
        return _opener.open(url, data)
      File
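The 2.5-versus-2.6 difference is plausible because the ssl module only joined the standard library in Python 2.6; on 2.5, urllib2 can open https:// URLs only when the interpreter itself was compiled with SSL support, in which case httplib exposes HTTPSConnection. A quick capability check, written here for modern Python (where http.client replaces httplib); on Python 2 the same test would be hasattr(httplib, 'HTTPSConnection'):

```python
import http.client

def https_supported():
    # urllib2 / urllib.request can only open https:// URLs
    # when the interpreter was built with SSL support, which
    # is exactly when HTTPSConnection exists.
    return hasattr(http.client, 'HTTPSConnection')
```

If this returns False on the failing 2.5 installation, the fix is to rebuild Python against the SSL development headers rather than to change the urllib2 code.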

Python Link to File Iterator not Iterating

女生的网名这么多〃 submitted on 2019-12-11 02:55:57

Question: This one has had me stumped for a couple of days now, and I believe I've finally narrowed it down to this block of code. If anyone can tell me how to fix this, and why it is happening, it would be awesome.

    import urllib2

    GetLink = 'http://somesite.com/search?q=datadata#page'
    holder = range(1, 3)
    for LinkIncrement in holder:
        h = GetLink + str(LinkIncrement)
        ReadLink = urllib2.urlopen(h)
        f = open('test.txt', 'w')
        for line in ReadLink:
            f.write(line)
        f.close()
    main()  # calls function main that does