urllib2

How to check if a URL redirects to another URL using Python

淺唱寂寞╮ submitted on 2019-12-11 06:14:01

Question: I want to check whether the target URL will be redirected after visiting it. I thought I could do something like this:

    req = urllib2.Request(url=url, headers=headers)
    resp = urllib2.urlopen(req, timeout=3)
    code = resp.code
    if code == '200':
        # valid
    else:
        # not valid

But it does not work: even if the URL redirects, I still get 200. Can anyone help me with this, please?

Answer 1: Just to elaborate on my comment:

    req = urllib2.Request(url=url, headers=headers)
    resp = urllib2.urlopen(req, timeout=3)
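The status code alone cannot reveal a redirect, because urlopen follows redirects transparently and hands back the final response. Comparing the requested URL with the response's geturl() does work. A minimal sketch, written against Python 3's urllib.request (the same idea applies to urllib2 in Python 2); the fetch helper and its defaults are illustrative, not the answerer's exact code:

```python
from urllib.request import Request, urlopen

def was_redirected(requested_url, final_url):
    # urlopen follows redirects automatically; the response's
    # geturl() reports the URL actually retrieved, so any
    # mismatch means at least one redirect happened.
    return requested_url != final_url

def fetch(url, headers=None, timeout=3):
    req = Request(url, headers=headers or {})
    resp = urlopen(req, timeout=timeout)
    # Note: getcode() returns the integer 200, never the
    # string '200', so the comparison in the question can
    # never match.
    return resp.getcode(), was_redirected(url, resp.geturl())
```

Note also that resp.code in the question is an integer, so `code == '200'` is always false regardless of redirects.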

Threading HTTP requests (with proxies)

断了今生、忘了曾经 submitted on 2019-12-11 05:48:45

Question: I've looked at similar questions, but there always seems to be a lot of disagreement over the best way to handle threading with HTTP. What I specifically want to do: I'm using Python 2.7, and I want to thread HTTP requests (specifically, POSTing something), with a SOCKS5 proxy for each. The code I have already works, but it is rather slow, since it waits for each request (to the proxy server, then the web server) to finish before starting another. Each thread would most likely
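Because each request spends most of its time blocked on network I/O, a thread pool is the usual way to overlap that waiting. A sketch using concurrent.futures (standard in Python 3; available to Python 2.7 via the `futures` backport). The post_via_proxy name in the usage line is a hypothetical stand-in for the question's SOCKS5-proxied POST, which is not shown:

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrently(func, items, max_workers=8):
    # Threads here spend most of their time blocked on network
    # I/O, so the GIL is not a bottleneck for this workload.
    # pool.map preserves the order of `items` in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))
```

Usage would look like `results = run_concurrently(post_via_proxy, proxy_configs)`, with each call waiting on its own proxied connection concurrently.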

Urlopen [Errno -2] Python

假如想象 submitted on 2019-12-11 05:48:19

Question: I have developed a piece of code which I use for web scraping:

    link = 'http://www.cmegroup.com' + div.findAll('a')[3]['href']
    user_agent = 'Mozilla/5.0'
    headers = {'User-Agent': user_agent}
    req = urllib2.Request(link, headers=headers)
    page = urllib2.urlopen(req).read()

What I don't understand is that I sometimes get an error when requesting the link, and sometimes I don't. For example, the error

    urllib2.URLError: <urlopen error [Errno -2] Name or service not known>

came out for this link: http
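Errno -2, "Name or service not known", is a DNS resolution failure; when it appears only sometimes for the same link, the lookup is failing transiently. One pragmatic workaround is a bounded retry around the fetch. A sketch, assuming the transient failure surfaces as URLError (as in the question's traceback); the attempt count and delay are arbitrary choices:

```python
import time
from urllib.error import URLError

def with_retries(func, attempts=3, delay=1.0):
    # Retry transient failures (such as intermittent DNS
    # errors) a few times before giving up for good.
    last_err = None
    for i in range(attempts):
        try:
            return func()
        except URLError as err:
            last_err = err
            time.sleep(delay * (i + 1))  # simple linear backoff
    raise last_err
```

Usage would be something like `page = with_retries(lambda: urlopen(req).read())`; if all attempts fail, the last URLError is re-raised.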

urllib2 raises a 404 error while the URL exists

半城伤御伤魂 submitted on 2019-12-11 05:42:37

Question: I am facing a strange bug: urllib2 reports a 404 error while opening a valid URL. I tried it in a browser, and the URL can be opened. I also pass a User-Agent.

    import urllib.request as urllib2

    uri = 'https://i.ytimg.com/vi/8Sii8G5CNvY/hqdefault.jpg?custom=true&w=196&h=110&stc=true&jpg444=true&jpgq=90&sp=68&sigh=OIIIAPOKNtx1OiZbAqdORlzl92g'
    try:
        req = urllib2.Request(uri, headers={ 'User-Agent': 'Mozilla/5.0' })
        file = urllib2.urlopen(req)
    except urllib2.HTTPError as err:
        if err.code == 404:
            return
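When debugging a case like this, it helps to separate "the server answered with an error status" (HTTPError, which carries the integer code) from "the request never completed" (a plain URLError or socket error). A 404 from a URL that opens in a browser usually means the server is answering the script differently, often based on request headers. A small classification sketch in Python 3 terms; the helper name is illustrative:

```python
from urllib.error import HTTPError, URLError

def classify(err):
    # HTTPError means the server did respond, just with an
    # error status; err.code holds the integer status code.
    # Anything else (URLError, socket errors) means no HTTP
    # response ever came back.
    if isinstance(err, HTTPError):
        return 'http', err.code
    return 'network', None
```

Logging the classification (and err.headers, for the 'http' case) next to the working browser request is often enough to spot which header the server is keying on.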

Website is up and running but parsing it results in HTTP Error 503

狂风中的少年 submitted on 2019-12-11 05:38:06

Question: I want to crawl a webpage using the urllib2 library and extract some information according to my need. I am able to navigate the site freely (going from one link to another and so on), but when I try to parse it I get the error:

    HTTP Error 503 : Service Temporarily Unavailable

I searched about it on the net and found out that this error occurs when the "web site's server is not available at that time". I am confused after reading this: if the website's server is down, then how come it is up and running (since I
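A 503 from a site that is clearly up is frequently not an outage at all: many servers return 503 (or 403) to requests that look automated, and urllib2's default User-Agent is "Python-urllib/x.y". Sending browser-like headers sometimes resolves it. A sketch in Python 3's urllib.request; the exact header values are examples, not a guaranteed fix:

```python
from urllib.request import Request

def browser_request(url):
    # Mimic an ordinary browser; some servers answer the
    # default 'Python-urllib' User-Agent with 503/403 even
    # though the site itself is healthy.
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
        'Accept': 'text/html,application/xhtml+xml',
        'Accept-Language': 'en-US,en;q=0.9',
    }
    return Request(url, headers=headers)
```

If the 503 persists with browser headers, the server may be rate-limiting by request frequency instead, in which case adding a delay between requests is the next thing to try.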

How do I insert a row into my Google Fusion Table using Python

帅比萌擦擦* submitted on 2019-12-11 05:23:58

Question: I am working on a project, and part of it involves inserting rows into a Google Fusion Table from a Python script. I have spent the last couple of days trying to figure out how to do that, and I am officially confused. My research indicates that I need to use OAuth 2.0 to access the API. I can successfully get an access token, but I can't seem to get a refresh token. I'm not sure if this is going to hamper my ability to successfully integrate
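Getting an access token but no refresh token usually means the authorization request never asked for offline access: Google's OAuth 2.0 endpoint only issues a refresh token when the request includes access_type=offline (and, on repeat grants for the same user, prompt=consent to force re-issuing one). A sketch of building such an authorization URL; the client_id, redirect_uri, and scope values are placeholders:

```python
from urllib.parse import urlencode

def build_auth_url(client_id, redirect_uri, scope):
    # access_type=offline is what makes Google return a
    # refresh token along with the access token; without it
    # only a short-lived access token is issued.
    params = {
        'client_id': client_id,
        'redirect_uri': redirect_uri,
        'scope': scope,
        'response_type': 'code',
        'access_type': 'offline',
        'prompt': 'consent',  # re-issue refresh token on repeat grants
    }
    return 'https://accounts.google.com/o/oauth2/v2/auth?' + urlencode(params)
```

The refresh token arrives once, in the token-exchange response for the authorization code, and must be stored; subsequent exchanges for the same grant will not repeat it unless consent is forced again.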

urllib2.URLError: urlopen error no host given

徘徊边缘 submitted on 2019-12-11 04:55:28

Question: In the code below, I saved the pull request numbers in a text file, and I want to upload them to the URL in my code, but I got the error mentioned in the title.

    import urllib2
    import json
    import httplib

    def event_spider(org, repo):
        try:
            nbPrequest_reopened = 0  # number of pull requests reopened
            pages = 1
            while pages <= 3:
                headers = {'User-Agent': 'Mozilla/5.0(X11;Linux i686)',
                           'Authorization': 'token 516ed78e0521c6b25d9726ad51fa63841d019936'}
                read_file = open('C:\Python27\pullRequest
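"urlopen error no host given" means the URL string handed to urlopen is malformed, and a classic cause when URLs are assembled from a text file is the trailing newline each line keeps when read: embedding it corrupts the URL. Stripping each line before concatenating avoids it. A sketch under that assumption; the org/repo names are placeholders, and the GitHub pulls endpoint is used since the question is about pull request numbers:

```python
def build_pr_urls(lines, org='someorg', repo='somerepo'):
    # Lines read from a file keep their trailing '\n';
    # whitespace embedded in a URL can trigger
    # "urlopen error no host given", so strip first.
    urls = []
    for line in lines:
        number = line.strip()
        if number:  # skip blank lines
            urls.append('https://api.github.com/repos/%s/%s/pulls/%s'
                        % (org, repo, number))
    return urls
```

Printing repr(url) just before urlopen makes this kind of invisible-whitespace bug immediately visible.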

Python urllib2 can't get Google URL

China☆狼群 submitted on 2019-12-11 04:41:23

Question: I'm having a really tough time getting the results page of this URL with Python's urllib2:

    http://www.google.com/search?tbs=sbi:AMhZZitAaz7goe6AsfVSmFw1sbwsmX0uIjeVnzKHjEXMck70H3j32Q-6FApxrhxdSyMo0OedyWkxk3-qYbyf0q1OqNspjLu8DlyNnWVbNjiKGo87QUjQHf2_1idZ1q_1vvm5gzOCMpChYiKsKYdMywOLjJzqmzYoJNOU2UsTs_1zZGWjU-LsjdFXt_1D5bDkuyRK0YbsaLVcx4eEk_1KMkcJpWlfFEfPMutxTLGf1zxD-9DFZDzNOODs0oj2j_1KG8FRCaMFnTzAfTdl7JfgaDf_1t5Vti8FnbeG9i7qt9wF6P-QK9mdvC15hZ5UR29eQdYbcD1e4woaOQCmg8Q1VLVPf4-kf8dAI7p3jM
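Two things typically bite here: Google refuses urllib2's default User-Agent, and a long parameter like tbs=sbi:... is safer built with urlencode so its special characters are escaped consistently. A sketch using Python 3's urllib.parse and urllib.request; the token value in the usage is a shortened placeholder, not the question's real one:

```python
from urllib.parse import urlencode
from urllib.request import Request

def google_search_request(tbs_token):
    # urlencode percent-escapes characters such as ':' so the
    # query string survives intact; a browser-like User-Agent
    # avoids Google's rejection of the default 'Python-urllib'.
    qs = urlencode({'tbs': 'sbi:' + tbs_token})
    return Request('http://www.google.com/search?' + qs,
                   headers={'User-Agent': 'Mozilla/5.0'})
```

Even with this, scraping Google search results is against its terms of service and may still be blocked; the supported route is the Custom Search API.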

Does urllib or urllib2 in Python 2.5 support https?

久未见 submitted on 2019-12-11 03:56:14

Question: Thanks in advance for the help. I am puzzled that the same code works on Python 2.6 but not 2.5. Here is the code:

    import cgi, urllib, urlparse, urllib2
    url = 'https://graph.facebook.com'
    req = urllib2.Request(url=url)
    p = urllib2.urlopen(req)
    response = cgi.parse_qs(p.read())

And here is the exception I got:

    Traceback (most recent call last):
      File "t2.py", line 6, in <module>
        p=urllib2.urlopen(req)
      File "/home/userx/lib/python2.5/urllib2.py", line 124, in urlopen
        return _opener.open(url, data)
      File
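The 2.5-versus-2.6 difference is plausible because the ssl module only joined the standard library in Python 2.6; on 2.5, urllib2 can open https:// URLs only when the interpreter itself was compiled with SSL support, in which case httplib exposes HTTPSConnection. A quick capability check, written here for modern Python (where http.client replaces httplib); on Python 2 the same test would be hasattr(httplib, 'HTTPSConnection'):

```python
import http.client

def https_supported():
    # urllib2 / urllib.request can only open https:// URLs
    # when the interpreter was built with SSL support, which
    # is exactly when HTTPSConnection exists.
    return hasattr(http.client, 'HTTPSConnection')
```

If this returns False on the failing 2.5 installation, the fix is to rebuild Python against the SSL development headers rather than to change the urllib2 code.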

Python Link to File Iterator not Iterating

女生的网名这么多〃 submitted on 2019-12-11 02:55:57

Question: This one has had me stumped for a couple of days now, and I believe I've finally narrowed it down to this block of code. If anyone can tell me how to fix this, and why it is happening, it would be awesome.

    import urllib2

    GetLink = 'http://somesite.com/search?q=datadata#page'
    holder = range(1, 3)
    for LinkIncrement in holder:
        h = GetLink + str(LinkIncrement)
        ReadLink = urllib2.urlopen(h)
        f = open('test.txt', 'w')
        for line in ReadLink:
            f.write(line)
        f.close()
    main()  # calls function main that does