urllib2

label empty or too long - python urllib2

夙愿已清 · submitted on 2019-12-01 15:21:56
I am having a strange situation: I am curling URLs like this:

    def check_urlstatus(url):
        h = httplib2.Http()
        try:
            resp = h.request("http://" + url, 'HEAD')
            if int(resp[0]['status']) < 400:
                return 'ok'
            else:
                return 'bad'
        except httplib2.ServerNotFoundError:
            return 'bad'

If I try to test this with:

    if check_urlstatus('.f.de') == "bad": #<--- error happening here
    #..
    #..

it is saying: UnicodeError: label empty or too long. What is the problem I am causing here?

EDIT: here is the traceback with idna. I guess it tries to split the input by "." and, in this case, the first label is empty, which is the ...
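The UnicodeError is raised by Python's idna codec when the host name is encoded: '.f.de' splits into the labels '', 'f', 'de', and the empty first label is rejected. A minimal sketch of a pre-check, assuming you simply want such inputs classified as "bad" before any request is made:

```python
def is_valid_hostname(host):
    """Pre-validate a host with the stdlib 'idna' codec -- the same
    encoding step that raises UnicodeError('label empty or too long')
    inside the HTTP client for inputs like '.f.de'."""
    try:
        host.encode('idna')
        return True
    except UnicodeError:
        return False

# is_valid_hostname('f.de')  -> True
# is_valid_hostname('.f.de') -> False  (empty first label)
```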

Python urllib urllib2

巧了我就是萌 · submitted on 2019-12-01 15:13:42
urllib2 is an extension of urllib. Similarities and differences: the most commonly used functions, urllib.urlopen and urllib2.urlopen, are similar, but their parameters differ, for example timeouts and proxies. urllib only accepts a URL string, while urllib2 also accepts a Request object; a Request object lets you set headers, which urllib cannot do. urllib has the urlencode method for encoding parameters, which urllib2 lacks, so the two modules are often used together. Overall, urllib2 offers more functionality, including the various handlers and openers. There is also the httplib module, which provides the most basic HTTP request methods, e.g. GET/POST/PUT. Reference: http://blog.csdn.net/column/details/why-bug.html

Most basic usage:

    import urllib2
    response = urllib2.urlopen('http://www.baidu.com/')
    html = response.read()
    print html

Using a Request object:

    import urllib2
    req = urllib2.Request('http://www.baidu.com')
    response = urllib2.urlopen(req)
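The two points above (urlencode for parameters, a Request object for headers) can be sketched together; shown here in Python 3 spelling, where urllib2 became urllib.request and urlencode moved to urllib.parse:

```python
from urllib.parse import urlencode
from urllib.request import Request

# urlencode turns a dict into a query string (urllib's job in Python 2)
query = urlencode({'wd': 'python', 'page': 2})   # 'wd=python&page=2'

# A Request object carries custom headers (urllib2's job in Python 2)
req = Request('http://www.baidu.com/s?' + query,
              headers={'User-Agent': 'Mozilla/5.0'})
# urllib.request stores header names capitalized, e.g. 'User-agent'
```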

Common Python functions and modules (2)

橙三吉。 · submitted on 2019-12-01 15:13:29
1. The random module.

    import random
    arr = [1, 2, 3, 4]
    print random.random()           # random float in [0.0, 1.0)
    print random.uniform(0.5, 0.8)  # random float within the range
    print random.randint(1, 5)      # random integer within the range
    print random.choice(arr)        # one random element of a list/tuple
    print random.sample(arr, 2)     # two random elements of a list/tuple
    random.shuffle(arr)             # shuffles a list in place; does not work on tuples
    print arr

    # result:
    # 0.504674499914
    # 0.79300720529
    # 1
    # 1
    # [1, 4]
    # [3, 2, 4, 1]

2. Object persistence.

    try:
        import cPickle as pickle  # cPickle is the C implementation of pickle, faster
    except ImportError:
        import pickle
    arr = [123, [1, 2, 3], {'a': 'A'}, (1, 2, 3)]
    pack = pickle.dumps(arr)
    unpack = pickle.loads(pack)

3. The csv module.

    import csv
    with open('a.csv', 'wb') as fp:
        writer = csv.writer(fp)
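The pickle and csv snippets above are cut off; a small self-contained sketch of both round trips in Python 3 spelling (where pickle already uses the C implementation, and text files are opened with newline='' instead of Python 2's 'wb'):

```python
import csv
import io
import pickle

# Round-trip an object through pickle
arr = [123, [1, 2, 3], {'a': 'A'}, (1, 2, 3)]
assert pickle.loads(pickle.dumps(arr)) == arr

# Write and read CSV rows via an in-memory buffer (no file needed)
buf = io.StringIO()
csv.writer(buf).writerows([['a', 1], ['b', 2]])
rows = list(csv.reader(io.StringIO(buf.getvalue())))
# Note: csv reads every field back as a string, so 1 becomes '1'
```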

Enabling debug output in python urllib2

核能气质少年 · submitted on 2019-12-01 15:13:15
Reposting a piece found online:

USING HTTPLIB.HTTPCONNECTION.SET_DEBUGLEVEL() WITH URLLIB2
Posted on October 1, 2007, 9:52 pm, by jamiegrove, under python.

I've been trying to get the debug level turned on in urllib2 for about an hour, and now that it is working I thought I would post what I found. When using urllib, you can set the debuglevel directly by using something like this:

    import urllib, httplib
    httplib.HTTPConnection.debuglevel = 1
    urllib.urlopen("http://www.somesite.com")

However, when using urllib2 you need to create a handler and install it for use. The sample below creates the lovely debuglevel handler.
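The sample the quoted post refers to is cut off above; a minimal reconstruction of such a debug handler, in Python 3 spelling (urllib2.HTTPHandler became urllib.request.HTTPHandler):

```python
import urllib.request

# HTTPHandler accepts a debuglevel; once the opener is installed, every
# urlopen() call prints the raw request and response headers to stdout.
handler = urllib.request.HTTPHandler(debuglevel=1)
opener = urllib.request.build_opener(handler)
urllib.request.install_opener(opener)
```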

How to set TCP_NODELAY flag when loading URL with urllib2?

余生颓废 · submitted on 2019-12-01 15:12:12
I am using urllib2 to load a web page; my code is:

    httpRequest = urllib2.Request("http://www....com")
    pageContent = urllib2.urlopen(httpRequest)
    pageContent.readline()

How can I get hold of the socket properties to set TCP_NODELAY? On a plain socket I would use:

    socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

[Answer] If you need access to such a low-level property of the socket used, you'll have to subclass some objects. First, you'll need to create a subclass of HTTPHandler, which in the standard library does:

    class HTTPHandler(AbstractHTTPHandler):
        def http_open(self, req ...
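The answer above is cut off; a sketch of the subclassing approach it describes, in Python 3 spelling (http.client and urllib.request; the connection is only patched, not exercised here):

```python
import http.client
import socket
import urllib.request

class NoDelayHTTPConnection(http.client.HTTPConnection):
    def connect(self):
        super().connect()
        # set TCP_NODELAY on the freshly created socket
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

class NoDelayHTTPHandler(urllib.request.HTTPHandler):
    def http_open(self, req):
        # route requests through the patched connection class
        return self.do_open(NoDelayHTTPConnection, req)

opener = urllib.request.build_opener(NoDelayHTTPHandler)
# opener.open(url) would now use sockets with TCP_NODELAY set
```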

Repeated host lookups failing in urllib2

半世苍凉 · submitted on 2019-12-01 14:22:38
I have code which issues many HTTP GET requests using Python's urllib2, in several threads, writing the responses into files (one per thread). During execution, it looks like many of the host lookups fail (causing a "name or service unknown" error; see the appended error log for an example). Is this due to a flaky DNS service? Is it bad practice to rely on DNS caching if the host name isn't changing? I.e., should a single lookup's result be passed into the urlopen?

    Exception in thread Thread-16:
    Traceback (most recent call last):
      File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap
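One way to avoid repeating the lookup, per the question's own suggestion of resolving once: cache successful resolutions in the process. A minimal sketch (the host name here is a placeholder):

```python
import functools
import socket

@functools.lru_cache(maxsize=None)
def resolve(host):
    """Resolve once per host name; repeated calls are served from the
    cache instead of re-querying a possibly flaky DNS service."""
    return socket.gethostbyname(host)

first = resolve('localhost')
second = resolve('localhost')   # cache hit, no second DNS query
```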

Python urllib2: Cannot assign requested address

安稳与你 · submitted on 2019-12-01 13:53:42
I am sending thousands of requests using urllib2 with proxies. I have received many of the following errors during execution:

    urlopen error [Errno 99] Cannot assign requested address

I read here that it may be due to a socket already being bound. Is that the case? Any suggestions on how to fix this?

[Answer by mhawke] Here is an answer to a similar-looking question that I prepared earlier... much earlier: Socket in use error when reusing sockets. The error is different, but the underlying problem is probably the same: you are consuming all available ports and trying to reuse them before the TIME_WAIT state
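One mitigation for this kind of ephemeral-port exhaustion is reusing a single TCP connection for many requests instead of opening (and leaving in TIME_WAIT) a new one per request. A self-contained local demo using Python 3's http.client and http.server (the server and responses are made up for the example):

```python
import http.client
import http.server
import threading

class Quiet(http.server.BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'   # HTTP/1.1 enables keep-alive
    def do_GET(self):
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):   # silence per-request logging
        pass

server = http.server.HTTPServer(('127.0.0.1', 0), Quiet)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Three requests, one TCP connection -- only one local port consumed
conn = http.client.HTTPConnection('127.0.0.1', server.server_port)
bodies = []
for _ in range(3):
    conn.request('GET', '/')
    bodies.append(conn.getresponse().read())
conn.close()
server.shutdown()
```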

python get headers only using urllib2

邮差的信 · submitted on 2019-12-01 13:25:26
I have to implement a function to get headers only (without doing a GET or POST) using urllib2. Here is my function:

    def getheadersonly(url, redirections=True):
        if not redirections:
            class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
                def http_error_302(self, req, fp, code, msg, headers):
                    return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
                http_error_301 = http_error_303 = http_error_307 = http_error_302
            cookieprocessor = urllib2.HTTPCookieProcessor()
            opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
            urllib2.install_opener(opener)
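urllib2 has no HEAD helper of its own; the usual trick is overriding the request method. A sketch in Python 3 spelling, where Request accepts method= directly (in Python 2 you would subclass urllib2.Request and override get_method):

```python
import urllib.request

# urlopen(req) would issue HEAD: the response carries headers, no body
req = urllib.request.Request('http://example.com', method='HEAD')
```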

Fetching a URL from a basic-auth protected Jenkins server with urllib2

穿精又带淫゛_ · submitted on 2019-12-01 13:16:00
I'm trying to fetch a URL from a Jenkins server. Until somewhat recently I was able to use the pattern described on this page (HOWTO Fetch Internet Resources Using urllib2) to create a password manager that correctly responded to BasicAuth challenges with the user name and password. All was fine until the Jenkins team changed their security model, and that code no longer worked.

    # DOES NOT WORK!
    import urllib2
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    top_level_url = "http://localhost:8080"
    password_mgr.add_password(None, top_level_url, 'sal', 'foobar')
    handler = urllib2
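A common workaround when a server stops issuing 401 challenges is to send the Authorization header preemptively instead of relying on a password manager. A sketch in Python 3 spelling, reusing the question's placeholder credentials:

```python
import base64
import urllib.request

# Build the Basic credential by hand and attach it up front
token = base64.b64encode(b'sal:foobar').decode('ascii')
req = urllib.request.Request('http://localhost:8080')
req.add_header('Authorization', 'Basic %s' % token)
# urlopen(req) would now authenticate without waiting for a challenge
```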
