urllib2

Windows Authentication with Python and urllib2

こ雲淡風輕ζ submitted on 2019-11-30 03:41:36
I want to grab some data off a webpage that requires my Windows username and password. So far, I've got:

    opener = build_opener()
    try:
        page = opener.open("http://somepagewhichneedsmywindowsusernameandpassword/")
        print page
    except URLError:
        print "Oh noes."

Is this supported by urllib2? I've found Python NTLM, but that requires me to put my username and password in. Is there any way to just grab the authentication information somehow (e.g. like IE does, or Firefox, if I changed the network.automatic-ntlm-auth.trusted-uris settings)? Edit after msander's answer: So I've now got this: # Send a
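For reference, a minimal sketch of the explicit-credentials route the question mentions (assuming the third-party python-ntlm package; the domain, user name and password below are placeholders). It does not give the single-sign-on behaviour the asker is after, which on Windows generally needs SSPI instead:

    import urllib2
    from ntlm import HTTPNtlmAuthHandler  # third-party package: python-ntlm

    url = "http://somepagewhichneedsmywindowsusernameandpassword/"

    # Register the credentials for this URL (None = any realm).
    passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, url, r"DOMAIN\user", "password")

    # Build an opener that answers NTLM challenges with those credentials.
    auth_handler = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)

    print urllib2.urlopen(url).read()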

How to read image from in memory buffer (StringIO) or from url with opencv python library

断了今生、忘了曾经 submitted on 2019-11-30 03:15:15
Just sharing a way to create an OpenCV image object from an in-memory buffer or from a URL, to improve performance. Sometimes we get the image binary from a URL; to avoid extra file I/O we want to imread the image straight from an in-memory buffer or from the URL, but imread only supports reading an image from the file system by path. To create an OpenCV image object from an in-memory buffer (StringIO), we can use the OpenCV API imdecode, see the code below: import cv2 import numpy as np from urllib2 import urlopen from cStringIO import StringIO def create_opencv_image_from_stringio(img_stream, cv2_img_flag=0): img_stream.seek(0) img_array
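The excerpt's function is cut off; a completed sketch along the same lines (Python 2 era imports, as in the excerpt; cv2_img_flag is the flag handed to cv2.imdecode, 0 for grayscale, 1 for colour):

    import cv2
    import numpy as np
    from urllib2 import urlopen
    from cStringIO import StringIO

    def create_opencv_image_from_stringio(img_stream, cv2_img_flag=0):
        # Rewind the buffer, turn the raw bytes into a 1-D uint8 array,
        # and let OpenCV decode it.
        img_stream.seek(0)
        img_array = np.asarray(bytearray(img_stream.read()), dtype=np.uint8)
        return cv2.imdecode(img_array, cv2_img_flag)

    def create_opencv_image_from_url(url, cv2_img_flag=0):
        # Same idea, but read the bytes straight from the HTTP response.
        request = urlopen(url)
        img_array = np.asarray(bytearray(request.read()), dtype=np.uint8)
        return cv2.imdecode(img_array, cv2_img_flag)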

How can I get the final redirect URL when using urllib2.urlopen?

家住魔仙堡 submitted on 2019-11-30 01:46:31
I'm using the urllib2.urlopen method to open a URL and fetch the markup of a webpage. Some of these sites redirect me using 301/302 redirects. I would like to know the final URL that I've been redirected to. How can I get this? Mark: Call the .geturl() method of the file object returned. Per the urllib2 docs: geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed. Example: import urllib2 response = urllib2.urlopen('http://tinyurl.com/5b2su2') response.geturl() # 'http://stackoverflow.com/' The return value of urllib2.urlopen has a geturl()
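The same example laid out, with a simple check for whether a redirect actually happened (the tinyurl address is the one from the excerpt):

    import urllib2

    requested = 'http://tinyurl.com/5b2su2'
    response = urllib2.urlopen(requested)

    final_url = response.geturl()   # URL of the resource actually retrieved
    if final_url != requested:
        print 'redirected to', final_url
    else:
        print 'no redirect'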

urllib.quote() throws KeyError

和自甴很熟 submitted on 2019-11-30 01:17:22
To encode the URI, I used urllib.quote("schönefeld"), but when some non-ASCII characters exist in the string, it throws KeyError: u'\xe9' Code: return ''.join(map(quoter, s)) My input strings are köln, brønshøj, schönefeld, etc. When I tried just print statements on Windows (using Python 2.7, PyScripter IDE) I didn't hit this, but on Linux it raises the exception (I guess the platform doesn't matter). This is what I am trying: from commands import getstatusoutput queryParams = "schönefeld"; cmdString = "http://baseurl" + quote(queryParams) print getstatusoutput(cmdString) Exploring the reason for the issue: in urllib.quote(),
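The KeyError usually means a unicode string reached urllib.quote: its internal quoting map covers byte values only, so a non-ASCII code point is not found. A sketch of the likely fix (Python 2): encode to UTF-8 before quoting.

    # -*- coding: utf-8 -*-
    from urllib import quote

    query = u"schönefeld"

    # quote() looks up individual byte values; a unicode code point such as
    # u'\xf6' is not in that map, hence the KeyError. Encode first:
    encoded = quote(query.encode('utf-8'))
    print encoded                     # sch%C3%B6nefeld

    url = "http://baseurl" + encoded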

How to “keep-alive” with cookielib and httplib in python?

空扰寡人 submitted on 2019-11-29 23:54:36
Question: In Python, I'm using httplib because it keeps the HTTP connection alive ("keep-alive", as opposed to urllib(2)). Now I want to use cookielib with httplib, but they seem to hate each other!! (No way to interface them together.) Does anyone know of a solution to that problem? Answer 1: HTTP handler for urllib2 that supports keep-alive. Answer 2: You should consider using the Requests library instead at the earliest chance you have to refactor your code. In the meantime; HACK ALERT! :) I'd go the other suggested way, but I
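As Answer 2 suggests, the least painful route is Requests, whose Session object gives both behaviours at once: connection pooling (keep-alive) and a shared cookie jar. A minimal sketch; the URLs are placeholders:

    import requests

    session = requests.Session()                  # reuses TCP connections (keep-alive)
    first = session.get('http://example.com/login')
    second = session.get('http://example.com/data')

    # Cookies set by the first response are sent automatically with the second
    # request, and can be inspected via the session's cookie jar.
    print second.status_code, session.cookies.get_dict()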

Python/Django download Image from URL, modify, and save to ImageField

ⅰ亾dé卋堺 submitted on 2019-11-29 22:40:13
I've been looking for a way to download an image from a URL, perform some image manipulation (resize) on it, and then save it to a Django ImageField. Using the two great posts (linked below), I have been able to download and save an image to an ImageField. However, I've been having some trouble manipulating the file once I have it. Specifically, the model field's save() method requires a File() object as the second parameter, so my data eventually has to be a File() object. The blog posts linked below show how to use urllib2 to save an image URL into a File() object. This is great,
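A compact sketch of the whole pipeline under stated assumptions: PIL/Pillow is installed, and the model instance has an ImageField named image (the field and function names are placeholders). ContentFile is a Django File subclass, so it satisfies the save() signature mentioned above:

    import urllib2
    from cStringIO import StringIO
    from PIL import Image
    from django.core.files.base import ContentFile

    def download_and_attach(instance, url, size=(300, 300)):
        # 1. Download the raw image bytes.
        raw = urllib2.urlopen(url).read()

        # 2. Resize in memory with PIL (convert to RGB so JPEG saving works).
        img = Image.open(StringIO(raw)).convert('RGB')
        img.thumbnail(size, Image.ANTIALIAS)

        out = StringIO()
        img.save(out, format='JPEG')

        # 3. Wrap the result in a File-like object and hand it to the ImageField.
        instance.image.save('downloaded.jpg', ContentFile(out.getvalue()), save=True)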

Common Python Interview Questions, Part 4: Crawlers and Databases

被刻印的时光 ゝ submitted on 2019-11-29 21:24:46
Contents
1. What is the difference between scrapy and scrapy-redis? Why choose the redis database?
2. Which crawler frameworks or modules have you used? Discuss their differences, or their strengths and weaknesses.
3. What are the commonly used MySQL engines? What are the differences between them?
4. Describe how the scrapy framework works.
5. What is a join query, and what kinds are there?
6. Is it better to write a crawler with multiple processes or multiple threads? Why?
7. Database optimization?
8. Common anti-crawler measures and how to counter them?
9. What problem does a distributed crawler mainly solve?
10. How do you handle CAPTCHAs while crawling?

1. What is the difference between scrapy and scrapy-redis? Why choose the redis database?
1) scrapy is a Python crawler framework with very high crawling efficiency and a high degree of customizability, but it does not support distributed crawling. scrapy-redis is a set of components based on the redis database that runs on top of the scrapy framework; it lets scrapy support a distributed strategy, with the Slaver nodes sharing the item queue, request queue and request-fingerprint set held in the Master node's redis database.
2) Why choose the redis database: because redis supports master-slave synchronization and keeps all its data cached in memory, a redis-based distributed crawler can read requests and data at high frequency very efficiently (a brief settings sketch follows this excerpt).
2. Which crawler frameworks or modules have you used? Discuss their differences, or their strengths and weaknesses.
Built into Python: urllib, urllib2
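For concreteness, enabling the scrapy-redis components described in the answer to question 1 is mostly a matter of project settings. A sketch based on the usual scrapy-redis defaults; the redis address is a placeholder:

    # settings.py of a scrapy project

    # Store the request queue in redis so several Slaver nodes can share it.
    SCHEDULER = "scrapy_redis.scheduler.Scheduler"

    # Deduplicate requests through a shared fingerprint set in redis.
    DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

    # Keep the queues when a spider finishes, so crawls can be resumed.
    SCHEDULER_PERSIST = True

    # The Master node's redis instance.
    REDIS_URL = 'redis://127.0.0.1:6379'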

Need to install urllib2 for Python 3.5.1

笑着哭i submitted on 2019-11-29 21:16:47
I'm running Python 3.5.1 on a Mac. I want to use urllib2. I tried installing it but was told that it has been split into urllib.request and urllib.error for Python 3. My command (run from the framework bin directory for now because it's not in my path):

    sudo ./pip3 install urllib.request

returns:

    Could not find a version that satisfies the requirement urllib.request (from versions: )
    No matching distribution found for urllib.request

I got the same error before when I tried to install urllib2 in one fell swoop. WARNING: security researchers have found several poisoned packages on PyPI,
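Nothing needs to be installed: urllib.request and urllib.error ship with Python 3.5, so the fix is to import from them rather than pip-installing anything. A short sketch (the URL is a placeholder):

    from urllib.request import urlopen, Request
    from urllib.error import HTTPError, URLError

    try:
        req = Request('http://example.com/', headers={'User-Agent': 'Mozilla/5.0'})
        with urlopen(req) as response:
            html = response.read()
            print(len(html), 'bytes')
    except HTTPError as err:
        print('HTTP error:', err.code)
    except URLError as err:
        print('URL error:', err.reason)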

AttributeError(“'str' object has no attribute 'read'”)

怎甘沉沦 submitted on 2019-11-29 19:07:17
In Python I'm getting an error:

    Exception: (<type 'exceptions.AttributeError'>, AttributeError("'str' object has no attribute 'read'",), <traceback object at 0x1543ab8>)

Given the Python code:

    def getEntries (self, sub):
        url = 'http://www.reddit.com/'
        if (sub != ''):
            url += 'r/' + sub
        request = urllib2.Request (url + '.json', None, {'User-Agent' : 'Reddit desktop client by /user/RobinJ1995/'})
        response = urllib2.urlopen (request)
        jsonofabitch = response.read ()
        return json.load (jsonofabitch)['data']['children']

What does this error mean and what did I do to cause it? kosii: The problem is that for
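The last two lines are the culprit: json.load() expects a file-like object it can call .read() on, but response.read() has already produced a str. A sketch of the fix as a standalone function (hypothetical rewrite of the method above):

    import json
    import urllib2

    def get_entries(sub=''):
        url = 'http://www.reddit.com/' + ('r/' + sub if sub else '')
        request = urllib2.Request(url + '.json', None,
                                  {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
        response = urllib2.urlopen(request)
        # json.load() reads from the file-like response itself; json.loads()
        # would be the right call if we had already done response.read().
        return json.load(response)['data']['children']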

urllib2.urlopen cannot get image, but browser can

半腔热情 submitted on 2019-11-29 17:28:07
There is a link to a GIF image, but urllib2 can't download it.

    import urllib.request as urllib2
    uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'
    try:
        req = urllib2.Request(uri, headers={ 'User-Agent': 'Mozilla/5.0' })
        file = urllib2.urlopen(req)
    except urllib2.HTTPError as err:
        print('HTTP error!!!')
        file = err
        print(err.code)
    except urllib2.URLError as err:
        print('URL error!!!')
        print(err.reason)
        return
    data = file.read(1024)
    print(data)

After the script finishes, data remains empty. Why does it happen? There is no HTTPError, I can see in
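The excerpt ends mid-sentence, so the accepted explanation is not visible here. One thing worth ruling out (an assumption, not a confirmed diagnosis) is that this ad-tracking endpoint only serves a body when the request carries browser-like Accept/Referer headers and cookies. A sketch that sends those and reports what actually came back:

    import urllib.request
    import http.cookiejar

    uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'

    # Cookie-aware opener plus fuller browser-style headers.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    req = urllib.request.Request(uri, headers={
        'User-Agent': 'Mozilla/5.0',
        'Accept': 'image/gif,image/*;q=0.8,*/*;q=0.5',
        'Referer': 'http://ums.adtechjp.com/',
    })

    with opener.open(req) as resp:
        body = resp.read()
        # Inspect what the server actually sent before assuming it is the image.
        print(resp.status, resp.getheader('Content-Type'), len(body), 'bytes')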