urllib2 | 易学教程

HTTP: Proxy Authentication Error for nltk.download()

Question: I am using nltk.download() to download the packages I need, but I am getting the following error.

    root@nishant-Inspiron-1545:/home/nishant/Dropbox/DDP/data# python
    Python 2.7.3 (default, Apr 10 2013, 05:09:49)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import nltk
    >>> import nltk.downloader
    >>> nltk.download()
    NLTK Downloader
    ---------------------------------------------------------------------------
    d) Download l) List c) Config h) Help q
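
The excerpt stops before the error text, so the following is only a sketch of one common way to deal with an authenticating proxy. It is written for Python 3's urllib.request (the successor to urllib2). The function name, proxy address and credentials are placeholders, and whether the NLTK downloader picks up an installed opener is an assumption here, not something the question confirms.

```python
import urllib.request


def build_proxy_opener(proxy_url, user, password):
    """Build an opener that routes through a proxy and answers its 407 challenge."""
    # proxy_url, user and password are hypothetical values supplied by the caller.
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm means "use these credentials for any realm".
    password_mgr.add_password(None, proxy_url, user, password)
    auth_handler = urllib.request.ProxyBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(proxy_handler, auth_handler)
```

Calling urllib.request.install_opener() with the result makes later urllib.request.urlopen() calls in the same process use the proxy.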

urllib2: submitting a form and then redirecting

Question: My goal is to come up with a portable urllib2 solution that POSTs a form and then redirects the user to whatever comes back. The POSTing part is simple:

    request = urllib2.Request('https://some.site/page', data=urllib.urlencode({'key':'value'}))
    response = urllib2.urlopen(request)

Providing data sets the request type to POST. Now, I suspect that all the data I should care about comes from response.info() and response.geturl(). I should do a self.redirect(response.geturl()) inside a get(self) method
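
A minimal sketch of the same POST in Python 3's urllib.request, assuming an ordinary URL-encoded form. The helper names are hypothetical. Building the request is split from sending it so that the first part can be inspected without any network access.

```python
import urllib.parse
import urllib.request


def build_form_request(url, fields):
    """Build a POST request carrying URL-encoded form fields (nothing is sent)."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data is what switches the request method from GET to POST.
    return urllib.request.Request(url, data=body)


def post_form(url, fields):
    """Send the form and return (final_url, body)."""
    with urllib.request.urlopen(build_form_request(url, fields)) as response:
        # The default opener follows redirects, so geturl() is the final URL.
        return response.geturl(), response.read()
```

The first element returned by post_form() is the value a handler would pass on to its own redirect call.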

corrupt zip download urllib2

Question: I am trying to download zip files from measuredhs.com using the following code:

    url = 'https://dhsprogram.com/customcf/legacy/data/download_dataset.cfm?Filename=BFBR62DT.ZIP&Tp=1&Ctry_Code=BF'
    request = urllib2.urlopen(url)
    output = open("install.zip", "w")
    output.write(request.read())
    output.close()

However, the downloaded file does not open. I get a message saying the compressed zip folder is invalid. To access the download link, one needs to log in, which I have done. If I click on the
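
Two things may produce an invalid archive with code like the above. Opening the file with "w" is text mode, which on Windows rewrites newline bytes inside the binary data. The response may also not be a zip at all, for example a login page, since the script does not share the browser's session. The sketch below (helper name hypothetical) writes in binary mode and then checks the result with the standard zipfile module.

```python
import zipfile


def save_download(payload, path):
    """Write raw bytes in binary mode and report whether the file is a zip archive."""
    with open(path, "wb") as output:  # "wb" avoids newline translation on Windows
        output.write(payload)
    return zipfile.is_zipfile(path)
```

A False result is a cue to open the saved file in a text editor and see what the server actually sent.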

Python Mechanize to check if a server is available

Question: I'm trying to write a script that reads a file containing some URLs and then opens a browser instance using the mechanize module. I'm wondering how to handle the case where a URL does not exist or the server is unreachable. For example:

    import mechanize
    br = mechanize.Browser()
    b = br.open('http://192.168.1.30/index.php')

What I want to know is how to get information from mechanize if 192.168.1.30 is unreachable or if HTTP returns a 404 error.

Answer 1:

    from mechanize import Browser
    browser =
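
The answer is cut off, so here is a sketch of the general pattern using only the standard library's urllib.request instead of mechanize. It separates "the server answered with an error status" from "no answer at all". The function name and the returned labels are hypothetical.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5):
    """Return ("ok", status), ("http_error", status) or ("unreachable", reason)."""
    # An empty ProxyHandler keeps the check direct, ignoring any proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return "ok", response.getcode()
    except urllib.error.HTTPError as error:
        # The server was reached but replied with a 4xx or 5xx status.
        return "http_error", error.code
    except OSError as error:
        # URLError (a subclass of OSError), refused connections and timeouts.
        return "unreachable", str(getattr(error, "reason", error))
```

HTTPError has to be caught first because it is a subclass of URLError.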

Why doesn't this request work?

Question: I want to make a simple, stupid Twitter app using the Twitter API. If I request this page from my browser, it works:

    http://search.twitter.com/search.atom?q=hello&rpp=10&page=1

but if I request it from Python using urllib or urllib2, most of the time it doesn't work:

    response = urllib2.urlopen("http://search.twitter.com/search.atom?q=hello&rpp=10&page=1")

and I get this error:

    Traceback (most recent call last):
      File "twitter.py", line 24, in <module>
        response = urllib2.urlopen("http:/

Opening a website frame or image in python

Question: I am fairly fluent in Python and have used urllib2 and cookies a lot for website automation. I just stumbled upon the "webbrowser" module, which can open a URL in your default browser. I'm wondering if it's possible to select just one object from that URL and open it. Specifically, I want to open a "captcha" so that the user can input it and continue doing something else. This is the line containing the captcha in the HTML, I think:

    script type="text/javascript" src="http://api.recaptcha

Urllib2- fetch and show any language page, encoding problem

Question: I'm using Python on Google App Engine to simply fetch HTML pages and show them. My aim is to be able to fetch any page in any language. Now I have a problem with encoding. A simple

    result = urllib2.urlopen(url).read()

leaves artifacts in place of special letters, and

    urllib2.urlopen(url).read().decode('utf8')

throws an error:

    'utf8' codec can't decode bytes in position 3544-3546: invalid data

So how do I solve this? Is there a library that would check what encoding a page uses and convert it so that it is readable?
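
One way to approach this, sketched under assumptions: trust the charset the server declares in its Content-Type header, then try a short list of fallbacks, and only then decode lossily. The function name and the fallback list are hypothetical choices. Third-party detectors such as chardet go further by guessing from the bytes themselves.

```python
from email.message import Message


def decode_page(raw, content_type=None, fallbacks=("utf-8", "cp1252")):
    """Decode bytes with the declared charset, then fallbacks, then lossy UTF-8."""
    candidates = []
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        declared = header.get_content_charset()  # None when no charset is given
        if declared:
            candidates.append(declared)
    candidates.extend(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):  # bad bytes or unknown codec name
            continue
    # Last resort: undecodable bytes become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")
```

With a urllib response, the header value would come from response.headers.get("Content-Type").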

Python urllib2 or requests post method [duplicate]

Question: (This question already has answers here: Submitting to a web form using python, 3 answers. Closed 3 years ago.) I understand in general how to make a POST request using urllib2 (encoding the data, etc.), but the problem is that all the tutorials online use completely useless made-up example URLs to show how to do it (someserver.com, coolsite.org, etc.), so I can't see the specific HTML that corresponds to the example code they use. Even python.org's own tutorial is totally useless in this

python urllib2 can open localhost but not 127.0.0.1

Question: I am using the Python urllib2 library and am seeing a strange and nasty problem on Windows 7. My code:

    import urllib2 as url_request
    opener = url_request.build_opener(url_request.ProxyHandler({'http': 'http://login:password@server:8080'}))
    request = url_request.Request("http://localhost")
    response = opener.open(request)
    print response.read()

It works perfectly well, but when I change localhost to 127.0.0.1, this error happens:

    HTTPError: HTTP Error 502: Proxy Error ( Forefront TMG denied the
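
The error message is truncated, so the cause cannot be confirmed from the excerpt. If the aim is simply to reach the local machine, one possible workaround is to skip the proxy for loopback addresses. The sketch below uses Python 3's urllib.request. The function name and the set of local host names are assumptions.

```python
import urllib.parse
import urllib.request

# Host names treated as "this machine"; extend as needed (an assumption).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def opener_for(url, proxy_url):
    """Return an opener that bypasses the proxy for loopback hosts."""
    host = urllib.parse.urlsplit(url).hostname
    # An empty mapping tells ProxyHandler to use no proxy at all.
    proxies = {} if host in LOCAL_HOSTS else {"http": proxy_url}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

Passing an explicit mapping also stops ProxyHandler from reading proxy settings out of the environment.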

Python: Find a Sentence between some website-tags using regex

Question: I want to find a sentence between the ...class="question-hyperlink"> tags. With this code:

    import urllib2
    import re
    response = urllib2.urlopen('https://stackoverflow.com/questions/tagged/python')
    html = response.read(20000)
    a = re.search('question-hyperlink', html)
    print html[a.end()+3:a.end()+100]

I get:

    DF5 for Python: high level vs low level interfaces. h5py</a></h3> <div class="excerpt">

How can I stop at the next < ? And how do I find the next sentence? I want to do it with regex. EDIT
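
A minimal regex sketch for the two sub-questions, assuming the text of interest sits directly after the tag that carries class="question-hyperlink". A capture group of "anything but <" stops at the next tag, and findall returns every match, so the "next sentence" is the next list element. The names are hypothetical.

```python
import re

# Capture everything after the tag's closing ">" up to the next "<".
LINK_TEXT = re.compile(r'class="question-hyperlink"[^>]*>([^<]*)<')


def question_titles(html):
    """Return the text that follows each question-hyperlink tag."""
    return [text.strip() for text in LINK_TEXT.findall(html)]
```

In Python 3 the bytes returned by response.read() need decoding to str before they are passed to this function.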