urllib2

SSLv3 alert handshake failure with urllib2

旧街凉风 submitted on 2019-11-27 07:02:03
Question: I'm having trouble connecting over HTTPS using urllib2 under Python 2.7.10. Any thoughts on what I'm missing?

Python 2.7.10 (default, Jun 18 2015, 10:53:24) [GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl, urllib2
>>> ssl.HAS_SNI
True
>>> ssl.OPENSSL_VERSION
'OpenSSL 0.9.8o 01 Jun 2010'
>>> opener = urllib2.build_opener()
>>> opener.open('https://twitrss.me/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File …
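The session above already shows the likely culprit: OpenSSL 0.9.8 predates TLS 1.1 and 1.2 (which arrived with OpenSSL 1.0.1), so a server that has disabled SSLv3 leaves this client no protocol to negotiate and the handshake fails. A minimal diagnostic sketch along those lines:

import ssl

# OpenSSL 0.9.8 speaks at most TLS 1.0; servers that require TLS 1.2 or
# have switched off SSLv3 abort the handshake before urllib2 sees a reply.
print(ssl.OPENSSL_VERSION)
if ssl.OPENSSL_VERSION_INFO < (1, 0, 1):
    print("OpenSSL too old for TLS 1.1/1.2; upgrade OpenSSL and rebuild Python")

The practical fix is usually an upgrade of the underlying OpenSSL (and a Python rebuilt against it), not a change to the urllib2 call.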

urllib2 file name

两盒软妹~` submitted on 2019-11-27 06:58:39
If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

is there an easy way to get the file name other than parsing the original URL?

EDIT: changed openfile to urlopen... not sure how that happened.
EDIT2: I ended up using:

filename = url.split('/')[-1].split('#')[0].split('?')[0]

Unless I'm mistaken, this should strip out all potential queries as well.

Did you mean urllib2.urlopen? You could potentially lift the intended filename if the server was sending a Content-Disposition header by checking remotefile.info()['Content-Disposition'], but …
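A minimal sketch combining both approaches: prefer a server-supplied filename from the Content-Disposition header when one exists, otherwise fall back to the URL-splitting trick from the question. The 'filename=' parsing here is deliberately naive, not a full header parser:

import urllib2

url = 'http://example.com/somefile.zip'
remotefile = urllib2.urlopen(url)
# getheader() returns None when the header is absent, avoiding a KeyError.
disposition = remotefile.info().getheader('Content-Disposition')
if disposition and 'filename=' in disposition:
    filename = disposition.split('filename=')[1].strip('" ')
else:
    # Strip the fragment and query string, then take the last path segment.
    filename = url.split('/')[-1].split('#')[0].split('?')[0]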

How do I send a custom header with urllib2 in an HTTP request?

孤街浪徒 submitted on 2019-11-27 06:52:03
I want to send a custom "Accept" header in my request when using urllib2.urlopen(..). How do I do that?

Not quite. Creating a Request object does not actually send the request, and Request objects have no Read() method. (Also: read() is lowercase.) All you need to do is pass the Request as the first argument to urlopen() and that will give you your response.

import urllib2
request = urllib2.Request("http://www.google.com", headers={"Accept": "text/html"})
contents = urllib2.urlopen(request).read()

I normally use:

import urllib2
request_headers = {
    "Accept-Language": "en-US,en;q=0.5",
    "User…
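An alternative sketch: install default headers on an opener so every request made through it carries them (the header values here are illustrative, not from the original answer):

import urllib2

opener = urllib2.build_opener()
# addheaders is a list of (name, value) tuples applied to every request.
opener.addheaders = [('Accept', 'text/html'),
                     ('Accept-Language', 'en-US,en;q=0.5')]
contents = opener.open('http://www.google.com').read()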

Python3 error: initial_value must be str or None

孤街浪徒 submitted on 2019-11-27 06:44:13
While porting code from Python 2 to 3, I get this error when reading from a URL: TypeError: initial_value must be str or None, not bytes.

import urllib
import json
import gzip
from urllib.parse import urlencode
from urllib.request import Request

service_url = 'https://babelfy.io/v1/disambiguate'
text = 'BabelNet is both a multilingual encyclopedic dictionary and a semantic network'
lang = 'EN'
Key = 'KEY'
params = {
    'text': text,
    'key': Key,
    'lang': 'EN'
}
url = service_url + '?' + urllib.urlencode(params)
request = Request(url)
request.add_header('Accept-encoding', 'gzip')
response = urllib…
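The post is cut off before the offending line, but this TypeError classically comes from wrapping the gzipped response body in io.StringIO: in Python 3 the body is bytes, so it must go through BytesIO, and the decompressed bytes must be decoded before json can parse them. Note also that urllib.urlencode is the Python 2 spelling; in Python 3 it is the already-imported urlencode from urllib.parse. A sketch of the fix under those assumptions:

import gzip
import json
from io import BytesIO
from urllib.parse import urlencode
from urllib.request import Request, urlopen

service_url = 'https://babelfy.io/v1/disambiguate'
params = {
    'text': 'BabelNet is both a multilingual encyclopedic dictionary and a semantic network',
    'lang': 'EN',
    'key': 'KEY',
}
request = Request(service_url + '?' + urlencode(params))
request.add_header('Accept-encoding', 'gzip')
response = urlopen(request)
buf = BytesIO(response.read())  # bytes, so BytesIO rather than StringIO
data = json.loads(gzip.GzipFile(fileobj=buf).read().decode('utf-8'))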

How to fix ImportError: No module named packages.urllib3?

旧时模样 submitted on 2019-11-27 06:39:23
Question: I'm running Python 2.7.6 on an Ubuntu machine. When I run twill-sh (Twill is a browser used for testing websites) in my Terminal, I'm getting the following:

Traceback (most recent call last):
  File "dep.py", line 2, in <module>
    import twill.commands
  File "/usr/local/lib/python2.7/dist-packages/twill/__init__.py", line 52, in <module>
    from shell import TwillCommandLoop
  File "/usr/local/lib/python2.7/dist-packages/twill/shell.py", line 9, in <module>
    from twill import commands, parse, __version_…
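The traceback is truncated before the ImportError itself, but the title points at requests' vendored requests.packages.urllib3. A hedged diagnostic sketch (the import path is an assumption about how twill's stack reaches urllib3):

# Check whether the vendored urllib3 is importable at all; if this fails,
# reinstalling requests (pip install --upgrade requests) is a common remedy.
try:
    from requests.packages import urllib3
    print(urllib3.__version__)
except ImportError as exc:
    print("vendored urllib3 missing: %s" % exc)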

How to download any(!) webpage with correct charset in Python?

ε祈祈猫儿з submitted on 2019-11-27 06:19:29
Problem: When screen-scraping a webpage using Python, one has to know the character encoding of the page. If you get the character encoding wrong, your output will be messed up. People usually use some rudimentary technique to detect the encoding: they either use the charset from the header, or the charset defined in the meta tag, or they use an encoding detector (which does not care about meta tags or headers). By using only one of these techniques, sometimes you will not get the same result as you would in a browser. Browsers do it this way: meta tags always take precedence (or the XML definition) …
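A minimal sketch of the layered strategy the post describes, assuming Python 2 and following the stated browser order (meta tag first, then HTTP header); a production version would add an encoding detector such as chardet as the final fallback:

import re
import urllib2

def fetch_decoded(url, default='utf-8'):
    response = urllib2.urlopen(url)
    raw = response.read()
    # Per the post, browsers let the document's own charset declaration
    # take precedence over the HTTP header, so look inside the page first.
    match = re.search(r'charset=["\']?([\w\-]+)', raw[:2048])
    charset = match.group(1) if match else response.headers.getparam('charset')
    return raw.decode(charset or default, 'replace')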

how to follow meta refreshes in Python

放肆的年华 submitted on 2019-11-27 05:18:18
Python's urllib2 follows 3xx redirects to get the final content. Is there a way to make urllib2 (or some other library such as httplib2) also follow meta refreshes? Or do I need to parse the HTML manually for the refresh meta tags?

asmaier: Here is a solution using BeautifulSoup and httplib2 (and certificate-based authentication):

import BeautifulSoup
import httplib2

def meta_redirect(content):
    # Look for <meta http-equiv="Refresh" content="N;url=..."> in the page.
    soup = BeautifulSoup.BeautifulSoup(content)
    result = soup.find("meta", attrs={"http-equiv": "Refresh"})
    if result:
        wait, text = result["content"].split(";")
        if text.strip().lower().startswith("url="):
            url = text.strip()[4:]
            return url
    return None
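The answer is truncated at this point; a hedged sketch of how the helper would plausibly be driven (the loop and the URL are assumptions, not the original answer's code):

import httplib2

# Fetch a page, then keep following whatever meta-refresh target
# meta_redirect() (defined above) extracts, until none remains.
http = httplib2.Http()
response, content = http.request("http://example.com/")  # hypothetical URL
target = meta_redirect(content)
while target:
    response, content = http.request(target)
    target = meta_redirect(content)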

Opening websites using urllib2 from behind a corporate firewall - 11004 getaddrinfo failed

白昼怎懂夜的黑 submitted on 2019-11-27 05:12:14
I am trying to access a website from behind a corporate firewall using the code below:

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, username, password)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://python.org')

I am getting the error URLError: <urlopen error [Errno 11004] getaddrinfo failed>. I have tried different handlers (tried ProxyHandler in a slightly different way too), but it doesn't seem to work. Any clues as to what could be the reason for …
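A sketch of the usual proxy setup (the proxy address and credentials are assumptions): behind a corporate firewall the request has to travel through the proxy, so a ProxyHandler plus ProxyBasicAuthHandler is what's needed; HTTPBasicAuthHandler authenticates against the target site instead, and without a working proxy route the DNS lookup itself fails with 11004:

import urllib2

proxy_url = 'http://proxy.example.com:8080'  # hypothetical proxy
proxy_handler = urllib2.ProxyHandler({'http': proxy_url, 'https': proxy_url})
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, proxy_url, 'username', 'password')
auth_handler = urllib2.ProxyBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(proxy_handler, auth_handler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://python.org')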

urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>

淺唱寂寞╮ submitted on 2019-11-27 04:38:48
If I run:

urllib2.urlopen('http://google.com')

even if I use another URL, I get the same error. I'm pretty sure there is no firewall running on my computer or router, and the internet (from a browser) works fine.

The problem, in my case, was that some install at some point defined an environment variable http_proxy on my machine when I had no proxy. Removing the http_proxy environment variable fixed the problem.

The site's DNS record is such that Python fails the DNS lookup in a peculiar way: it finds the entry, but zero associated IP addresses. (Verify with nslookup.) Hence, 11004, WSANO_DATA …
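A quick check based on the first answer above: a stale http_proxy variable makes urllib2 route every request through a proxy that no longer exists. A sketch that inspects and then bypasses it:

import os
import urllib2

print(os.environ.get('http_proxy'))  # non-None here means urllib2 will use it

# Clear it for this process, or sidestep environment proxies entirely by
# building an opener with an empty ProxyHandler.
os.environ.pop('http_proxy', None)
opener = urllib2.build_opener(urllib2.ProxyHandler({}))
print(opener.open('http://google.com').getcode())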

Requests, bind to an IP

帅比萌擦擦* submitted on 2019-11-27 04:17:12
I have a script that makes some requests with urllib2. I use the trick suggested elsewhere on Stack Overflow to bind another IP to the application, where my computer has two IP addresses (IP A and IP B). I would like to switch to using the requests library. Does anyone know how I can achieve the same functionality with that library?

Looking into the requests module, it looks like it uses httplib to send the HTTP requests. httplib uses socket.create_connection() to connect to the www host. Knowing that, and following the monkey patching method in the link you provided:

import socket
real…
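The answer is cut off at the patch itself; a hedged reconstruction of the monkey-patching technique it names (the source IP and the names below are assumptions), matching the httplib-based requests the answer describes:

import socket

# Keep a reference to the real implementation, then substitute a wrapper
# that forces the local source address, so every connection opened by the
# requests/httplib stack is bound to the chosen IP.
real_create_connection = socket.create_connection

def bound_create_connection(address, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
                            source_address=None):
    return real_create_connection(address, timeout,
                                  source_address=('10.0.0.1', 0))  # hypothetical "IP A"

socket.create_connection = bound_create_connection

import requests
response = requests.get('http://example.com/')  # connections now bound to 10.0.0.1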