urllib2

SSLv3 alert handshake failure with urllib2

旧街凉风 submitted on 2019-11-27 07:02:03
Question: I'm having trouble connecting over HTTPS using urllib2 under Python 2.7.10. Any thoughts on what I'm missing?

Python 2.7.10 (default, Jun 18 2015, 10:53:24) [GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl, urllib2
>>> ssl.HAS_SNI
True
>>> ssl.OPENSSL_VERSION
'OpenSSL 0.9.8o 01 Jun 2010'
>>> opener = urllib2.build_opener()
>>> opener.open('https://twitrss.me/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File …
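The session above already shows the likely culprit: OpenSSL 0.9.8 predates TLS 1.1 and 1.2 (which arrived with OpenSSL 1.0.1), so a server that has disabled SSLv3 leaves this client no protocol to negotiate and the handshake fails. A minimal diagnostic sketch along those lines:

import ssl

# OpenSSL 0.9.8 speaks at most TLS 1.0; servers that require TLS 1.2 or
# have switched off SSLv3 abort the handshake before urllib2 sees a reply.
print(ssl.OPENSSL_VERSION)
if ssl.OPENSSL_VERSION_INFO < (1, 0, 1):
    print("OpenSSL too old for TLS 1.1/1.2; upgrade OpenSSL and rebuild Python")

The practical fix is usually an upgrade of the underlying OpenSSL (and a Python rebuilt against it), not a change to the urllib2 call.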

urllib2 file name

两盒软妹~` submitted on 2019-11-27 06:58:39
If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

is there an easy way to get the file name other than parsing the original URL?

EDIT: changed openfile to urlopen... not sure how that happened.
EDIT2: I ended up using:

filename = url.split('/')[-1].split('#')[0].split('?')[0]

Unless I'm mistaken, this should strip out all potential queries as well.

Did you mean urllib2.urlopen? You could potentially lift the intended filename if the server was sending a Content-Disposition header by checking remotefile.info()['Content-Disposition'], but …
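A minimal sketch combining both approaches: prefer a server-supplied filename from the Content-Disposition header when one exists, otherwise fall back to the URL-splitting trick from the question. The 'filename=' parsing here is deliberately naive, not a full header parser:

import urllib2

url = 'http://example.com/somefile.zip'
remotefile = urllib2.urlopen(url)
# getheader() returns None when the header is absent, avoiding a KeyError.
disposition = remotefile.info().getheader('Content-Disposition')
if disposition and 'filename=' in disposition:
    filename = disposition.split('filename=')[1].strip('" ')
else:
    # Strip the fragment and query string, then take the last path segment.
    filename = url.split('/')[-1].split('#')[0].split('?')[0]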

How do I send a custom header with urllib2 in an HTTP request?

孤街浪徒 submitted on 2019-11-27 06:52:03
I want to send a custom "Accept" header in my request when using urllib2.urlopen(..). How do I do that?

Not quite. Creating a Request object does not actually send the request, and Request objects have no Read() method. (Also: read() is lowercase.) All you need to do is pass the Request as the first argument to urlopen() and that will give you your response.

import urllib2
request = urllib2.Request("http://www.google.com", headers={"Accept": "text/html"})
contents = urllib2.urlopen(request).read()

I normally use:

import urllib2
request_headers = {
    "Accept-Language": "en-US,en;q=0.5",
    "User…
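An alternative sketch: install default headers on an opener so every request made through it carries them (the header values here are illustrative, not from the original answer):

import urllib2

opener = urllib2.build_opener()
# addheaders is a list of (name, value) tuples applied to every request.
opener.addheaders = [('Accept', 'text/html'),
                     ('Accept-Language', 'en-US,en;q=0.5')]
contents = opener.open('http://www.google.com').read()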

Python3 error: initial_value must be str or None

孤街浪徒 submitted on 2019-11-27 06:44:13
While porting code from Python 2 to 3, I get this error when reading from a URL: TypeError: initial_value must be str or None, not bytes.

import urllib
import json
import gzip
from urllib.parse import urlencode
from urllib.request import Request

service_url = 'https://babelfy.io/v1/disambiguate'
text = 'BabelNet is both a multilingual encyclopedic dictionary and a semantic network'
lang = 'EN'
Key = 'KEY'
params = {
    'text': text,
    'key': Key,
    'lang': 'EN'
}
url = service_url + '?' + urllib.urlencode(params)
request = Request(url)
request.add_header('Accept-encoding', 'gzip')
response = urllib…
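The post is cut off before the offending line, but this TypeError classically comes from wrapping the gzipped response body in io.StringIO: in Python 3 the body is bytes, so it must go through BytesIO, and the decompressed bytes must be decoded before json can parse them. Note also that urllib.urlencode is the Python 2 spelling; in Python 3 it is the already-imported urlencode from urllib.parse. A sketch of the fix under those assumptions:

import gzip
import json
from io import BytesIO
from urllib.parse import urlencode
from urllib.request import Request, urlopen

service_url = 'https://babelfy.io/v1/disambiguate'
params = {
    'text': 'BabelNet is both a multilingual encyclopedic dictionary and a semantic network',
    'lang': 'EN',
    'key': 'KEY',
}
request = Request(service_url + '?' + urlencode(params))
request.add_header('Accept-encoding', 'gzip')
response = urlopen(request)
buf = BytesIO(response.read())  # bytes, so BytesIO rather than StringIO
data = json.loads(gzip.GzipFile(fileobj=buf).read().decode('utf-8'))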

How to fix ImportError: No module named packages.urllib3?

旧时模样 submitted on 2019-11-27 06:39:23
Question: I'm running Python 2.7.6 on an Ubuntu machine. When I run twill-sh (Twill is a browser used for testing websites) in my Terminal, I'm getting the following:

Traceback (most recent call last):
  File "dep.py", line 2, in <module>
    import twill.commands
  File "/usr/local/lib/python2.7/dist-packages/twill/__init__.py", line 52, in <module>
    from shell import TwillCommandLoop
  File "/usr/local/lib/python2.7/dist-packages/twill/shell.py", line 9, in <module>
    from twill import commands, parse, __version_…
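The traceback is truncated before the ImportError itself, but the title points at requests' vendored requests.packages.urllib3. A hedged diagnostic sketch (the import path is an assumption about how twill's stack reaches urllib3):

# Check whether the vendored urllib3 is importable at all; if this fails,
# reinstalling requests (pip install --upgrade requests) is a common remedy.
try:
    from requests.packages import urllib3
    print(urllib3.__version__)
except ImportError as exc:
    print("vendored urllib3 missing: %s" % exc)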

How to download any(!) webpage with correct charset in Python?

ε祈祈猫儿з submitted on 2019-11-27 06:19:29
Problem: When screen-scraping a webpage using Python, one has to know the character encoding of the page. If you get the character encoding wrong, your output will be messed up. People usually use some rudimentary technique to detect the encoding: they either use the charset from the header, or the charset defined in the meta tag, or they use an encoding detector (which does not care about meta tags or headers). By using only one of these techniques, sometimes you will not get the same result as you would in a browser. Browsers do it this way: meta tags always take precedence (or the XML definition) …
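A minimal sketch of the layered strategy the post describes, assuming Python 2 and following the stated browser order (meta tag first, then HTTP header); a production version would add an encoding detector such as chardet as the final fallback:

import re
import urllib2

def fetch_decoded(url, default='utf-8'):
    response = urllib2.urlopen(url)
    raw = response.read()
    # Per the post, browsers let the document's own charset declaration
    # take precedence over the HTTP header, so look inside the page first.
    match = re.search(r'charset=["\']?([\w\-]+)', raw[:2048])
    charset = match.group(1) if match else response.headers.getparam('charset')
    return raw.decode(charset or default, 'replace')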

how to follow meta refreshes in Python

放肆的年华 submitted on 2019-11-27 05:18:18
Python's urllib2 follows 3xx redirects to get the final content. Is there a way to make urllib2 (or some other library such as httplib2) also follow meta refreshes? Or do I need to parse the HTML manually for the refresh meta tags?

asmaier: Here is a solution using BeautifulSoup and httplib2 (and certificate-based authentication):

import BeautifulSoup
import httplib2

def meta_redirect(content):
    # Look for <meta http-equiv="Refresh" content="N;url=..."> in the page.
    soup = BeautifulSoup.BeautifulSoup(content)
    result = soup.find("meta", attrs={"http-equiv": "Refresh"})
    if result:
        wait, text = result["content"].split(";")
        if text.strip().lower().startswith("url="):
            url = text.strip()[4:]
            return url
    return None
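The answer is truncated at this point; a hedged sketch of how the helper would plausibly be driven (the loop and the URL are assumptions, not the original answer's code):

import httplib2

# Fetch a page, then keep following whatever meta-refresh target
# meta_redirect() (defined above) extracts, until none remains.
http = httplib2.Http()
response, content = http.request("http://example.com/")  # hypothetical URL
target = meta_redirect(content)
while target:
    response, content = http.request(target)
    target = meta_redirect(content)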

Opening websites using urllib2 from behind a corporate firewall - 11004 getaddrinfo failed

白昼怎懂夜的黑 submitted on 2019-11-27 05:12:14
I am trying to access a website from behind a corporate firewall using the code below:

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, username, password)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://python.org')

I am getting the error URLError: <urlopen error [Errno 11004] getaddrinfo failed>. I have tried different handlers (tried ProxyHandler in a slightly different way too), but it doesn't seem to work. Any clues as to what could be the reason for …
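A sketch of the usual proxy setup (the proxy address and credentials are assumptions): behind a corporate firewall the request has to travel through the proxy, so a ProxyHandler plus ProxyBasicAuthHandler is what's needed; HTTPBasicAuthHandler authenticates against the target site instead, and without a working proxy route the DNS lookup itself fails with 11004:

import urllib2

proxy_url = 'http://proxy.example.com:8080'  # hypothetical proxy
proxy_handler = urllib2.ProxyHandler({'http': proxy_url, 'https': proxy_url})
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, proxy_url, 'username', 'password')
auth_handler = urllib2.ProxyBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(proxy_handler, auth_handler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://python.org')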

urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>

淺唱寂寞╮ submitted on 2019-11-27 04:38:48
If I run:

urllib2.urlopen('http://google.com')

even if I use another URL, I get the same error. I'm pretty sure there is no firewall running on my computer or router, and the internet (from a browser) works fine.

The problem, in my case, was that some install at some point defined an environment variable http_proxy on my machine when I had no proxy. Removing the http_proxy environment variable fixed the problem.

The site's DNS record is such that Python fails the DNS lookup in a peculiar way: it finds the entry, but zero associated IP addresses. (Verify with nslookup.) Hence, 11004, WSANO_DATA …
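A quick check based on the first answer above: a stale http_proxy variable makes urllib2 route every request through a proxy that no longer exists. A sketch that inspects and then bypasses it:

import os
import urllib2

print(os.environ.get('http_proxy'))  # non-None here means urllib2 will use it

# Clear it for this process, or sidestep environment proxies entirely by
# building an opener with an empty ProxyHandler.
os.environ.pop('http_proxy', None)
opener = urllib2.build_opener(urllib2.ProxyHandler({}))
print(opener.open('http://google.com').getcode())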

Requests, bind to an IP

帅比萌擦擦* submitted on 2019-11-27 04:17:12
I have a script that makes some requests with urllib2. I use the trick suggested elsewhere on Stack Overflow to bind another IP to the application, where my computer has two IP addresses (IP A and IP B). I would like to switch to using the requests library. Does anyone know how I can achieve the same functionality with that library?

Looking into the requests module, it looks like it uses httplib to send the HTTP requests. httplib uses socket.create_connection() to connect to the www host. Knowing that, and following the monkey patching method in the link you provided:

import socket
real…
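The answer is cut off at the patch itself; a hedged reconstruction of the monkey-patching technique it names (the source IP and the names below are assumptions), matching the httplib-based requests the answer describes:

import socket

# Keep a reference to the real implementation, then substitute a wrapper
# that forces the local source address, so every connection opened by the
# requests/httplib stack is bound to the chosen IP.
real_create_connection = socket.create_connection

def bound_create_connection(address, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
                            source_address=None):
    return real_create_connection(address, timeout,
                                  source_address=('10.0.0.1', 0))  # hypothetical "IP A"

socket.create_connection = bound_create_connection

import requests
response = requests.get('http://example.com/')  # connections now bound to 10.0.0.1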