urlopen

BeautifulSoup HTTPResponse has no attribute encode

守給你的承諾、 submitted on 2020-02-03 11:00:27
Question: I'm trying to get BeautifulSoup working with a URL, like the following:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    html = urlopen("http://proxies.org")
    soup = BeautifulSoup(html.encode("utf-8"), "html.parser")
    print(soup.find_all('a'))

However, I am getting an error:

    File "c:\Python3\ProxyList.py", line 3, in <module>
      html = urlopen("http://proxies.org").encode("utf-8")
    AttributeError: 'HTTPResponse' object has no attribute 'encode'

Any idea why? Could it be to do with
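
The error in this question comes from calling a string method on the response object: urlopen() returns a response, and its read() method already returns bytes. A minimal sketch of reading the body first; fetch_bytes is a hypothetical helper name, and the BeautifulSoup call is shown only as a comment because it needs the third-party bs4 package:

```python
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Return the raw response body as bytes (hypothetical helper)."""
    # urlopen() returns a response object, not a string, so there is
    # nothing to .encode(); read() already yields bytes.
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# Assumed usage with the question's parser (requires bs4):
#   from bs4 import BeautifulSoup
#   soup = BeautifulSoup(fetch_bytes("http://proxies.org"), "html.parser")
#   print(soup.find_all("a"))
```

BeautifulSoup accepts bytes as well as str, so no explicit encode or decode step should be needed here.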

How to set TCP_NODELAY flag when loading URL with urllib2?

半腔热情 submitted on 2020-01-30 04:09:37
Question: I am using urllib2 to load a web page. My code is:

    httpRequest = urllib2.Request("http:/www....com")
    pageContent = urllib2.urlopen(httpRequest)
    pageContent.readline()

How can I get hold of the socket properties to set TCP_NODELAY? With a plain socket I would use:

    socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

Answer 1: If you need access to such a low-level property of the socket in use, you'll have to override some objects. First, you'll need to create a subclass of
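
The subclassing approach the answer starts to describe might look like the following in Python 3, where urllib2 became urllib.request. This is a sketch under assumptions, not the answer's actual code: NoDelayHTTPConnection, NoDelayHTTPHandler and build_nodelay_opener are hypothetical names, and only plain HTTP is covered.

```python
import http.client
import socket
import urllib.request


class NoDelayHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that sets TCP_NODELAY right after connecting."""

    def connect(self):
        super().connect()
        # self.sock is the underlying socket once connect() has returned
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


class NoDelayHTTPHandler(urllib.request.HTTPHandler):
    """Handler that makes urllib use the connection class above."""

    def http_open(self, req):
        return self.do_open(NoDelayHTTPConnection, req)


def build_nodelay_opener():
    """Return an opener whose HTTP connections use NoDelayHTTPConnection."""
    return urllib.request.build_opener(NoDelayHTTPHandler)
```

Calling build_nodelay_opener().open(url) then replaces urlopen(url). As far as I know, recent CPython releases already enable TCP_NODELAY inside HTTPConnection.connect(), so the same pattern is mostly useful for other socket options.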

urlopen trouble while trying to download a gzip file

ⅰ亾dé卋堺 submitted on 2020-01-23 19:39:47
Question: I am going to use the Wiktionary dump for POS tagging. Somehow it gets stuck when downloading. Here is my code:

    import nltk
    from urllib import urlopen
    from collections import Counter
    import gzip

    url = 'http://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-all-titles-in-ns0.gz'
    fStream = gzip.open(urlopen(url).read(), 'rb')
    dictFile = fStream.read()
    fStream.close()
    text = nltk.Text(word.lower() for word in dictFile())
    tokens = nltk.word_tokenize(text)

Here is the
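
One likely problem in this code is that gzip.open() expects a filename, while the question passes it the downloaded bytes. A sketch in Python 3 that instead wraps the response stream in gzip.GzipFile and decompresses while reading; iter_gzip_lines is a hypothetical helper, and UTF-8 is an assumed encoding for the dump:

```python
import gzip
from urllib.request import urlopen


def iter_gzip_lines(url, encoding="utf-8", timeout=30):
    """Yield decoded lines from a gzip-compressed file served at url."""
    with urlopen(url, timeout=timeout) as response:
        # GzipFile reads from the response and decompresses incrementally,
        # so the whole dump never has to sit in memory at once.
        with gzip.GzipFile(fileobj=response, mode="rb") as stream:
            for raw_line in stream:
                yield raw_line.decode(encoding).rstrip("\n")
```

Iterating the generator, for example `for title in iter_gzip_lines(url): ...`, processes one title at a time.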

Urllib's urlopen breaking on some sites (e.g. StackApps api): returns garbage results

只愿长相守 submitted on 2020-01-02 01:46:32
Question: I'm using urllib2's urlopen function to try to get a JSON result from the StackOverflow API. The code I'm using:

    >>> import urllib2
    >>> conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
    >>> conn.readline()

The result I'm getting:

    '\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ\...

I'm fairly new to urllib, but this doesn't seem like the result I should be getting. I've tried it in other places and I get what I expect (the same as visiting the
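
The leading bytes \x1f\x8b in that output are the gzip magic number, which suggests the server sent a gzip-compressed body that urlopen handed back unchanged. A minimal sketch of undoing that; decode_body is a hypothetical helper, and the header value would come from the response's Content-Encoding header:

```python
import gzip


def decode_body(raw, content_encoding=None):
    """Return the body with gzip compression removed, if it was applied."""
    encoding = (content_encoding or "").strip().lower()
    # Fall back to the magic number in case the header is missing
    if encoding == "gzip" or raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)
    return raw


# Assumed usage with a Python 3 response object:
#   body = decode_body(response.read(), response.headers.get("Content-Encoding"))
```

The decompressed bytes can then be decoded and passed to json.loads as usual.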

http.client.RemoteDisconnected error while reading/parsing a list of URLs

谁都会走 submitted on 2019-12-25 07:46:04
Question: I am working on a simple URL parser: the idea is to take a URL in one column, attempt to resolve it, and print out where it redirects to. I have the basic functionality working; however, every so often it throws an http.client.RemoteDisconnected exception and the program stops, throwing a few errors (below).

    Traceback (most recent call last):
      File "URLIFIER.py", line 43, in <module>
        row.append(urlparse(row[0]))
      File "URLIFIER.py", line 12, in urlparse
        conn = urllib.request.urlopen
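
One way to keep such a loop running is to catch the exception per URL and record the failure instead of letting it propagate. A sketch under assumptions: resolve_url is a hypothetical replacement for the question's urlparse function (that name also shadows urllib.parse.urlparse), and the set of caught exceptions is a judgment call.

```python
import http.client
import urllib.error
import urllib.request


def resolve_url(url, timeout=10):
    """Return (final_url, None) on success or (None, error_text) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # geturl() reports the URL after any redirects were followed
            return response.geturl(), None
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError) as exc:
        # RemoteDisconnected is an HTTPException subclass, so it lands here
        return None, "{}: {}".format(type(exc).__name__, exc)
```

Each row can then store either the resolved URL or the error text, and the program moves on to the next row.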

Python 3.4 using urlopen to get web content via proxy

青春壹個敷衍的年華 submitted on 2019-12-24 16:39:05
Question: Using the example below, I am trying to get web content from behind a proxy server, but so far I have been unsuccessful.

    proxies = {'http': 'http://proxy:8080'}
    from urllib.request import urlopen
    with urlopen('http://sixty-north.com/c/t.txt', proxies) as story:
        story_words = []
        for line in story:
            line_words = line.split()
            for word in line_words:
                story_words.append(word)
    story_words

Any idea where I am going wrong? If I remove the proxies argument I get: [WinError 10060] A connection attempt failed
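
In urllib.request, the second positional argument of urlopen() is the request body (data), not a proxy mapping, so the dictionary in the question is not treated as proxy settings. A sketch of the usual alternative with ProxyHandler and build_opener; make_proxy_opener and read_words are hypothetical helper names:

```python
from urllib.request import ProxyHandler, build_opener


def make_proxy_opener(proxies):
    """Build an opener that routes requests through the given proxies."""
    return build_opener(ProxyHandler(proxies))


def read_words(url, proxies, timeout=30):
    """Fetch url via the proxies and return its whitespace-separated words."""
    opener = make_proxy_opener(proxies)
    with opener.open(url, timeout=timeout) as story:
        # Lines are bytes, so the resulting words are bytes as well
        return [word for line in story for word in line.split()]
```

With the question's values this would be called as read_words('http://sixty-north.com/c/t.txt', {'http': 'http://proxy:8080'}).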

Python: Urllib.urlopen nonnumeric port

∥☆過路亽.° submitted on 2019-12-23 09:05:34
Question: For the following code

    theurl = "https://%s:%s@members.dyndns.org/nic/update?hostname=%s&myip=%s&wildcard=NOCHG&mx=NOCHG&backmx=NOCHG" % (username, password, hostname, theip)
    conn = urlopen(theurl)  # send the request to the url
    print(conn.read())  # read the response
    conn.close()  # close the connection

I get the following error:

    File "c:\Python31\lib\http\client.py", line 667, in _set_hostport
      raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])

Any ideas?

Answer 1: You probably need to url
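
The traceback suggests that http.client is treating the text after the last colon of `user:password@host` as a port number. One way to sidestep that, offered as an alternative sketch and not as the truncated answer's method, is to keep the credentials out of the URL and send them as an HTTP Basic Authorization header. build_update_request is a hypothetical helper:

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(username, password, hostname, ip):
    """Build the update request with credentials in a header, not the URL."""
    query = urlencode({
        "hostname": hostname,
        "myip": ip,
        "wildcard": "NOCHG",
        "mx": "NOCHG",
        "backmx": "NOCHG",
    })
    request = Request("https://members.dyndns.org/nic/update?" + query)
    # Basic auth is "user:password", base64-encoded
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request
```

The returned object can be passed straight to urlopen(), and urlencode takes care of escaping the query values.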

Python 3.5.1 urllib has no attribute request

落花浮王杯 submitted on 2019-12-23 06:56:38
Question: I have tried import urllib.request or import urllib. The path for my urllib is /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/__init__.py. I am wondering where urlopen is, or is my Python module pointing to the wrong file?

Answer 1: Use this:

    import urllib.request

The reason is: with packages like this, you sometimes need to explicitly import the piece you want. That way, the urllib module doesn't have to load everything up just because you wanted one small part. According
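
A short sketch of what the answer describes: after importing the submodule explicitly, urlopen is reachable as urllib.request.urlopen. open_url is a hypothetical wrapper used only for illustration:

```python
import urllib.request  # import the submodule explicitly, not just "urllib"


def open_url(url, timeout=10):
    """Open url via the explicitly imported urllib.request submodule."""
    return urllib.request.urlopen(url, timeout=timeout)
```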

unbuffered urllib2.urlopen

烂漫一生 submitted on 2019-12-23 02:32:32
Question: I have a client for a web interface to a long-running process. I'd like the output from that process to be displayed as it arrives. This works great with urllib.urlopen(), but it doesn't have a timeout parameter. On the other hand, with urllib2.urlopen() the output is buffered. Is there an easy way to disable that buffer?

Answer 1: A quick hack that has occurred to me is to use urllib.urlopen() with threading.Timer() to emulate a timeout. But that's only a quick and dirty hack.

Answer 2: urllib2 is buffered when you
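
For comparison, in Python 3 urllib.request.urlopen() accepts a timeout, and iterating over the response reads it line by line. A sketch assuming line-oriented output from the process; stream_lines is a hypothetical helper, and how promptly lines arrive still depends on how the server flushes its output:

```python
from urllib.request import urlopen


def stream_lines(url, timeout=10, encoding="utf-8"):
    """Yield each line of the response as it is read from the connection."""
    # timeout applies to blocking socket operations, not the whole transfer
    with urlopen(url, timeout=timeout) as response:
        for raw_line in response:
            yield raw_line.decode(encoding, errors="replace").rstrip("\r\n")
```

A caller can print each line as it is yielded, for example `for line in stream_lines(url): print(line, flush=True)`.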