urllib

BeautifulSoup get_text from find_all

孤街浪徒 submitted on 2019-11-30 08:50:32
Question: This is my first attempt at web scraping. So far I am able to navigate to and find the part of the HTML I want, and I can print it as well. The problem is printing only the text, which does not work. I get the following error when trying it: AttributeError: 'ResultSet' object has no attribute 'get_text'. Here is my code:

    from bs4 import BeautifulSoup
    import urllib

    page = urllib.urlopen('some url')
    soup = BeautifulSoup(page)
    zeug = soup.find_all('div', attrs={'class': 'fm_linkeSpalte'}).get_text()
    print zeug
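A minimal sketch of the likely fix: find_all() returns a ResultSet (a list of matching tags), and get_text() is only defined on individual tags, so call it on each element of the result instead (Python 2 syntax to match the question; the URL is the question's own placeholder):

    from bs4 import BeautifulSoup
    import urllib

    page = urllib.urlopen('some url')  # placeholder URL from the question
    soup = BeautifulSoup(page)

    # Iterate over the ResultSet and extract the text of each tag.
    for div in soup.find_all('div', attrs={'class': 'fm_linkeSpalte'}):
        print div.get_text()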

Python urllib.request.urlopen() returning error 10061?

放肆的年华 submitted on 2019-11-30 08:44:15
Question: I'm trying to download the HTML of a page (http://www.google.com in this case), but I'm getting back an error. Here is my interactive prompt session:

    Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import urllib
    >>> import urllib.request
    >>> html = urllib.request.urlopen("http://www.google.com")
    Traceback (most recent call last):
      File "\\****.****.org\myhome\python\lib\urllib\request…
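Error 10061 is the Windows "connection refused" code, and on a machine that sits behind a corporate proxy it usually means urllib is trying to connect directly instead of going through the proxy. A minimal sketch of routing the request through an explicitly configured proxy — the proxy address here is a hypothetical placeholder, not something from the question:

    import urllib.request

    # Hypothetical proxy address; substitute your network's real proxy.
    proxy = urllib.request.ProxyHandler({'http': 'http://proxy.example.com:8080'})
    opener = urllib.request.build_opener(proxy)
    urllib.request.install_opener(opener)

    html = urllib.request.urlopen("http://www.google.com").read()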

Timeout a file download with Python urllib?

被刻印的时光 ゝ submitted on 2019-11-30 08:30:53
Question: Python beginner here. I want to be able to time out my download of a video file if the process takes longer than 500 seconds.

    import urllib
    try:
        urllib.urlretrieve("http://www.videoURL.mp4", "filename.mp4")
    except Exception as e:
        print("error")

How do I amend my code to make that happen?

Answer 1: A better way is to use requests, so you can stream the results and easily check for timeouts:

    import requests
    # Make the actual request, set the timeout for no data to 10 seconds and enable streaming…
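A sketch of how that streaming approach might be completed. Note that the requests timeout applies to each read, not to the whole transfer, so the overall 500-second cap is enforced manually here; the chunk size is an illustrative choice:

    import time
    import requests

    start = time.time()
    # timeout=10 bounds how long we wait for each chunk of data
    r = requests.get("http://www.videoURL.mp4", stream=True, timeout=10)

    with open("filename.mp4", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            if time.time() - start > 500:  # overall 500-second budget
                raise Exception("download exceeded 500 seconds")
            if chunk:
                f.write(chunk)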

How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?

别来无恙 submitted on 2019-11-30 08:28:29
I’m playing around with the Stack Overflow API using Python. I’m trying to decode the gzipped responses that the API gives.

    import urllib, gzip
    url = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name')
    gzip.GzipFile(fileobj=url).read()

According to the urllib2 documentation, urlopen “returns a file-like object”. However, when I run read() on the GzipFile object I’ve created from it, I get this error: AttributeError: addinfourl instance has no attribute 'tell'. As far as I can tell, this is coming from the object returned by urlopen. It doesn’t appear to have seek either, as I get…
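A minimal sketch of the usual workaround: GzipFile needs tell() and seek(), which the addinfourl response object lacks, so read the raw bytes into a StringIO buffer first and hand that to GzipFile (Python 2 syntax to match the question):

    import urllib, gzip
    from StringIO import StringIO

    raw = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name').read()
    # StringIO supports tell() and seek(), which GzipFile requires.
    buf = StringIO(raw)
    data = gzip.GzipFile(fileobj=buf).read()
    print data[:200]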

Response time for urllib in Python

牧云@^-^@ submitted on 2019-11-30 07:32:19
I want to measure the response time when I use urllib. I wrote the code below, but the time it reports is longer than the actual response time. Can I get the time using urllib, or is there some other method?

    import urllib
    import datetime

    def main():
        urllist = [
            "http://google.com",
        ]
        for url in urllist:
            opener = urllib.FancyURLopener({})
            try:
                start = datetime.datetime.now()
                f = opener.open(url)
                end = datetime.datetime.now()
                diff = end - start
                print int(round(diff.microseconds / 1000))
            except IOError, e:
                print 'error', url
            else:
                print f.getcode(), f.geturl()

    if __name__ == "__main__":
        main()

Save yourself some hassle and use the requests…
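A sketch of the requests-based approach the truncated answer is pointing at: the Response object carries an elapsed attribute, a timedelta measured from sending the request until the response headers are parsed (Python 2 print syntax to match the question):

    import requests

    r = requests.get("http://google.com")
    print r.status_code, r.url
    # elapsed is a datetime.timedelta; convert to milliseconds
    print int(r.elapsed.total_seconds() * 1000)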

Python urllib urlencode problem with æøå

元气小坏坏 submitted on 2019-11-30 07:31:54
How can I urlencode a string with the special characters æøå? For example:

    urllib.urlencode('http://www.test.com/q=testæøå')

I get this error: not a valid non-string sequence or mapping object.

You should pass a dictionary to urlencode, not a string. See the correct example below:

    from urllib import urlencode
    print 'http://www.test.com/?' + urlencode({'q': 'testæøå'})

urlencode is intended to take a dictionary, for example:

    >>> q = u'\xe6\xf8\xe5'  # u'æøå'
    >>> params = {'q': q.encode('utf-8')}
    >>> 'http://www.test.com/?' + urllib.urlencode(params)
    'http://www.test.com/?q=%C3%A6%C3%B8%C3%A5'

If you just want to URL…
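The truncated sentence is most likely heading toward urllib.quote, which percent-encodes a single string rather than a query dictionary — a minimal sketch under that assumption (Python 2):

    import urllib

    q = u'æøå'
    # quote() percent-encodes one string; encode to UTF-8 bytes first.
    print urllib.quote(q.encode('utf-8'))  # %C3%A6%C3%B8%C3%A5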

urllib.parse: low-level, but a good module for handling URL paths

与世无争的帅哥 submitted on 2019-11-30 07:22:01
Introduction: urllib.parse is a module under the urllib package. The other modules in urllib can be replaced entirely by requests, but urllib.parse is worth knowing, because it provides many methods for manipulating URL paths.

urlparse: splitting a URL

    from urllib import parse

    url = "https://www.baidu.com/s?wd=python"
    print(parse.urlparse(url))
    # ParseResult(scheme='https', netloc='www.baidu.com', path='/s', params='', query='wd=python', fragment='')

    """
    scheme: the protocol, e.g. http or https
    netloc: the domain, here www.baidu.com
    path: the path, following the domain
    params: parameters
    query: the query string
    fragment: the anchor, used to jump straight to a given position in the page
    """

    scheme, netloc, path, params, query, fragment = parse.urlparse(url)
    print(f"scheme: {scheme}")
    print(f"netloc: {netloc}")
    print(f"path: {path}")
    print(f"params:…
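To complement urlparse, a short sketch of the inverse operations — parse_qs decodes the query string into a dict, and urlunparse reassembles the six components back into a URL (same example URL as above):

    from urllib import parse

    url = "https://www.baidu.com/s?wd=python"
    parts = parse.urlparse(url)

    # parse_qs turns the raw query string into a dict of value lists.
    print(parse.parse_qs(parts.query))  # {'wd': ['python']}

    # urlunparse rebuilds the URL from its six components.
    print(parse.urlunparse(parts))  # https://www.baidu.com/s?wd=python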

urllib2.urlopen() vs urllib.urlopen() - urllib2 throws 404 while urllib works! WHY?

我的梦境 submitted on 2019-11-30 04:53:58
    import urllib
    print urllib.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

The above script works and returns the expected results, while:

    import urllib2
    print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

throws the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/urllib2.py", line 124, in urlopen
        return _opener.open(url, data)
      File "/usr/lib…
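A likely explanation: the server is actually answering with a 404 status but still sends a usable body. urllib silently hands you that body, while urllib2 raises HTTPError for any non-2xx status. Since the HTTPError object is itself a file-like response, a sketch of reading the body anyway (Python 2):

    import urllib2

    url = ('http://www.reefgeek.com/equipment/Controllers_&_Monitors/'
           'Neptune_Systems_AquaController/Apex_Controller_&_Accessories/')
    try:
        print urllib2.urlopen(url).read()
    except urllib2.HTTPError as e:
        # HTTPError doubles as the response: both the status code and
        # the body the server sent alongside the 404 are available.
        print e.code
        print e.read()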

How do I fix a ValueError: read of closed file exception?

爷,独闯天下 submitted on 2019-11-30 04:29:16
Question: This simple Python 3 script:

    import urllib.request

    host = "scholar.google.com"
    link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
    url = "http://" + host + link
    filename = "cite0.bib"
    print(url)
    urllib.request.urlretrieve(url, filename)

raises this exception:

    Traceback (most recent call last):
      File "C:\Users\ricardo\Desktop\Google-Scholar\BibTex\test2.py", line 8, in <module>
        urllib.request.urlretrieve(url, filename)
      File "C:…
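One plausible cause, assuming the server closes the connection on clients it identifies as scripts (Google Scholar is known to block the default Python User-Agent), is the transfer being cut off mid-read. A sketch of a workaround that sends a browser-style User-Agent and writes the file manually:

    import urllib.request

    host = "scholar.google.com"
    link = ("/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/"
            "&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0")
    url = "http://" + host + link

    # Browser-style User-Agent; the default "Python-urllib/3.x" is often refused.
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req) as response:
        with open("cite0.bib", "wb") as f:
            f.write(response.read())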

Difference between Python urllib.urlretrieve() and wget

白昼怎懂夜的黑 submitted on 2019-11-30 03:51:15
I am trying to retrieve a 500 MB file using Python, and I have a script which uses urllib.urlretrieve(). There seems to be some network problem between me and the download site, because this call consistently hangs and fails to complete. However, using wget to retrieve the file tends to work without problems. What is the difference between urlretrieve() and wget that could cause this?

Peter Lyons: The answer is quite simple. Python's urllib and urllib2 are nowhere near as mature and robust as they could be. Even better than wget in my experience is cURL. I've written code that downloads…
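A sketch of what a cURL-based download might look like through the pycurl bindings the answer alludes to; the URL and filename are placeholders, and the low-speed options are one way to avoid the indefinite hang urlretrieve() can fall into:

    import pycurl

    with open("bigfile.bin", "wb") as f:  # placeholder filename
        c = pycurl.Curl()
        c.setopt(pycurl.URL, "http://example.com/bigfile.bin")  # placeholder URL
        c.setopt(pycurl.WRITEDATA, f)
        # Abort if the transfer rate stays under 1 KB/s for 30 seconds,
        # rather than hanging forever.
        c.setopt(pycurl.LOW_SPEED_LIMIT, 1024)
        c.setopt(pycurl.LOW_SPEED_TIME, 30)
        c.perform()
        c.close()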