urllib2

python urllib2 utf-8 encoding

℡╲_俬逩灬. Submitted on 2019-12-05 06:29:05

Question: okay, I have # -*- coding: utf-8 -*- in my python file. The snippet:

    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    opener.addheaders = [('Accept-Charset', 'utf-8')]
    f = opener.open(url)
    doc = f.read().decode('utf-8')

The server response is (via f.info()): Content-Type: text/html; charset=UTF-8, but I get the error: UnicodeDecodeError: 'utf8' codec can't decode byte[...]: invalid continuation byte. What's wrong here?

Answer 1: Try decoding the data using 'latin-1
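In many cases the bytes genuinely are not valid UTF-8 despite the charset header (a mislabeled page, or a compressed body). A minimal defensive sketch (the helper name decode_body is mine): try the declared charset first and fall back to latin-1, which maps every possible byte and therefore never raises:

```python
def decode_body(raw, declared="utf-8"):
    # Try the charset the server declared; if the bytes don't
    # actually conform, latin-1 always succeeds (1 byte = 1 char).
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        return raw.decode("latin-1")

print(decode_body("héllo".encode("utf-8")))  # héllo (valid UTF-8)
print(decode_body(b"caf\xe9"))               # café (latin-1 fallback)
```

The fallback never throws, but it can silently mojibake genuinely multi-byte text, so logging which branch was taken is worthwhile in real code.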

example urllib3 and threading in python

ⅰ亾dé卋堺 Submitted on 2019-12-05 06:01:47

Question: I am trying to use urllib3 in a simple thread to fetch several wiki pages. The script will create one connection for every thread (I don't understand why) and hang forever. Any tip, advice, or simple example of urllib3 and threading?

    import threadpool
    from urllib3 import connection_from_url

    HTTP_POOL = connection_from_url(url, timeout=10.0, maxsize=10, block=True)

    def fetch(url, fields):
        kwargs = {'retries': 6}
        return HTTP_POOL.get_url(url, fields, **kwargs)

    pool = threadpool.ThreadPool(5)
    requests =
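connection_from_url and get_url come from a very old urllib3; in current versions you create one shared PoolManager and call request() on it from every thread (urllib3 pools are thread-safe). A sketch of the fan-out pattern, with the fetch callable injected so the threading part can be shown on its own:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, max_workers=5):
    # Fan the URLs out over a small thread pool; `fetch` is any
    # callable taking a URL, e.g. a request on a shared urllib3 pool.
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(fetch, urls))

# With urllib3 (assumed installed), share ONE pool across all threads:
# import urllib3
# http = urllib3.PoolManager(maxsize=10)
# pages = fetch_all(urls, lambda u: http.request("GET", u, retries=6).data)

print(fetch_all(["a", "b", "c"], str.upper))  # ['A', 'B', 'C']
```

Sharing one PoolManager is what lets connections be reused; creating a pool per thread reproduces the "one connection per thread" behavior the question complains about.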

python urllib2 file send problem

落爺英雄遲暮 Submitted on 2019-12-05 02:58:07

Question: I want to post a file to a server via python; for this I need to name the file "xmlfile" so that the server recognizes the input.

    import urllib2
    url = "http://somedomain"
    to_send = open('test.xml').read()
    data = {}
    data['xmlfile'] = to_send
    f = urllib2.urlopen(url, data)

This doesn't work. In addition, how can I retrieve the response and save it someplace? In other words, I want to do the same thing I do with curl:

    curl.exe http://somedomain -F xmlfile=@test.xml -o response.html

Answer 1: I just read
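urllib2.urlopen with a dict body does not produce the multipart/form-data encoding that curl -F sends; the body has to be built by hand (or with a third-party helper). A stdlib sketch of the encoding itself, using the field and file names from the question (the helper name multipart_body is mine):

```python
import uuid

def multipart_body(field, filename, payload):
    # Hand-rolled multipart/form-data, equivalent to curl -F field=@file;
    # returns (body_bytes, content_type_header_value).
    boundary = uuid.uuid4().hex
    head = ('--{0}\r\n'
            'Content-Disposition: form-data; name="{1}"; filename="{2}"\r\n'
            'Content-Type: application/xml\r\n'
            '\r\n').format(boundary, field, filename).encode()
    tail = '\r\n--{0}--\r\n'.format(boundary).encode()
    return head + payload + tail, 'multipart/form-data; boundary=' + boundary

body, ctype = multipart_body('xmlfile', 'test.xml', b'<root/>')

# Posting it (Python 3 spelling; urllib2 became urllib.request):
# import urllib.request
# req = urllib.request.Request(url, data=body, headers={'Content-Type': ctype})
# with urllib.request.urlopen(req) as resp:           # save the response,
#     open('response.html', 'wb').write(resp.read())  # like curl -o
```

The boundary must appear in both the Content-Type header and the body, which is why the function returns them together.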

Urllib's urlopen breaking on some sites (e.g. StackApps api): returns garbage results

好久不见. Submitted on 2019-12-05 02:31:08

I'm using urllib2's urlopen function to try and get a JSON result from the StackOverflow api. The code I'm using:

    >>> import urllib2
    >>> conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
    >>> conn.readline()

The result I'm getting:

    '\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ\...

I'm fairly new to urllib, but this doesn't seem like the result I should be getting. I've tried it in other places and I get what I expect (the same as visiting the address with a browser gives me: a JSON object). Using urlopen on other sites (e.g. "http://google.com"
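The leading bytes \x1f\x8b are the gzip magic number: the API returned a gzip-compressed body, which is why it looks like garbage. A sketch that detects the magic number and decompresses when present:

```python
import gzip
import io

def maybe_gunzip(raw):
    # Every gzip stream starts with the two magic bytes \x1f\x8b;
    # decompress when present, pass plain bodies through untouched.
    if raw[:2] == b"\x1f\x8b":
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw

compressed = gzip.compress(b'{"users": []}')
print(maybe_gunzip(compressed))  # b'{"users": []}'
print(maybe_gunzip(b"plain"))    # b'plain'
```

A browser decompresses transparently because it sends Accept-Encoding and handles the Content-Encoding: gzip response header, which raw urllib2 does not do for you.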

Python: Processing Javascript with urllib2?

吃可爱长大的小学妹 Submitted on 2019-12-04 23:55:48

Question: I am coding an HTML scraper which gets values from a table on a website. I also need to grab the URL of an image, but the problem is that this image is dynamically generated via JavaScript, and when I get the contents of the website via urllib, the JavaScript does not run or show in the resulting HTML. Is there any way to enable JavaScript to run on pages which are accessed via urllib?

Answer 1: No, you'd need some sort of JS interpreter for that. There might be Python-browser integrations to help parsing

A summary of the python Requests library

亡梦爱人 Submitted on 2019-12-04 23:26:29

What is the Requests library? GitHub: https://github.com/requests/requests. The Requests library is mainly used to prepare a Request and process the Response.

Why learn the Requests library? It is something both web development and crawlers need. On the server side, a good understanding of Requests helps you write better RESTful API programs, and it is also a toolbox for automated testing.

Installing the Requests library:

    pip install requests     # installs the requests library
    pip install gunicorn     # gunicorn is a Python WSGI HTTP server; it only runs on Unix systems and originates from Ruby's unicorn project
    pip install httpbin      # httpbin is a test tool for HTTP libraries
    gunicorn httpbin:app     # starts httpbin via gunicorn, reachable at 127.0.0.1:8000

A quick look at the HTTP protocol: HTTP stands for HyperText Transfer Protocol. It is a stateless application-layer protocol for distributed, collaborative, multimedia information services.

    > GET / HTTP/1.1
    > Host: www.imooc.com
    > User-Agent:
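The raw request above can be reproduced with nothing but string assembly; a sketch of how a client builds the request text it writes to the socket (build_request is a name of my own):

```python
def build_request(method, path, headers):
    # An HTTP/1.1 request is the request line, one "Name: value"
    # line per header, and a terminating blank line, CRLF-separated.
    lines = ["{0} {1} HTTP/1.1".format(method, path)]
    lines.extend("{0}: {1}".format(k, v) for k, v in headers.items())
    return "\r\n".join(lines) + "\r\n\r\n"

raw = build_request("GET", "/", {"Host": "www.imooc.com"})
print(raw)
```

Requests does this assembly (plus connection handling) for you; the `>` lines above are exactly what ends up on the wire.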

Getting TTFB (time till first byte) for an HTTP Request

青春壹個敷衍的年華 Submitted on 2019-12-04 23:23:23

Question: Here is a python script that loads a url and captures the response time:

    import urllib2
    import time
    opener = urllib2.build_opener()
    request = urllib2.Request('http://example.com')
    start = time.time()
    resp = opener.open(request)
    resp.read()
    ttlb = time.time() - start

Since my timer is wrapped around the whole request/response (including read()), this will give me the TTLB (time to last byte). I would also like to get the TTFB (time to first byte), but am not sure where to start/stop my timing. Is
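One way to split the two numbers: start the clock before opening, take a stamp once read(1) returns the first body byte, and another after draining the rest. A sketch with the opener injected so it can be exercised without a network; note that urlopen/opener.open already blocks until the response headers arrive, so the read(1) stamp is a reasonable TTFB approximation:

```python
import io
import time

def time_request(open_fn):
    # TTFB: clock starts before the request is opened, stops when the
    # first body byte is available; TTLB: when the body is fully drained.
    start = time.time()
    resp = open_fn()
    first = resp.read(1)
    ttfb = time.time() - start
    body = first + resp.read()
    ttlb = time.time() - start
    return ttfb, ttlb, body

# Real use: time_request(lambda: urllib2.urlopen('http://example.com'))
ttfb, ttlb, body = time_request(lambda: io.BytesIO(b"hello"))
```

For true network-level TTFB (before any body buffering), you would have to time at the socket layer instead.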

Crawlers: urllib2 exception handling with URLError and HTTPError

元气小坏坏 Submitted on 2019-12-04 21:22:47

urllib2 exception handling

When we send a request with urlopen or opener.open, an error is raised if urlopen or opener.open cannot handle the response. This section mainly covers URLError and HTTPError, and how to handle them.

URLError

The main causes of a URLError are: no network connection, a failed connection to the server, or the specified server cannot be found. We can catch the corresponding exception with a try/except statement. In the example below we visit a nonexistent domain:

    # urllib2_urlerror.py
    import urllib2

    request = urllib2.Request('http://www.ajkfhafwjqh.com')
    try:
        urllib2.urlopen(request, timeout=5)
    except urllib2.URLError, err:
        print err

The output is:

    <urlopen error [Errno 8] nodename nor servname provided, or not known>

That is a urlopen error with errno 8: the specified server could not be found.

HTTPError

HTTPError is a subclass of URLError. Whenever we send a request, the server produces a corresponding response object, which contains a numeric
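That subclass relationship dictates the except ordering: the HTTPError clause must come before the URLError clause, or it will never be reached. A sketch in Python 3 spelling, where urllib2's exceptions live in urllib.error (the classify helper is my own):

```python
from urllib.error import HTTPError, URLError

def classify(exc):
    # HTTPError first: it IS a URLError, so checking in the
    # reverse order would swallow it in the generic branch.
    if isinstance(exc, HTTPError):
        return "http", exc.code
    if isinstance(exc, URLError):
        return "url", str(exc.reason)
    raise exc

assert issubclass(HTTPError, URLError)
err = HTTPError("http://example.com", 404, "Not Found", None, None)
print(classify(err))  # ('http', 404)
```

The same ordering rule applies verbatim to Python 2's except urllib2.HTTPError / except urllib2.URLError clauses.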

python urllib2

淺唱寂寞╮ Submitted on 2019-12-04 20:58:08

    import urllib2
    import re

    res = urllib2.urlopen("http://www.nipic.com/")
    #print res.read()
    all = re.findall(r'http://icon.nipic.com/BannerPic.+\.jpg', res.read())
    print all

    num = 0
    for url in all:
        picture = urllib2.urlopen(url)
        buf = picture.read()
        local = open(str(num) + '.jpg', 'wb')
        local.write(buf)
        local.close()
        num += 1

This scrapes images from the page and saves them locally. Note that the file is opened with mode 'wb', i.e. the file is opened in binary form.

Source: https://www.cnblogs.com/jkklearn/p/11883384.html
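The greedy .+ in the pattern above can swallow everything between the first BannerPic and the last .jpg on the page; a tighter, testable variant of the extraction step (the sample HTML is made up):

```python
import re

def find_banner_jpgs(html):
    # [^"']+ stops at the closing quote of the src attribute,
    # so each match is a single URL rather than one greedy span.
    return re.findall(r'http://icon\.nipic\.com/BannerPic[^"\']+\.jpg', html)

html = ('<img src="http://icon.nipic.com/BannerPic/a1.jpg">'
        '<img src="http://icon.nipic.com/BannerPic/b2.jpg">')
print(find_banner_jpgs(html))  # two separate URLs
```

Escaping the dots (\.) also keeps the pattern from matching look-alike hostnames.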

python urllib2 download size

假如想象 Submitted on 2019-12-04 20:33:15

I want to download a file with urllib2, and meanwhile I want to display a progress bar. But how can I get the actual downloaded file size? My current code is:

    ul = urllib2.urlopen('www.file.com/blafoo.iso')
    data = ul.get_data()

or

    open('file.iso', 'w').write(ul.read())

The data is only written to the file once the whole download has been received from the website. How can I access the downloaded data size? Thanks for your help.

Here's an example of a text progress bar using the awesome requests library and the progressbar library:

    import requests
    import progressbar
    ISO = "http://www.ubuntu.com
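get_data is not a urllib2 API; the usual approach is to read the Content-Length header, then read() in fixed-size chunks and update the progress after each one. A stdlib sketch (the download helper and its names are mine), demonstrated on an in-memory stream:

```python
import io

def download(resp, out, total, chunk=8192, report=print):
    # Stream in chunks so a running byte count is available for
    # a progress bar; `total` is the Content-Length header value.
    done = 0
    while True:
        buf = resp.read(chunk)
        if not buf:
            break
        out.write(buf)
        done += len(buf)
        report("%d/%d bytes (%.0f%%)" % (done, total, 100.0 * done / total))
    return done

# Real use (Python 2 spelling, matching the question):
# resp = urllib2.urlopen('http://www.file.com/blafoo.iso')
# total = int(resp.info().getheader('Content-Length'))
# with open('file.iso', 'wb') as f:
#     download(resp, f, total)

src, dst = io.BytesIO(b"x" * 20000), io.BytesIO()
print(download(src, dst, 20000))  # 20000
```

Opening the output in 'wb' (not 'w') matters on Windows for the same binary-mode reason noted in the image-scraping post above.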