urllib

Python 3 - TypeError: a bytes-like object is required, not 'str'

北城以北 submitted on 2019-11-30 14:11:50
I'm working on a lesson from Udacity and am having trouble finding out whether the result from this site is true or false. I get the TypeError with the code below.
    from urllib.request import urlopen
    #check text for curse words
    def check_profanity():
        f = urlopen("http://www.wdylike.appspot.com/?q=shit")
        output = f.read()
        f.close()
        print(output)
        if "b'true'" in output:
            print("There is a profane word in the document")
    check_profanity()
The output prints b'true' and I'm not really sure where that 'b' is coming from.
In Python 3, strings are Unicode by default. The b in b'true' means that
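The TypeError in the excerpt above comes from testing a str against the bytes object that read() returns. A minimal sketch of one way around it, assuming the service replies in UTF-8 (the excerpt does not say which encoding it uses); the helper name contains_true is hypothetical, and a bytes literal stands in for f.read() so nothing is fetched:

```python
def contains_true(raw: bytes) -> bool:
    """Decode a response body to text and report whether it contains 'true'."""
    # Assumption: the body is UTF-8 encoded text.
    text = raw.decode("utf-8")
    # Both operands are now str, so the membership test no longer raises TypeError.
    return "true" in text


# Stand-in for the bytes returned by f.read(); no network request is made.
print(contains_true(b"true"))
```

Comparing bytes with bytes (for example b"true" in output) would also avoid the error; decoding is just the more common choice when the result is treated as text afterwards.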

BOM in server response screws up json parsing

末鹿安然 submitted on 2019-11-30 14:03:14
Question: I'm trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that's mostly working.
    import urllib2
    import json
    url = "http://foo.com/API.svc/SomeMethod"
    payload = json.dumps( {'inputs': ['red', 'blue', 'green']} )
    headers = {"Content-type": "application/json;"}
    req = urllib2.Request(url, payload, headers)
    f = urllib2.urlopen(req)
    response = f.read()
    f.close()
    data =
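For the BOM problem named in the title, one possible approach is to decode the body with the "utf-8-sig" codec before parsing. This is a Python 3 sketch rather than the urllib2 code of the excerpt, and it assumes the server sends a UTF-8 BOM (the excerpt is cut off before saying so); parse_json_body is a hypothetical helper name and the sample bytes are made up:

```python
import codecs
import json


def parse_json_body(raw: bytes):
    """Parse a JSON response body, tolerating a leading UTF-8 BOM."""
    # "utf-8-sig" strips a UTF-8 BOM when present and otherwise behaves like "utf-8".
    return json.loads(raw.decode("utf-8-sig"))


# Made-up sample standing in for f.read(); no request is sent.
sample = codecs.BOM_UTF8 + b'{"colors": ["red", "blue"]}'
print(parse_json_body(sample))
```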

SSL: CERTIFICATE_VERIFY_FAILED with urllib

给你一囗甜甜゛ submitted on 2019-11-30 13:50:54
I'm running into trouble with the module urllib (Python 3.6). Every time I use the module, I get a page's worth of errors. What's wrong with urllib, and how do I fix it?
    import urllib.request
    url='https://www.goodreads.com/quotes/tag/artificial-intelligence'
    u1 = urllib.request.urlopen(url)
    print(u1)
That block of code likes to spit out this mouthful of stuff:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open
        encode_chunked=req.has_header('Transfer-encoding'))
      File "/Library/Frameworks/Python
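CERTIFICATE_VERIFY_FAILED usually means Python could not find a CA bundle that vouches for the server's certificate. A hedged sketch of one way to give urllib an explicit SSL context; the function names and the optional cafile argument are illustrative assumptions, and the excerpt does not confirm that this is the cause on the asker's machine:

```python
import ssl
import urllib.request


def make_context(cafile=None):
    """Build a verifying SSL context, optionally from a specific CA bundle file."""
    # With cafile=None the system default CA certificates are loaded.
    return ssl.create_default_context(cafile=cafile)


def make_verified_opener(cafile=None):
    """Return an opener whose HTTPS handler uses the context built above."""
    handler = urllib.request.HTTPSHandler(context=make_context(cafile))
    return urllib.request.build_opener(handler)


opener = make_verified_opener()
# opener.open(url) would perform the request; it is not called in this sketch.
```

Verification stays switched on here; the sketch only changes where the trusted certificates come from.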

How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?

六眼飞鱼酱① submitted on 2019-11-30 13:47:11
Question: I’m playing around with the Stack Overflow API using Python. I’m trying to decode the gzipped responses that the API gives.
    import urllib, gzip
    url = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name')
    gzip.GzipFile(fileobj=url).read()
According to the urllib2 documentation, urlopen “returns a file-like object”. However, when I run read() on the GzipFile object I’ve created using it, I get this error:
    AttributeError: addinfourl instance has no attribute 'tell'
As far as I can tell,
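The AttributeError above suggests that GzipFile wants a file object with methods the response does not provide. A minimal Python 3 sketch of one workaround is to read the whole body into an in-memory buffer first; gunzip_response is a hypothetical helper name, and a BytesIO of locally compressed data stands in for the real response:

```python
import gzip
import io


def gunzip_response(response) -> bytes:
    """Read a gzip-compressed body from a file-like response and decompress it."""
    # BytesIO supports seek() and tell(), which the raw response object may lack.
    buffer = io.BytesIO(response.read())
    with gzip.GzipFile(fileobj=buffer) as gz:
        return gz.read()


# Stand-in for the object returned by urlopen(); nothing is downloaded.
fake_response = io.BytesIO(gzip.compress(b'{"badges": []}'))
print(gunzip_response(fake_response))
```

When the whole body is already in memory, gzip.decompress(data) is a shorter equivalent in Python 3.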

socket ResourceWarning using urllib in Python 3

ε祈祈猫儿з submitted on 2019-11-30 13:40:54
I am using urllib.request.urlopen() to GET from a web service I'm trying to test. This returns an HTTPResponse object, which I then read() to get the response body. But I always see a ResourceWarning about an unclosed socket from socket.py. Here's the relevant function:
    from urllib.request import Request, urlopen
    def get_from_webservice(url):
        """ GET from the webservice """
        req = Request(url, method="GET", headers=HEADERS)
        with urlopen(req) as rsp:
            body = rsp.read().decode('utf-8')
        return json.loads(body)
Here's the warning as it appears in the program's output:
    $ ./test/test_webservices.py

How to download a file over http with authorization in python 3.0, working around bugs?

柔情痞子 submitted on 2019-11-30 13:38:24
I have a script that I'd like to continue using, but it looks like I either have to find some workaround for a bug in Python 3 or downgrade back to 2.6, and thus downgrade other scripts as well. Hopefully someone here has already managed to find a workaround. The problem is that, after the changes in Python 3.0 regarding bytes and strings, apparently not all of the library code has been tested. I have a script that downloads a page from a web server. This script passed a username and password as part of the URL in Python 2.6, but in Python 3.0 this doesn't work any more. For
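Instead of embedding the username and password in the URL, Python 3's urllib.request can be given the credentials through a password manager. A sketch under the assumption that the server uses HTTP Basic authentication, which the excerpt does not state; the URL and credentials below are placeholders:

```python
import urllib.request


def make_auth_opener(top_level_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges for top_level_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # A realm of None means "use these credentials for any realm at this URL".
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


# Placeholder values; opener.open(url) would perform the actual download.
opener = make_auth_opener("http://example.com/", "user", "secret")
```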

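Python Crawler, Part 1: Scraping the Douban Movie Top 250

只谈情不闲聊 submitted on 2019-11-30 13:32:54
Environment: Windows 7 + Python 3.6 + PyCharm 2017. Goal: scrape the Douban Movie Top 250, save each movie's poster locally, and save each movie's basic information (title, director, lead actors, year, rating, number of ratings, tagline) to a txt file. The Douban Movie Top 250 is probably among the easiest kinds of static page to scrape: sending a request with Python's urllib library returns everything you see in the browser. No login is needed, and nothing is loaded dynamically. 1. Approach. Open the Douban Movie Top 250 page, https://movie.douban.com/top250, in Chrome. Taking the first movie, The Shawshank Redemption, as an example (the figure is not preserved here), the information we need is the title, director, lead actors, year, rating, and number of ratings. When a browser or a Python script sends a request to the server, what comes back is HTML; the neatly laid-out pages of text and images we normally see are the result of the browser rendering that HTML. So we need to locate the information we want inside the HTML. This is called HTML parsing, and there are many tools for it, such as regular expressions, BeautifulSoup, XPath, and CSS selectors; XPath is used here. How do we find where the information sits in the HTML? First, right-click and choose Inspect to open the HTML of the current page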
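As a rough illustration of the "send a request with urllib" step, the sketch below only builds the request object. The User-Agent value and the build_request helper are assumptions made for this example, not something the article states is required, and the commented lines show where the download would happen:

```python
import urllib.request

TOP250_URL = "https://movie.douban.com/top250"


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a GET request carrying a browser-like User-Agent header (an assumption)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request(TOP250_URL)
# Fetching and decoding the page would look like this (not run in this sketch):
# with urllib.request.urlopen(req) as resp:
#     html = resp.read().decode("utf-8")
print(req.full_url)
```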

urllib2.urlopen() vs urllib.urlopen() - urllib2 throws 404 while urllib works! WHY?

不想你离开。 submitted on 2019-11-30 13:01:29
Question:
    import urllib
    print urllib.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()
The above script works and returns the expected results, while:
    import urllib2
    print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()
throws the following error:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File

Turning on debug output for python 3 urllib

拟墨画扇 submitted on 2019-11-30 11:19:32
In Python 2, it was possible to get debug output from urllib by doing
    import httplib
    import urllib
    httplib.HTTPConnection.debuglevel = 1
    response = urllib.urlopen('http://example.com').read()
However, in Python 3 it looks like this has been moved to http.client.HTTPConnection.set_debuglevel(level). However, I'm using urllib, not http.client directly. How can I set it up so that my HTTP requests display debugging information in this way? Here's what I'm using so far. What's the best way to proceed if I want to be able to get debug information?
    #Request Login page
    cookiejar = http.cookiejar
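One way to get this output in Python 3 is to pass debuglevel to the urllib.request handlers when building an opener. This is a sketch, not necessarily the answer given in the original thread; make_debug_opener is a hypothetical helper name:

```python
import urllib.request


def make_debug_opener(level=1):
    """Return an opener whose HTTP and HTTPS handlers print connection debug output."""
    http_handler = urllib.request.HTTPHandler(debuglevel=level)
    https_handler = urllib.request.HTTPSHandler(debuglevel=level)
    return urllib.request.build_opener(http_handler, https_handler)


opener = make_debug_opener()
# opener.open("http://example.com") would print request and response headers to stdout.
```

A cookie processor such as urllib.request.HTTPCookieProcessor(cookiejar) can be passed to build_opener alongside these handlers if the session needs cookies.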

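Differences between the urllib, urllib2, httplib, and httplib2 libraries in Python

非 Y 不嫁゛ submitted on 2019-11-30 09:56:43
If you only use Python 3.x, you can skip the rest; just remember that there is a library called urllib. Python 2.x has these library names available: urllib, urllib2, urllib3, httplib, httplib2, requests. Python 3.x has these: urllib, urllib3, httplib2, requests. urllib3 and requests, which exist for both, are not part of the standard library. urllib3 provides thread-safe connection pooling and file POST support, and has little to do with urllib and urllib2. requests bills itself as "HTTP for Humans" and is simpler and more convenient to use. For Python 2.x, the main differences between urllib and urllib2 are: urllib2 can accept a Request object to set headers for a URL, change the user agent, set cookies, and so on, whereas urllib only accepts a plain URL. urllib provides some fairly basic low-level functions that urllib2 does not have, such as urlencode. A few examples from the official urllib documentation follow. Retrieving a URL with a GET method that takes parameters:
    >>> import urllib
    >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
    >>> f = urllib.urlopen("http://www.musi-cal.com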
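In Python 3 the urlencode function mentioned above lives in urllib.parse. A small sketch of building a GET URL with query parameters; the base URL is a placeholder and build_query_url is a hypothetical helper, since the URL in the excerpt is cut off:

```python
from urllib.parse import urlencode


def build_query_url(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    return base_url + "?" + urlencode(params)


# Placeholder base URL; passing the result to urllib.request.urlopen() would send the GET.
print(build_query_url("http://example.com/search", {"spam": 1, "eggs": 2, "bacon": 0}))
```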