urllib

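The urllib library

眉间皱痕 submitted on 2019-11-30 18:10:28

The urllib library: urllib is one of the most basic HTTP request libraries in Python. It can simulate a browser's behavior, send a request to a specified server, and save the data the server returns.

The urlopen function: in Python 3's urllib, all methods related to network requests have been gathered into the urllib.request module, so let's start with the basic usage of urlopen:

    from urllib import request
    resp = request.urlopen('http://www.baidu.com')
    print(resp.read())

If you open Baidu in a browser, right-click and view the page source, you will find that it matches the data we just printed. In other words, these three lines of code have already fetched the entire source of Baidu's home page. The Python code for a basic URL request really is very simple.

Here is a detailed explanation of the urlopen function:
url: the URL to request.
data: the request body; if this is set, the request becomes a POST request.
Return value: an http.client.HTTPResponse object, which is a file-like object with methods such as read(size), readline, readlines and getcode.

The urlretrieve function: this function makes it easy to save a file from a web page to the local disk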
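This is a minimal sketch of the data parameter in action. The helper names build_request and send and the form field {'key': 'value'} are placeholders, not part of the original post. It shows that supplying data switches the request from GET to POST, and how the file-like HTTPResponse returned by urlopen is read:

```python
from urllib import parse, request


def build_request(url, form=None):
    # data must be bytes: urlencode the dict, then encode the resulting str
    data = parse.urlencode(form).encode('utf-8') if form is not None else None
    # Request.get_method() returns 'POST' whenever data is not None, otherwise 'GET'
    return request.Request(url, data=data)


def send(req, timeout=10):
    # urlopen returns a file-like http.client.HTTPResponse for http(s) URLs
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.getcode(), resp.read()


get_req = build_request('http://www.baidu.com')
post_req = build_request('http://www.baidu.com', {'key': 'value'})  # placeholder form field
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```

Calling send(post_req) would actually transmit the request. The sketch stops at building it so the GET/POST switch is visible without network access.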

Trying to post multipart form data in python, won't post

 ̄綄美尐妖づ submitted on 2019-11-30 17:57:50
Question: I'm fairly new to Python, so I apologize in advance if this is something simple I'm missing. I'm trying to post data to a multipart form in Python. The script runs, but it won't post. I'm not sure what I'm doing wrong.

    import urllib, urllib2
    from poster.encode import multipart_encode
    from poster.streaminghttp import register_openers

    def toqueXF():
        register_openers()
        url = "http://localhost/trunk/admin/new.php"
        values = {'form':open('/test.pdf'), 'bandingxml':open('/banding.xml'), 'desc':
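The poster package in the question is a third-party Python 2 library. For Python 3, here is a minimal standard-library sketch that builds a multipart/form-data body by hand. The helpers encode_multipart and post_multipart are hypothetical. The sketch assumes file contents are passed as bytes that were read in binary mode ('rb'):

```python
import mimetypes
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body.

    fields: {name: value}; files: {name: (filename, content_bytes)}.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the content
    parts = []
    for name, value in fields.items():
        header = (f'--{boundary}\r\n'
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append(header.encode('utf-8') + str(value).encode('utf-8') + b'\r\n')
    for name, (filename, content) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
        header = (f'--{boundary}\r\n'
                  f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                  f'Content-Type: {ctype}\r\n\r\n')
        parts.append(header.encode('utf-8') + content + b'\r\n')
    parts.append(f'--{boundary}--\r\n'.encode('utf-8'))  # closing boundary
    return b''.join(parts), f'multipart/form-data; boundary={boundary}'


def post_multipart(url, fields, files, timeout=10):
    body, content_type = encode_multipart(fields, files)
    req = urllib.request.Request(url, data=body, headers={'Content-Type': content_type})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.getcode(), resp.read()


# Usage with the question's URL and field names (not run here; '...' is a placeholder):
# with open('/test.pdf', 'rb') as pdf, open('/banding.xml', 'rb') as xml:
#     files = {'form': ('test.pdf', pdf.read()), 'bandingxml': ('banding.xml', xml.read())}
#     print(post_multipart('http://localhost/trunk/admin/new.php', {'desc': '...'}, files))
```

Building the body in memory keeps the sketch short. For very large uploads, a streaming encoder such as the one poster provided would use less memory.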

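Accessing HTTPS sites with Python 3 urllib

瘦欲@ submitted on 2019-11-30 17:24:48

When you access an HTTPS site with the urllib module, the request may fail with a certificate verification error (for example "SSL: CERTIFICATE_VERIFY_FAILED"), because Python 3 verifies server certificates by default. A commonly used workaround is to add the following code:

    import ssl
    ssl._create_default_https_context = ssl._create_unverified_context

Note that this turns off certificate verification for every HTTPS connection the process makes afterwards, which is insecure. Source: CSDN  Author: swaggy_python  Link: https://blog.csdn.net/wangkaidehao/article/details/78669653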
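urlopen also accepts a context argument. That means you don't have to patch the private ssl._create_default_https_context globally. You can relax verification for a single request or, preferably, point it at the correct CA bundle. This is a minimal sketch; make_context and fetch are hypothetical helper names:

```python
import ssl
import urllib.request


def make_context(verify=True, cafile=None):
    if verify:
        # Normal case: verify the server certificate, optionally against a custom CA bundle file
        return ssl.create_default_context(cafile=cafile)
    # Insecure: no certificate or hostname checks, but only for requests given this context
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be turned off before verify_mode can be CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url, verify=True, cafile=None, timeout=10):
    context = make_context(verify, cafile)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read()
```

Scoping the unverified context to one call keeps every other HTTPS request in the program verified.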

Javascript unescape() vs. Python urllib.unquote()

為{幸葍}努か submitted on 2019-11-30 17:16:14

Question: From reading various posts, it seems like JavaScript's unescape() is equivalent to Python's urllib.unquote(); however, when I test both I get different results:

In the browser console:
    unescape('%u003c%u0062%u0072%u003e');
output: <br>

In the Python interpreter:
    import urllib
    urllib.unquote('%u003c%u0062%u0072%u003e')
output: %u003c%u0062%u0072%u003e

I would expect Python to also return <br>. Any ideas as to what I'm missing here? Thanks!

Answer 1: %uxxxx is a non-standard URL encoding scheme that is not
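urllib.unquote (urllib.parse.unquote in Python 3) decodes only standard %XX escapes, so %uXXXX sequences pass through unchanged. This is a minimal sketch of a hypothetical js_unescape helper that imitates JavaScript's unescape(). It uses a regular expression to handle both escape forms:

```python
import re

# Matches either a %uXXXX escape (group 1) or a plain %XX escape (group 2)
_ESCAPE = re.compile(r'%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})')


def js_unescape(s):
    # Each escape becomes the character with that code point, as unescape() does;
    # malformed sequences such as a lone '%' are left untouched
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), s)


print(js_unescape('%u003c%u0062%u0072%u003e'))  # <br>
```

Like unescape(), this maps %XX to a code point directly rather than treating it as part of a UTF-8 byte sequence. That differs from urllib.parse.unquote's handling of multi-byte text.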

Preventing a “hidden” redirect with urlopen() in Python

点点圈 submitted on 2019-11-30 16:01:01

Question: I am using BeautifulSoup for web scraping and I am having problems with a particular type of website when using urlopen. Every item on the website has its own unique page, and the item comes in different formats (e.g. 500 mL, 1 L, 2 L, ...). When I open the URL of the product (www.example.com/product1) in my web browser, I see a picture of the 500 mL format, information about it (price, quantity, flavor, etc.) and a list of all the other formats available for this specific
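The excerpt ends before the cause is identified. This sketch assumes the "hidden" redirect is an ordinary HTTP 3xx response, which urlopen follows silently. A redirect done with JavaScript or cookies would not be affected. This is a minimal sketch with a hypothetical NoRedirect handler and open_without_redirect helper that stop urllib from following redirects:

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None means "do not follow"; the 3xx response then reaches
    # HTTPDefaultErrorHandler, which raises it as an HTTPError.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def open_without_redirect(url, timeout=10):
    # Returns (status, final URL or Location header, body)
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.getcode(), resp.geturl(), resp.read()
    except urllib.error.HTTPError as err:
        # Redirects (and any other HTTP error status) land here instead of being followed
        return err.code, err.headers.get('Location'), err.read()
```

A simpler way to detect that a redirect happened, without blocking it, is to compare resp.geturl() from the default urlopen with the URL you requested.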

Download a file to a specific folder with python

喜欢而已 submitted on 2019-11-30 15:42:38

Question: I am trying to download a particular file to a specific folder on my hard disk. I am using IronPython 2.7 and the urllib module. I tried downloading the file with the following code:

    import urllib
    response = urllib.urlretrieve(someURL, 'C:/someFolder')
    html = response.read()
    response.close()

But when the code above is run, I get the following error message: Runtime error (IOException): Access to the path 'D:\someFolder' is denied. Traceback: line 91, in urlretrieve, "C:\Program Files\IronPython\Lib
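The likely cause has two parts. First, urlretrieve's second argument is the path of the file to write, not a folder, so it tries to open the folder itself as a file. Second, urlretrieve returns a (filename, headers) tuple, not a response object, so .read() and .close() do not apply. This is a minimal Python 3 sketch with a hypothetical download_to_folder helper (Python 2's urllib.urlretrieve takes the same arguments):

```python
import os
import pathlib
import tempfile
import urllib.parse
import urllib.request


def download_to_folder(url, folder, filename=None):
    if filename is None:
        # Assumption: reuse the last segment of the URL path as the file name
        path = urllib.parse.unquote(urllib.parse.urlparse(url).path)
        filename = os.path.basename(path) or 'download'
    os.makedirs(folder, exist_ok=True)
    dest = os.path.join(folder, filename)  # a full file path, not just the folder
    # urlretrieve returns (local_filename, headers), not a response object
    local_path, _headers = urllib.request.urlretrieve(url, dest)
    return local_path


if __name__ == '__main__':
    # Offline demo: urlretrieve also accepts file:// URLs
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp, 'source.txt')
        src.write_text('hello')
        print(download_to_folder(src.as_uri(), os.path.join(tmp, 'someFolder')))
```

To read the downloaded content afterwards, open the returned path with open() instead of calling .read() on urlretrieve's result.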

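Scraping Youdao Translate with Python

旧城冷巷雨未停 submitted on 2019-11-30 15:13:11

Open the developer tools on the Youdao Translate page, and in the Headers panel find the Request URL and the corresponding form data.

    import urllib.request
    import urllib.parse
    import json
    content = input('Enter the text to translate: ')
    # Remove the "_o" from the request URL, otherwise the server responds with error_code: 50
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
    data = {}
    # These fields are listed in the developer tools; the i and doctype keys are required
    data['i'] = content
    data['from'] = 'AUTO'
    data['to'] = 'AUTO'
    data['smartresult'] = 'dict'
    data['client'] = 'fanyideskweb'
    data['salt'] = '15695569180611'
    data['sign'] = '5b0565493d812bc5e713b895c12d615d'
    data['doctype'] = 'json'
    data['version'] = '2.1'
    data['keyfrom'] = 'fanyi.web'
    data['action'] = 'FY_BY_REALTTIME'
    # Convert the dict of request data to URL encoding and change the encoding to 'utf-8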
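The excerpt stops before the request is sent. This is a minimal sketch of the remaining steps. It assumes the endpoint still accepts this form and returns JSON shaped like {"translateResult": [[{"src": ..., "tgt": ...}]]}. That shape is an assumption, not something the excerpt shows, and the hard-coded salt and sign may no longer be accepted. The helper names encode_form, extract_translation and translate are hypothetical:

```python
import json
import urllib.parse
import urllib.request

URL = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'


def encode_form(data):
    # urlencode the dict, then encode the str as UTF-8 bytes, since urlopen's data must be bytes
    return urllib.parse.urlencode(data).encode('utf-8')


def extract_translation(body):
    # ASSUMPTION: the response JSON looks like {"translateResult": [[{"src": ..., "tgt": ...}]]}
    result = json.loads(body.decode('utf-8'))
    return result['translateResult'][0][0]['tgt']


def translate(data, timeout=10):
    # Passing data makes this a POST request to the URL found in the developer tools
    with urllib.request.urlopen(URL, data=encode_form(data), timeout=timeout) as resp:
        return extract_translation(resp.read())
```

With the data dict built above, print(translate(data)) would send the request and print the translated text, if the assumptions hold.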

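Using urllib in Python

大城市里の小女人 submitted on 2019-11-30 15:04:45

1. Crawl all the data on the Baidu home page

    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    # Imports
    import urllib.request
    import urllib.parse

    if __name__ == "__main__":
        # URL of the page to crawl
        url = 'http://www.baidu.com/'
        # Send a request to the URL with urlopen, which returns a response object
        response = urllib.request.urlopen(url=url)
        # Call read() on the response object to get the data returned to the client (the crawled data)
        data = response.read()  # the returned data is bytes, not a str
        print(data)  # print the crawled data

Additional notes: the signature of urlopen is urllib.request.urlopen(url, data=None, timeout=<object object at 0x10af327d0>, *, cafile=None, capath=None, cadefault=False, context=None). In the example above we used only the first parameter, url. In everyday development, the two parameters we use most are url and data. url parameter: the URL to send the request to. data parameter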
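Because read() returns bytes, the body still has to be decoded before it can be handled as text. This is a minimal sketch with hypothetical helpers decode_body and fetch_text. It assumes the server declares its charset in the Content-Type header and falls back to UTF-8 otherwise. It also passes the timeout parameter from the signature above:

```python
import urllib.request


def decode_body(body, charset=None, fallback='utf-8'):
    # read() returns bytes; decode with the declared charset, else the fallback.
    # errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
    return body.decode(charset or fallback, errors='replace')


def fetch_text(url, timeout=10):
    # timeout (seconds) applies to blocking socket operations such as connecting
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # get_content_charset() reads the charset parameter of Content-Type (None if absent)
        return decode_body(resp.read(), resp.headers.get_content_charset())


if __name__ == '__main__':
    print(fetch_text('http://www.baidu.com/')[:200])
```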

Turning on debug output for python 3 urllib

馋奶兔 submitted on 2019-11-30 14:41:51

Question: In Python 2, it was possible to get debug output from urllib by doing

    import httplib
    import urllib
    httplib.HTTPConnection.debuglevel = 1
    response = urllib.urlopen('http://example.com').read()

However, in Python 3 it looks like this has been moved to http.client.HTTPConnection.set_debuglevel(level). However, I'm using urllib, not http.client directly. How can I set it up so that my HTTP requests display debugging information in this way? Here's what I'm using so far. What's the best way to
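In Python 3, urllib.request.HTTPHandler and HTTPSHandler both accept a debuglevel argument and pass it on to the underlying http.client connection. This is a minimal sketch that builds an opener with both handlers; build_debug_opener is a hypothetical helper name:

```python
import urllib.request


def build_debug_opener(level=1):
    # HTTPHandler/HTTPSHandler pass debuglevel on to http.client, which then
    # prints the request it sends and the response headers it receives
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=level),
        urllib.request.HTTPSHandler(debuglevel=level),
    )


opener = build_debug_opener()
# Optional: make plain urllib.request.urlopen() use this opener as well
urllib.request.install_opener(opener)

if __name__ == '__main__':
    response = urllib.request.urlopen('http://example.com').read()
```

Installing the opener globally keeps existing urlopen calls unchanged. Calling opener.open(...) directly limits the debug output to the requests you choose.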

Download a file to a specific folder with python

泪湿孤枕 submitted on 2019-11-30 14:15:22

I am trying to download a particular file to a specific folder on my hard disk. I am using IronPython 2.7 and the urllib module. I tried downloading the file with the following code:

    import urllib
    response = urllib.urlretrieve(someURL, 'C:/someFolder')
    html = response.read()
    response.close()

But when the code above is run, I get the following error message:

    Runtime error (IOException): Access to the path 'D:\someFolder' is denied.
    Traceback:
    line 91, in urlretrieve, "C:\Program Files\IronPython\Lib\urllib.py"
    line 9, in script
    line 241, in retrieve, "C:\Program Files\IronPython\Lib\urllib.py"

I tried