urllib | 易学教程

Commonly used Python modules for web crawlers

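For simple crawlers, Python (Python 3 here) offers better third-party libraries that are easy to get started with.

Python standard library: the logging module. logging can take over the role of print by writing output to a log file, and it can partly replace a debugger.

The re module: regular expressions.

The sys module: system-related functions. sys.argv is a list containing all command-line arguments; sys.exit() exits the program.

Python standard library: the urllib module. urllib.request.urlopen can open URLs over HTTP (mainly), HTTPS, and FTP. Its parameters: ca (cafile, capath), for certificate-based authentication; data, used when submitting to the URL via POST; url, the network address (in full form the protocol name comes first and the port last, e.g. http://192.168.1.1:80); timeout, the timeout setting. The object it returns has three extra methods: geturl() returns the URL of the response and is often used to detect redirects; info() returns the basic information of the response (its headers); getcode() returns the status code of the response.

1. request. The most common way to use urllib.request is to call urllib.request.urlopen() directly, but that is usually not good practice: a complete request should also carry information such as headers. To keep a crawler from being detected, we usually set the headers ourselves and forge the header information the crawler sends.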
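
The urlopen parameters and response methods described in this excerpt can be sketched as follows. This is a minimal illustration, not the article's own code: the helper names `build_request` and `describe_response`, the `EXAMPLE_USER_AGENT` value, and any URL passed in are assumptions made for the example.

```python
import urllib.request

# Hypothetical browser-like User-Agent string (an assumption for this sketch).
EXAMPLE_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=EXAMPLE_USER_AGENT):
    """Wrap a URL in a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def describe_response(request, timeout=10.0):
    """Open the request and collect the three values named in the text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {
            "url": response.geturl(),      # final URL; differs after a redirect
            "headers": response.info(),    # response headers (message object)
            "status": response.getcode(),  # HTTP status code such as 200
        }
```

A call such as `describe_response(build_request("http://example.com/"))` would perform a real network request. Since Python 3.9 the documentation marks geturl(), info(), and getcode() as deprecated in favor of the `url`, `headers`, and `status` attributes, so newer code may prefer those.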

hangs on open url with urllib (python3)

I am trying to open a URL with Python 3:

    import urllib.request
    fp = urllib.request.urlopen("http://lebed.com/")
    mybytes = fp.read()
    mystr = mybytes.decode("utf8")
    fp.close()
    print(mystr)

But it hangs on the second line. What is the reason for this problem, and how can I fix it?

I suppose the reason is that the site does not allow robots to visit it. You need to fake a browser visit by sending browser headers along with your request:

    import urllib.request
    url = "http://lebed.com/"
    req = urllib.request.Request(
        url,
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3)
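
The answer above is cut off, so here is a self-contained sketch of the same idea: send a browser-like User-Agent and, in addition, pass a timeout so that a stalled connection raises an error instead of hanging. The function name `fetch_text` and the default User-Agent value are assumptions for illustration, and whether the site in the question actually blocks non-browser clients is not confirmed here.

```python
import socket
import urllib.error
import urllib.request


def fetch_text(url, timeout=10.0, user_agent="Mozilla/5.0 (X11; Linux x86_64)"):
    """Fetch a URL as text; return None if the request fails or times out."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Use the charset declared by the server, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (urllib.error.URLError, socket.timeout) as exc:
        print(f"could not fetch {url}: {exc}")
        return None
```

Without a timeout argument urlopen uses the global default socket timeout, which is normally unset, so a server that never answers can block the call indefinitely.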

[Crawler] Python web crawlers

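Crawler chapter

1. How does Python access the internet? URL (web address) + lib => urllib.
2. When you have a question, check the documentation (the Python docs).
3. response = urllib.request.urlopen("http://www.baidu.com"), then html = response.read().decode("utf-8"): decoding turns the raw bytes into text.
4. Reading an image from a web page: open the output file in wb (binary write) mode. urlopen = Request + urlopen.

Browser > Inspect Element > Network shows what the browser, acting as the client, exchanges with the server (this is where to look when submitting a POST form from Python). GET requests data from the server; POST submits data to be processed to the specified server. At 8 min 45 s, click the POST entry (translate?smartresult); once it opens, click Preview to see the translated content.

(1) Then analyze the headers:
① Status Code: 200 means a normal response; 404 means an abnormal response.
② Request Headers: the server usually uses the User-Agent field here to tell whether the visit comes from a browser or from code.
③ Form Data: the main content submitted by the POST.
(2) POST needs its data in a specific format; urllib.parse can do the conversion.
(3) Code implementing the translation POST: import urllib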
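
The excerpt stops before its POST code, so the following is only a sketch of the general technique it describes: encode the form data with urllib.parse and pass it as `data` so that urlopen sends a POST. The helper names, the endpoint URL, and the form field names in the usage note are hypothetical and not taken from the original tutorial.

```python
import urllib.parse
import urllib.request


def build_post_request(url, form_fields, user_agent="Mozilla/5.0"):
    """Build a Request whose body is the URL-encoded form data."""
    # urlencode produces a str; the request body must be bytes.
    data = urllib.parse.urlencode(form_fields).encode("utf-8")
    # Supplying data makes urllib send POST instead of GET.
    return urllib.request.Request(url, data=data, headers={"User-Agent": user_agent})


def post_form(url, form_fields, timeout=10.0):
    """Send the form and return the response body decoded as UTF-8 text."""
    request = build_post_request(url, form_fields)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `build_post_request("http://example.com/translate", {"i": "hello world", "doctype": "json"})` prepares a POST without sending it; the real field names have to be read from the Form Data panel in the browser's Network tab.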

Python 3- How to retrieve an image from the web and display in a GUI using TKINTER?

I want a function that, when a button is clicked, takes an image from the web using urllib and displays it in a GUI using tkinter. I'm new to both urllib and tkinter, so I'm having an incredibly difficult time doing this. I tried this, but it obviously doesn't work because it uses a text box and will only display text.

    def __init__(self, root):
        self.root = root
        self.root.title('Image Retrieval Program')
        self.init_widgets()

    def init_widgets(self):
        self.btn = ttk.Button(self.root, command=self.get_url, text='Get Url', width=8)
        self.btn.grid(column=0, row=0, sticky='w')
        self.entry = ttk.Entry
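
One possible approach, sketched under assumptions: download the image bytes with urllib, base64-encode them, and hand them to tkinter's PhotoImage, which is then shown in a label rather than a text box. The names `fetch_image_base64` and `run_viewer` are hypothetical. PhotoImage handles GIF and PNG with Tk 8.6; other formats such as JPEG would need a third-party library like Pillow, which this sketch does not use.

```python
import base64
import urllib.request


def fetch_image_base64(url, timeout=10.0):
    """Download an image and return it base64-encoded, a form PhotoImage accepts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return base64.b64encode(response.read())


def run_viewer(image_url):
    """Open a window with a button that downloads and shows image_url."""
    # Imported here so the download helper also works on machines without a display.
    import tkinter as tk
    from tkinter import ttk

    root = tk.Tk()
    root.title("Image Retrieval Program")
    label = ttk.Label(root)
    label.grid(column=0, row=1)

    def show_image():
        photo = tk.PhotoImage(data=fetch_image_base64(image_url))
        label.configure(image=photo)
        label.image = photo  # keep a reference so the image is not garbage-collected

    ttk.Button(root, text="Get Image", command=show_image).grid(column=0, row=0, sticky="w")
    root.mainloop()
```

Calling `run_viewer` with the URL of a GIF or PNG opens the window; the download runs in the GUI thread here, which keeps the sketch short but freezes the window while the request is in flight.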

Python unable to retrieve form with urllib or mechanize

I'm trying to fill out and submit a form using Python, but I'm not able to retrieve the resulting page. I've tried both mechanize and urllib/urllib2 methods to post the form, but both run into problems. The form I'm trying to retrieve is here: http://zrs.leidenuniv.nl/ul/start.php . The page is in Dutch, but this is irrelevant to my problem. It may be noteworthy that the form action redirects to http://zrs.leidenuniv.nl/ul/query.php . First of all, this is the urllib/urllib2 method I've tried:

    import urllib, urllib2
    import socket, cookielib
    url = 'http://zrs.leidenuniv.nl/ul/start.php'
    params
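
The question uses Python 2 modules (urllib2, cookielib) and its code is cut off. As a rough Python 3 sketch of the usual pattern for forms that depend on a session, the code below keeps cookies in an `http.cookiejar.CookieJar`, visits the start page first, and then posts to the form's action URL. The helper names and the form fields are hypothetical; the real field names would have to be taken from the form's HTML.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def submit_form(opener, start_url, action_url, fields, timeout=10.0):
    """Load the start page, then POST the fields to the form's action URL."""
    # Visit the start page first so any session cookie lands in the jar.
    with opener.open(start_url, timeout=timeout):
        pass
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(action_url, data=data, timeout=timeout) as response:
        # Return the final URL (after any redirect) together with the body.
        return response.geturl(), response.read()
```

Both requests go through the same opener, so a cookie set by the start page is sent again with the POST.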

Opening Local File Works with urllib but not with urllib2

Question: I'm trying to open a local file using urllib2. How can I go about doing this? When I try the following line with urllib: resp = urllib.urlopen(url) it works correctly, but when I switch it to: resp = urllib2.urlopen(url) I get: ValueError: unknown url type: /path/to/file even though that file definitely does exist. Thanks!

Answer 1: Just put "file://" in front of the path:

    >>> import urllib2
    >>> urllib2.urlopen("file:///etc/debian_version").read()
    'wheezy/sid\n'

Answer 2: In urllib.urlopen method: If the URL
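
In Python 3 the same idea applies to urllib.request.urlopen, which also needs a URL with a scheme. A small sketch, with the hypothetical helper name `read_local_file`, that lets pathlib build the file:// URL instead of concatenating strings:

```python
import pathlib
import urllib.request


def read_local_file(path):
    """Read a local file through urllib by converting the path to a file:// URL."""
    # as_uri() requires an absolute path and percent-encodes special characters.
    file_url = pathlib.Path(path).resolve().as_uri()
    with urllib.request.urlopen(file_url) as response:
        return response.read()
```

The function returns bytes, so text files still need a `.decode()` call with the right encoding.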

how to check if the urllib2 follow a redirect?

I've written this function:

    def download_mp3(url,name):
        opener1 = urllib2.build_opener()
        page1 = opener1.open(url)
        mp3 = page1.read()
        filename = name+'.mp3'
        fout = open(filename, 'wb')
        fout.write(mp3)
        fout.close()

This function takes a URL and a name, both as strings, then downloads the MP3 from the URL and saves it under the given name. The URL has the form http://site/download.php?id=xxxx, where xxxx is the ID of an MP3. If this ID does not exist, the site redirects me to another page. So the question is: how can I check whether this ID exists? I've tried to check whether the URL exists with a
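
One simple way to notice a followed redirect, sketched here in Python 3 rather than the question's urllib2: compare the URL that was requested with the URL reported by the response's geturl(). The function name `fetch_with_redirect_info` is an assumption for this example, and a plain string comparison is a heuristic, not a guaranteed test.

```python
import urllib.request


def fetch_with_redirect_info(url, timeout=10.0):
    """Return (body, final_url, redirected) for the given URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        final_url = response.geturl()  # URL actually retrieved, after redirects
        body = response.read()
    # If urllib followed a redirect, the final URL differs from the requested one.
    return body, final_url, final_url != url
```

In the download function from the question, the file would then be written only when `redirected` is false.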

Python and urllib

I'm trying to download a zip file ("tl_2008_01001_edges.zip") from an FTP census site using urllib. What form is the zip file in when I get it, and how do I save it? I'm fairly new to Python and don't understand how urllib works. This is my attempt:

    import urllib, sys
    zip_file = urllib.urlretrieve("ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/Autauga_County/", "tl_2008_01001_edges.zip")

If I know the list of FTP folders (or counties in this case), can I run through the FTP site list using the glob function? Thanks.

Answer (gimel): Use urllib2.urlopen() for the zip file data and directory listing.
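
As a Python 3 sketch of the answer's suggestion: the data arrives as raw bytes, so it can be streamed into a file opened in binary mode. The helper name `download_to_file` is hypothetical. Note that the URL in the attempt above ends at the directory, so the URL passed in would presumably need to include the file name itself.

```python
import shutil
import urllib.request


def download_to_file(url, dest_path, timeout=30.0):
    """Stream the resource at url into dest_path without loading it all into memory."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        # copyfileobj reads and writes in chunks, which suits large archives.
        shutil.copyfileobj(response, out)
    return dest_path
```

The saved file is an ordinary zip archive that the standard zipfile module can open. The glob module only matches paths on the local filesystem, so it cannot enumerate folders on the FTP server.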

Response time for urllib in python

Question: I want to get the response time when I use urllib. I wrote the code below, but it measures more than the response time. Can I get the time using urllib, or is there another method?

    import urllib
    import datetime

    def main():
        urllist = [ "http://google.com", ]
        for url in urllist:
            opener = urllib.FancyURLopener({})
            try:
                start = datetime.datetime.now()
                f = opener.open(url)
                end = datetime.datetime.now()
                diff = end - start
                print int(round(diff.microseconds / 1000))
            except IOError, e:
                print 'error', url
            else:
                print f
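
A Python 3 sketch of one way to separate the two durations, using a monotonic clock: the time until urlopen returns (the response headers have arrived) and the time until the whole body has been read. The function name `time_request` is an assumption. Both figures still include DNS lookup and connection setup, which urllib does not expose separately.

```python
import time
import urllib.request


def time_request(url, timeout=10.0):
    """Return (seconds until headers arrived, seconds until body was fully read)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        headers_at = time.perf_counter()  # urlopen returns once headers are in
        response.read()                   # drain the body
        done_at = time.perf_counter()
    return headers_at - start, done_at - start
```

Separately, `timedelta.microseconds` in the question's code holds only the sub-second part of the difference, so any request slower than one second would be under-reported; `diff.total_seconds() * 1000` gives the full duration in milliseconds.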

SSL: CERTIFICATE_VERIFY_FAILED with urllib

Question: I'm running into trouble with the urllib module (Python 3.6). Every time I use the module, I get a page's worth of errors. What's wrong with urllib, and how do I fix it?

    import urllib.request
    url='https://www.goodreads.com/quotes/tag/artificial-intelligence'
    u1 = urllib.request.urlopen(url)
    print(u1)

That block of code likes to spit out this mouthful of stuff:

    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318,
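
The traceback is cut off, but the title names CERTIFICATE_VERIFY_FAILED, which means Python could not validate the server certificate against the CA certificates it knows. With the python.org installers for macOS, a commonly reported cause is that the bundled "Install Certificates.command" script has not been run. The sketch below, with hypothetical helper names, shows how to give urlopen an explicit verifying SSL context, optionally pointing at a CA bundle file you supply; it keeps verification on rather than disabling it.

```python
import ssl
import urllib.request


def make_verified_context(cafile=None):
    """Create an SSL context that verifies certificates and host names.

    cafile is an optional path to a CA bundle (an assumption of this sketch);
    when it is None, the system's default CA certificates are used.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch_https(url, context, timeout=10.0):
    """Fetch an HTTPS URL using the given SSL context."""
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()
```

If the default certificate store is empty or outdated, the error will persist until a valid CA bundle is installed or passed via `cafile`.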