urllib2

Some notable correspondences between the urllib modules of Python 2 and Python 3

笑着哭i submitted on 2019-12-05 14:55:52
Python 2's "from urllib import quote" corresponds to Python 3's "from urllib.parse import quote". How urllib2 changed in Python 3:

- In Python 2.x: import urllib2 — in Python 3.x: import urllib.request, urllib.error.
- In Python 2.x: import urllib — in Python 3.x: import urllib.request, urllib.error, urllib.parse.
- In Python 2.x: import urlparse — in Python 3.x: import urllib.parse.
- In Python 2.x: urllib.urlopen — in Python 3.x: urllib.request.urlopen.
- In Python 2.x: urllib.urlencode — in Python 3.x: urllib.parse.urlencode.
- In Python 2.x: urllib.quote — in Python 3.x: urllib.parse.quote (note: quote lives in urllib.parse, not urllib.request).
- In Python 2.x: import cookielib — in Python 3.x: import http.cookiejar.
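A minimal Python 3 sketch of where the renamed functions above now live (quote, urlencode, and urlparse under urllib.parse; urlopen under urllib.request):

```python
# Python 3 locations of functions that lived in urllib/urllib2 in Python 2.
from urllib.parse import quote, urlencode, urlparse
from urllib.request import urlopen  # Python 2: urllib2.urlopen
import http.cookiejar               # Python 2: cookielib

# quote: percent-encode a string for safe use inside a URL.
print(quote("a b&c"))                     # a%20b%26c
# urlencode: turn a dict into an application/x-www-form-urlencoded query.
print(urlencode({"q": "python urllib"}))  # q=python+urllib
# urlparse: split a URL into its components.
print(urlparse("http://example.com/path?q=1").netloc)  # example.com
```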

How to implement a timeout control for urllib2.urlopen

六眼飞鱼酱① submitted on 2019-12-05 11:18:44
How do I implement a timeout for urllib2.urlopen in Python? I just want to cut the connection and reconnect if no XML data is returned within 5 seconds. Should I use some timer? Thanks.

urllib2.urlopen("http://www.example.com", timeout=5)

From the urllib2 documentation: "The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS and FTP connections."

Source: https://stackoverflow.com/questions/16018007/how-to-implement-a
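The "cut the connection and connect again" part the asker describes is a retry loop around the timeout. A hedged sketch of that pattern; fetch_with_retry and the stub fetcher are illustrative names, not part of urllib2 (with real urllib2 the callable would be something like lambda: urllib2.urlopen(url, timeout=5).read()):

```python
import socket

def fetch_with_retry(fetch, retries=3):
    """Call fetch(), retrying on socket.timeout up to `retries` times."""
    for attempt in range(retries):
        try:
            return fetch()
        except socket.timeout:
            if attempt == retries - 1:
                raise  # out of retries: let the caller see the timeout

# Demo with a stub that times out twice before succeeding.
calls = {"n": 0}
def stub():
    calls["n"] += 1
    if calls["n"] < 3:
        raise socket.timeout("no data within 5 s")
    return b"<xml/>"

print(fetch_with_retry(stub))  # b'<xml/>'
```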

How to make a Python HTTP Request with POST data and Cookie?

回眸只為那壹抹淺笑 submitted on 2019-12-05 11:05:09
Question: I am trying to do an HTTP POST using cookies in Python. I have the values of the URL, the POST data, and the cookie.

import urllib2
url = "http://localhost/testing/posting.php"
data = "subject=Alice-subject&addbbcode18=%23444444&addbbcode20=0&helpbox=Close+all+open+bbCode+tags&message=alice-body&poll_title=&add_poll_option_text=&poll_length=&mode=newtopic&sid=5b2e663a3d724cc873053e7ca0f59bd0&f=1&post=Submit"
cookie = "phpbb2mysql_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A0%3A%22%22%3Bs%3A6%3A%22userid
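A sketch of the same idea in Python 3: passing `data` makes the request a POST, and the cookie is simply sent as a Cookie header. The URL echoes the question; the data and cookie values below are shortened placeholders, not the question's real strings:

```python
import urllib.request

# Placeholder values standing in for the question's full data/cookie strings.
url = "http://localhost/testing/posting.php"
data = "subject=Alice-subject&mode=newtopic&post=Submit".encode("ascii")
cookie = "phpbb2mysql_data=placeholder"

# Supplying `data` (bytes) turns the request into a POST;
# the cookie travels in an ordinary request header.
req = urllib.request.Request(url, data=data, headers={"Cookie": cookie})
print(req.get_method())          # POST
print(req.get_header("Cookie"))  # phpbb2mysql_data=placeholder
```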

Alternatives to Selenium/Webdriver for filling in fields when scraping headlessly with Python?

旧城冷巷雨未停 submitted on 2019-12-05 09:36:55
Question: With Python 2.7 I'm scraping with urllib2 and, when some XPath is needed, lxml as well. It's fast, and because I rarely have to navigate around the sites, this combination works well. On occasion though, usually when I reach a page that will only display some valuable data when a short form is filled in and a submit button is clicked (example), the scraping-only approach with urllib2 is not sufficient. Each time such a page was encountered, I could invoke selenium.webdriver to refetch the
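For simple forms, one browserless alternative is to replay the form's POST yourself with the stdlib: clicking submit is just an HTTP POST of the form's fields to its action URL. A hedged sketch (the field names and action URL below are invented for illustration, not taken from any real site):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Invented field names standing in for whatever the short form contains.
fields = {"search_term": "widgets", "submit": "Go"}
body = urlencode(fields).encode("ascii")

# POSTing the encoded fields to the form's action URL mimics the button click.
req = Request("http://example.com/form-action", data=body)
print(req.get_method())  # POST
print(body)
```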

Python crawler: setting a Cookie to get past site blocking and scrape Mayi Duanzu (蚂蚁短租)

杀马特。学长 韩版系。学妹 submitted on 2019-12-05 09:14:00
Preface: The text and images in this article come from the internet and are for learning and exchange only; they have no commercial purpose, and copyright belongs to the original authors. If there is any problem, please contact us promptly so we can handle it. Author: Eastmount. PS: readers who need Python learning materials can get them from the link below: http://note.youdao.com/noteshare?id=3054cce4add8a909e784ad934f956cef

When writing Python crawlers, we sometimes run into anti-scraping measures such as access denial. For example, when we try to scrape Mayi Duanzu data, the site shows the message "the current visit looks like a hacker attack and has been blocked by the site administrator", as shown in the figure. In that case we need to set a Cookie in order to crawl; the details follow. Many thanks to my student Chengfeng for providing the idea; the new generation surpasses the old!

1. Site analysis and crawler blocking

When we open Mayi Duanzu and search for Guiyang, it returns the results shown in the figure. The short-term rental listings are laid out in a regular pattern; this is the information we want to scrape. Inspecting the page with the browser's element inspector, each rental listing we need to scrape sits under a <dd> </dd> node. The house name is located under a <div class="room-detail clearfloat"> </div> node. Next we write a simple BeautifulSoup scraper:

# -*- coding: utf-8 -*-
import urllib
import re
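The article's fix, sending a Cookie (and a browser-like User-Agent) with every request, can be sketched with the Python 3 stdlib by installing default headers on an opener. The header values here are placeholders; a real cookie would be copied from the browser's developer tools, not invented:

```python
import urllib.request

# Build an opener whose headers are sent with every opener.open(url) call.
opener = urllib.request.build_opener()
opener.addheaders = [
    ("User-Agent", "Mozilla/5.0"),           # look like a browser
    ("Cookie", "session_id=placeholder"),    # placeholder: reuse the browser's session
]

# From here on, opener.open(url) would carry both headers automatically.
print(dict(opener.addheaders)["Cookie"])  # session_id=placeholder
```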

Using urllib2 in Python. How do I get the name of the file I am downloading?

ε祈祈猫儿з submitted on 2019-12-05 08:28:37
I am a Python beginner. I am using urllib2 to download files. When I download a file, I specify a filename with which to save the downloaded file on my hard drive. However, if I download the file using my browser, a default filename is automatically provided. Here is a simplified version of my code:

def downloadmp3(url):
    webFile = urllib2.urlopen(url)
    filename = 'temp.zip'
    localFile = open(filename, 'wb')
    localFile.write(webFile.read())

The file downloads just fine, but if I type the string stored in the variable "url" into my browser, there is a default filename given to the file when I
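The browser's default name usually comes from the last segment of the URL path (or from a Content-Disposition response header, when the server sends one). A stdlib sketch of the URL-path approach; filename_from_url is an illustrative helper name:

```python
import posixpath
from urllib.parse import urlparse, unquote

def filename_from_url(url):
    """Return the last path segment of the URL, percent-decoded."""
    path = urlparse(url).path
    return unquote(posixpath.basename(path))

print(filename_from_url("http://example.com/music/My%20Song.mp3"))  # My Song.mp3
```

When the server sets Content-Disposition, that header takes precedence over the URL path in browsers, so checking the response's headers first would be the more faithful emulation.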

Pass a JSON object to a URL with requests

偶尔善良 submitted on 2019-12-05 08:14:11
So, I want to use Kenneth's excellent requests module. I stumbled upon this problem while trying to use the Freebase API. Basically, their API looks like this: https://www.googleapis.com/freebase/v1/mqlread?query=... As the query they expect a JSON object; here's one that will return a list of wines with their country and percentage of alcohol:

[{
  "country": null,
  "name": null,
  "percentage_alcohol": null,
  "percentage_alcohol>": 0,
  "type": "/food/wine"
}]

Of course, we'll have to escape the hell out of this before passing it to a URL, so the actual query will look like this: fullurl = 'https:/
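The escaping doesn't have to be done by hand: serialize the object with json.dumps and let the query-string encoder percent-escape it. With requests one would pass params={"query": json.dumps(query)} to requests.get; the stdlib-only sketch below shows the same encoding step:

```python
import json
from urllib.parse import urlencode

# The MQL query from the question, as a Python structure.
query = [{"country": None, "name": None,
          "percentage_alcohol": None, "percentage_alcohol>": 0,
          "type": "/food/wine"}]

# urlencode percent-escapes the serialized JSON for safe use in a URL.
base = "https://www.googleapis.com/freebase/v1/mqlread"
fullurl = base + "?" + urlencode({"query": json.dumps(query)})
print(fullurl)
```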

Sending form data to aspx page

你离开我真会死。 submitted on 2019-12-05 07:46:56
There is a need to do a search on the website url = r'http://www.cpso.on.ca/docsearch/'. This is an aspx page (I began this trek as of yesterday, sorry for the noob questions). Using BeautifulSoup, I can get the __VIEWSTATE and __EVENTVALIDATION like this:

viewstate = soup.find('input', {'id': '__VIEWSTATE'})['value']
eventval = soup.find('input', {'id': '__EVENTVALIDATION'})['value']

and the header can be set like this:

headers = {'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13', 'HTTP_ACCEPT': 'text/html,application/xhtml+xml
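The question uses BeautifulSoup for this; as a dependency-free alternative, the same hidden-field extraction can be sketched with the stdlib's html.parser. The HTML snippet and the field values below are made up to resemble what an aspx page returns:

```python
from html.parser import HTMLParser

class HiddenFieldParser(HTMLParser):
    """Collect value attributes of <input> tags, keyed by id (e.g. __VIEWSTATE)."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if "id" in a:
                self.fields[a["id"]] = a.get("value", "")

# A made-up snippet of the kind of markup an aspx page embeds.
html = ('<input type="hidden" id="__VIEWSTATE" value="dDwtMTA=" />'
        '<input type="hidden" id="__EVENTVALIDATION" value="wEWAg==" />')
p = HiddenFieldParser()
p.feed(html)
print(p.fields["__VIEWSTATE"])  # dDwtMTA=
```

Both values would then go back into the POST body alongside the visible form fields, since aspx pages reject submissions that omit them.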

Is there a library for urllib2 for python which we can download?

夙愿已清 submitted on 2019-12-05 07:15:56
I need to use urllib2 with BeautifulSoup. I found the download file for BeautifulSoup and installed it; however, I couldn't find any download files for urllib2. Is there another way to install that module? The module comes with Python; simply import it: import urllib2. If you're using Python 3, urllib2 was replaced by urllib.request. The urllib PEP (Python 3): http://www.python.org/dev/peps/pep-3108/#urllib-package. Source: https://stackoverflow.com/questions/16597865/is-there-a-library-for-urllib2-for-python-which-we-can-download
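Since urllib2 exists only on Python 2 and urllib.request replaces it on Python 3, code that must run on both commonly uses a try/except import, sketched here:

```python
# Cross-version import: urllib2 is Python 2 only; urllib.request replaces it.
try:
    import urllib2 as urlrequest          # Python 2
except ImportError:
    import urllib.request as urlrequest  # Python 3

# Either way, the familiar entry point is available under one name:
print(hasattr(urlrequest, "urlopen"))  # True
```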

Mechanize form submission causes 'Assertion Error' in response when .read() is attempted

一笑奈何 submitted on 2019-12-05 07:01:09
Question: I am writing a web-crawl program with Python and am unable to log in using mechanize. The form on the site looks like this:

<form method="post" action="PATLogon">
<h2 align="center"><img src="/myaladin/images/aladin_logo_rd.gif"></h2>
<!-- ALADIN Request parameters -->
<input type=hidden name=req value="db">
<input type=hidden name=key value="PROXYAUTH">
<input type=hidden name=url value="http://eebo.chadwyck.com/search">
<input type=hidden name=lib value="8">
<table>
<tr><td><b>Last Name:</b></td>
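If mechanize keeps failing, one fallback is to POST the form's fields directly with the stdlib. A sketch using the hidden inputs visible in the snippet above; the login field name, its value, and the full action URL are assumptions, since the form is truncated:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hidden fields copied from the form above, plus an assumed login field.
fields = {
    "req": "db",
    "key": "PROXYAUTH",
    "url": "http://eebo.chadwyck.com/search",
    "lib": "8",
    "lastname": "Smith",  # assumption: the real input name/value may differ
}
body = urlencode(fields).encode("ascii")

# The form's action is the relative URL "PATLogon"; the host here is assumed.
req = Request("http://example.org/PATLogon", data=body)
print(req.get_method())  # POST
```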