urllib2

Parallel fetching of files

你离开我真会死。 Submitted on 2019-11-27 18:03:26
In order to download files, I'm creating a urlopen object (urllib2 class) and reading it in chunks. I would like to connect to the server several times and download the file in six different sessions; that way the download should be faster. Many download managers have this feature. I thought about specifying the part of the file I would like to download in each session and somehow processing all the sessions at the same time, but I'm not sure how to achieve this. Sounds like you want to use one of the flavors of HTTP Range that are available. Edit: updated link to point to the w3.org stored
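A minimal sketch of that Range-based approach, assuming a hypothetical URL and a fixed six-way split; each thread asks the server for one byte range via the Range header, and the parts are joined in order afterwards:

    import threading
    import urllib2

    URL = 'http://example.com/bigfile.bin'   # hypothetical URL
    PART = 1024 * 1024                       # assumed size of each range

    def fetch_range(start, end, parts, idx):
        # Servers that honor Range reply with "206 Partial Content".
        req = urllib2.Request(URL)
        req.add_header('Range', 'bytes=%d-%d' % (start, end))
        parts[idx] = urllib2.urlopen(req).read()

    parts = [None] * 6
    threads = [threading.Thread(target=fetch_range,
                                args=(i * PART, (i + 1) * PART - 1, parts, i))
               for i in range(6)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    data = ''.join(parts)   # reassemble the six ranges in order

A server that ignores the header sends the whole file with status 200, so a real downloader should check the response code before trusting the split.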

Python urllib2 > HTTP Proxy > HTTPS request

丶灬走出姿态 Submitted on 2019-11-27 17:59:58
Question: This works fine: import urllib2 opener = urllib2.build_opener( urllib2.HTTPHandler(), urllib2.HTTPSHandler(), urllib2.ProxyHandler({'http': 'http://user:pass@proxy:3128'})) urllib2.install_opener(opener) print urllib2.urlopen('http://www.google.com').read() But if http is changed to https: ... print urllib2.urlopen('https://www.google.com').read() it fails with: Traceback (most recent call last): File "D:\Temp\6\tmp.py", line 13, in <module> print urllib2.urlopen('https://www.google.com')
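One thing worth checking, sketched below rather than offered as a guaranteed fix: urllib2's ProxyHandler maps URL schemes to proxies, so HTTPS traffic is only routed through the proxy if the dict also has an 'https' entry:

    import urllib2

    opener = urllib2.build_opener(
        urllib2.ProxyHandler({
            'http':  'http://user:pass@proxy:3128',
            'https': 'http://user:pass@proxy:3128',   # without this key, https bypasses the proxy
        }))
    urllib2.install_opener(opener)
    print urllib2.urlopen('https://www.google.com').read()

Note that very old Python 2 releases could not tunnel HTTPS through a proxy (the CONNECT method) with urllib2 at all, so on those versions a custom handler or a newer Python is needed.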

Python POST binary data

被刻印的时光 ゝ Submitted on 2019-11-27 17:59:08
I am writing some code to interface with Redmine and I need to upload some files as part of the process, but I am not sure how to do a POST request from Python containing a binary file. I am trying to mimic this command: curl --data-binary "@image.png" -H "Content-Type: application/octet-stream" -X POST -u login:password http://redmine/uploads.xml in Python (below), but it does not seem to work. I am not sure if the problem is somehow related to encoding the file or if something is wrong with the headers. import urllib2, os FilePath = "C:\somefolder\somefile.7z" FileData = open(FilePath,
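A sketch of that curl command translated to urllib2, using the Redmine URL and credentials from the question; opening the file in binary mode and setting the two headers are the parts that most often go wrong:

    import base64
    import urllib2

    FILE_PATH = r'C:\somefolder\somefile.7z'

    with open(FILE_PATH, 'rb') as f:     # 'rb' is essential on Windows
        data = f.read()

    req = urllib2.Request('http://redmine/uploads.xml', data=data)
    req.add_header('Content-Type', 'application/octet-stream')
    # Equivalent of curl's "-u login:password" (HTTP Basic auth)
    req.add_header('Authorization',
                   'Basic ' + base64.b64encode('login:password'))
    print urllib2.urlopen(req).read()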

catch specific HTTP error in python

不打扰是莪最后的温柔 Submitted on 2019-11-27 17:40:57
I want to catch a specific HTTP error and not any one of the entire family. What I was trying to do is: import urllib2 try: urllib2.urlopen("some url") except urllib2.HTTPError: <whatever> but what I end up catching is any kind of HTTP error. I want to catch it only if the specified webpage doesn't exist, which is probably HTTP error 404, but I don't know how to specify catching only error 404 and let the system run the default handler for other events. Any suggestions? Tim Pietzcker: Just catch urllib2.HTTPError, handle it, and if it's not error 404, simply use raise to re-raise the
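A sketch of that answer: inspect the code attribute of the HTTPError and re-raise anything that is not a 404 (the URL is a placeholder):

    import urllib2

    try:
        urllib2.urlopen('http://example.com/some-page')   # placeholder URL
    except urllib2.HTTPError as e:
        if e.code == 404:
            print 'page does not exist'
        else:
            raise   # let everything that is not a 404 propagate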

Python: urllib/urllib2/httplib confusion

陌路散爱 Submitted on 2019-11-27 16:55:56
I'm trying to test the functionality of a web app by scripting a login sequence in Python, but I'm having some trouble. Here's what I need to do: do a POST with a few parameters and headers, follow a redirect, and retrieve the HTML body. Now, I'm relatively new to Python, but the two things I've tested so far haven't worked. First I used httplib with putrequest() (passing the parameters within the URL) and putheader(); this didn't seem to follow the redirects. Then I tried urllib and urllib2, passing both headers and parameters as dicts. This seems to return the login page instead of the page I
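A sketch of the urllib2 route, with hypothetical form-field names and login URL; a cookie-aware opener keeps the session cookie that the login sets and follows the redirect automatically, which covers all three steps:

    import cookielib
    import urllib
    import urllib2

    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

    # Field names and URL are assumptions for illustration.
    params = urllib.urlencode({'username': 'me', 'password': 'secret'})
    req = urllib2.Request('http://example.com/login', data=params)

    # urllib2 follows the 302 by itself; this is the redirected page's HTML.
    html = opener.open(req).read()

Returning the login page instead of the target page is usually the symptom of a dropped session cookie, which is what the HTTPCookieProcessor addresses.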

using tor as a SOCKS5 proxy with python urllib2 or mechanize

*爱你&永不变心* Submitted on 2019-11-27 16:53:05
Question: My goal is to use Python's mechanize with a Tor SOCKS proxy. I am not using a GUI, on the following Ubuntu version: Description: Ubuntu 12.04.1 LTS Release: 12.04 Codename: precise Tor is installed and is listening on port 9050 according to the nmap scan: Starting Nmap 5.21 ( http://nmap.org ) at 2013-01-22 00:50 UTC Nmap scan report for localhost (127.0.0.1) Host is up (0.000011s latency). Not shown: 996 closed ports PORT STATE SERVICE 22/tcp open ssh 80/tcp open http 3306/tcp open mysql
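A common approach, sketched here with the third-party SocksiPy module (the socks import) and assuming both it and mechanize are installed, is to route every new socket through Tor's SOCKS listener before mechanize opens anything:

    import socket
    import socks
    import mechanize

    # Send all new connections through Tor's SOCKS5 listener on 9050.
    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9050)
    socket.socket = socks.socksocket

    br = mechanize.Browser()
    br.set_handle_robots(False)
    print br.open('http://check.torproject.org').read()

DNS lookups made outside the patched socket can still leak, so verifying the exit via check.torproject.org is advisable.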

how to deal with ® in url for urllib2.urlopen?

╄→尐↘猪︶ㄣ Submitted on 2019-11-27 15:54:27
I received a URL from BeautifulSoup: https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp®-75-desktop-virtualization-solutions url=u'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp\xae-75-desktop-virtualization-solutions' I want to feed it back into urllib2.urlopen: import urllib2 source = urllib2.urlopen(url).read() The error I get: UnicodeEncodeError: 'gbk' codec can't encode character u'\xae' in position 43: illegal multibyte sequence So I tried: source = urllib2.urlopen(url.encode("utf-8")).read() That got the page source, but it is different from
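A sketch of one way to make the URL safe for urlopen: encode the unicode string as UTF-8, then percent-encode the non-ASCII bytes while leaving the URL's structural characters alone (\xae becomes %C2%AE):

    import urllib
    import urllib2

    url = u'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp\xae-75-desktop-virtualization-solutions'

    # quote() percent-encodes the UTF-8 bytes; the safe string keeps the
    # scheme and path separators untouched.
    safe_url = urllib.quote(url.encode('utf-8'), safe=":/?#[]@!$&'()*+,;=%")
    source = urllib2.urlopen(safe_url).read()

Whether the server then returns the same page a browser sees depends on how it canonicalizes the encoded path, which would explain a differing source.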

Python 3 error: import urllib2 ModuleNotFoundError

吃可爱长大的小学妹 Submitted on 2019-11-27 14:57:53
import urllib2 ModuleNotFoundError: No module named 'urllib2' — importing urllib2 fails with a module-not-found error: there is no module named 'urllib2'. While starting to learn Python's urllib2 library today, I tried a basic snippet: import urllib2 request = urllib2.Request("http://www.baidu.com/") response = urllib2.urlopen(request) print response.read() It reported the following error: import urllib2 ModuleNotFoundError: No module named 'urllib2' Since the installed version is Python 3.7, and from Python 3 onward urllib.request replaces urllib2, the code was changed as follows: import urllib.request request = urllib.request.Request("http://www.baidu.com/") response = urllib.request.urlopen(request) print(response.read())

Python urllib2.HTTPError: HTTP Error 503: Service Unavailable on valid website

北战南征 Submitted on 2019-11-27 14:55:48
Question: I have been using Amazon's Product Advertising API to generate URLs that contain prices for a given book. One URL that I have generated is the following: http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327 When I click on the link or paste it into the address bar, the web page loads fine. However, when I execute the following code I get an error: url
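A frequent cause, sketched below as an assumption rather than a confirmed diagnosis: Amazon often answers urllib2's default User-Agent ("Python-urllib/2.x") with 503, while a browser-like agent goes through:

    import urllib2

    url = ('http://www.amazon.com/gp/offer-listing/0415376327'
           '%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20'
           '%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001'
           '%26creativeASIN%3D0415376327')

    # Present a browser-like User-Agent instead of "Python-urllib/2.x".
    req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib2.urlopen(req).read()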

Web scraping - how to access content rendered in JavaScript via Angular.js?

混江龙づ霸主 Submitted on 2019-11-27 14:41:31
I'm trying to scrape data from the public site asx.com.au. The page http://www.asx.com.au/asx/research/company.do#!/ACB/details contains a div with class 'view-content', which has the information I need. But when I fetch the page via Python's urllib2.urlopen, that div is empty: import urllib2 from bs4 import BeautifulSoup url = 'http://www.asx.com.au/asx/research/company.do#!/ACB/details' page = urllib2.urlopen(url).read() soup = BeautifulSoup(page, "html.parser") contentDiv = soup.find("div", {"class": "view-content"}) print(contentDiv) # the result is an empty div: # <div class="view
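The div is empty because urllib2 only receives the initial HTML; Angular fills in 'view-content' afterwards, in the browser. One sketch of a workaround is to drive a real browser with selenium (a third-party package, assumed installed together with a WebDriver):

    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Firefox()      # any installed WebDriver works
    driver.get('http://www.asx.com.au/asx/research/company.do#!/ACB/details')

    # page_source holds the DOM after the JavaScript has run; slow pages
    # may additionally need an explicit WebDriverWait.
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    print(soup.find('div', {'class': 'view-content'}))
    driver.quit()

Alternatively, the browser's network tab usually reveals the JSON endpoint the Angular app calls, which can then be fetched directly with urllib2.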