urllib2

Does urllib2.urlopen() cache stuff?

删除回忆录丶 Submitted on 2019-11-29 06:00:42
The Python documentation doesn't mention this. I have been testing a website by simply refreshing it with urllib2.urlopen() to extract certain content, and I notice that sometimes, after I update the site, urllib2.urlopen() does not seem to pick up the newly added content. So I wonder whether it caches things somewhere, right? It doesn't. If you don't see new data, this could have many reasons. Most bigger web services use server-side caching for performance reasons, for example caching proxies like Varnish and Squid, or application-level caching.
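If the stale content is suspected to come from an intermediate cache rather than from urllib2 itself, one common workaround is to send cache-busting request headers. A minimal sketch (the URL is a placeholder):

import urllib2

# Hypothetical URL; ask intermediate caches (proxies, CDNs) not to serve a stored copy.
req = urllib2.Request('http://example.com/page')
req.add_header('Cache-Control', 'no-cache')
req.add_header('Pragma', 'no-cache')   # for older HTTP/1.0 caches
print urllib2.urlopen(req).read()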

Python urllib2 HTTPBasicAuthHandler

旧巷老猫 Submitted on 2019-11-29 05:19:16
Here is the code:

import urllib2 as URL

def get_unread_msgs(user, passwd):
    auth = URL.HTTPBasicAuthHandler()
    auth.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s' % user,
        passwd=passwd
    )
    opener = URL.build_opener(auth)
    URL.install_opener(opener)
    try:
        feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
        return feed.read()
    except:
        return None

It works just fine. The only problem is that when a wrong username or password is used, it takes forever to open the URL at feed = URL.urlopen('https://mail.google.com/mail/feed/atom'). It doesn't throw any errors; it just keeps waiting.
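One way to keep a bad username/password from hanging indefinitely is to put a hard timeout on the socket and treat a 401 as "no data". This is only a sketch, reusing the same feed URL; whether the wait is client- or server-side depends on the setup:

import socket
import urllib2 as URL

socket.setdefaulttimeout(15)   # give up after 15 seconds instead of blocking forever

def get_unread_msgs(user, passwd):
    auth = URL.HTTPBasicAuthHandler()
    auth.add_password(realm='New mail feed',
                      uri='https://mail.google.com',
                      user=user, passwd=passwd)
    opener = URL.build_opener(auth)
    try:
        feed = opener.open('https://mail.google.com/mail/feed/atom')
        return feed.read()
    except URL.HTTPError, e:
        if e.code == 401:          # wrong username/password
            return None
        raise
    except socket.timeout:
        return None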

Python script to translate via google translate

岁酱吖の Submitted on 2019-11-29 05:13:29
I'm trying to learn Python, so I decided to write a script that translates text using Google Translate. So far I have written this:

import sys
from BeautifulSoup import BeautifulSoup
import urllib2
import urllib

data = {'sl': 'en', 'tl': 'it', 'text': 'word'}
request = urllib2.Request('http://www.translate.google.com', urllib.urlencode(data))
request.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11')
opener = urllib2.build_opener()
feeddata = opener.open(request).read()
#print feeddata
soup = BeautifulSoup(feeddata)
print soup
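To pull just the translation out of the returned page instead of printing the whole soup, something along these lines could work; note that the 'result_box' id is only a guess at the markup Google used at the time and may well have changed:

import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

data = {'sl': 'en', 'tl': 'it', 'text': 'word'}
request = urllib2.Request('http://translate.google.com', urllib.urlencode(data))
request.add_header('User-Agent', 'Mozilla/5.0')   # some services reject the default urllib2 agent

soup = BeautifulSoup(urllib2.urlopen(request).read())

# Assumption: the translated text sits in an element with id 'result_box';
# inspect the real HTML and adjust the filter accordingly.
box = soup.find(id='result_box')
if box is not None:
    print ''.join(box.findAll(text=True))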

Download file using urllib in Python with the wget -c feature

谁说胖子不能爱 Submitted on 2019-11-29 04:25:29
I am writing a program in Python to download PDFs over HTTP from a database. Sometimes the download stops with this message: retrieval incomplete: got only 3617232 out of 10689634 bytes. How can I make the download restart where it stopped, using the HTTP 206 Partial Content feature? I can do it with wget -c and it works pretty well, but I would like to implement it directly in my Python program. Any ideas? Thank you. You can request a partial download by sending a GET with the Range header:

import urllib2
req = urllib2.Request('http://www.python.org/')
#
# Here we request that bytes 18000-
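A fuller sketch of the wget -c behaviour, assuming the server honours Range requests and that resp.getcode() is available (Python 2.6+); the URL and filename are placeholders:

import os
import urllib2

def resume_download(url, filename, chunk_size=8192):
    # How much we already have on disk; resume from the first missing byte.
    existing = os.path.getsize(filename) if os.path.exists(filename) else 0
    req = urllib2.Request(url)
    if existing:
        req.add_header('Range', 'bytes=%d-' % existing)
    resp = urllib2.urlopen(req)
    # 206 Partial Content means the server honoured the range;
    # a plain 200 means it ignored it and re-sent the whole file.
    mode = 'ab' if resp.getcode() == 206 else 'wb'
    out = open(filename, mode)
    try:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
    finally:
        out.close()

resume_download('http://example.com/big.pdf', 'big.pdf')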

How do I draw out specific data from an opened url in Python using urllib2?

陌路散爱 Submitted on 2019-11-29 04:17:07
Question: I'm new to Python and am playing around with making a very basic web crawler. For instance, I have made a simple function to load a page that shows the high scores for an online game. I am able to get the source code of the HTML page, but I need to draw specific numbers from that page. For instance, the page looks like this: http://hiscore.runescape.com/hiscorepersonal.ws?user1=bigdrizzle13 where 'bigdrizzle13' is the unique part of the link. The numbers on that page need to be drawn out.
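One hedged way to pull the numbers out once the source has been fetched; the table-cell assumption is a guess, so check the page's actual markup first:

import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://hiscore.runescape.com/hiscorepersonal.ws?user1=bigdrizzle13'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)

# Assumption: each score sits in its own <td>; adjust the tag/attribute
# filter after looking at the real page source.
numbers = []
for cell in soup.findAll('td'):
    text = ''.join(cell.findAll(text=True)).strip().replace(',', '')
    if text.isdigit():
        numbers.append(int(text))
print numbers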

urllib2 returns 404 for a website which displays fine in browsers

我的梦境 Submitted on 2019-11-29 04:13:40
I am not able to open one particular URL using urllib2. The same approach works well with other websites such as "http://www.google.com", but not this site (which displays fine in the browser). My simple code:

from BeautifulSoup import BeautifulSoup
import urllib2

url = "http://www.experts.scival.com/einstein/"
response = urllib2.urlopen(url)
html = response.read()
soup = BeautifulSoup(html)
print soup

Can anyone help me make it work? This is the error I got: Traceback (most recent call last): File "/Users/jontaotao/Documents/workspace/MedicalSchoolInfo/src/AlbertEinsteinCollegeOfMedicine_SciValExperts
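A common cause of a 404 or 403 that only urllib2 sees is the server rejecting the default Python-urllib User-Agent. Sending a browser-like header is a cheap first thing to try; a sketch, not a guaranteed fix:

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.experts.scival.com/einstein/"
# Pretend to be a browser in case the server filters the default urllib2 agent.
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib2.urlopen(req).read()
print BeautifulSoup(html)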

Python urllib2 > HTTP Proxy > HTTPS request

醉酒当歌 Submitted on 2019-11-29 03:57:46
This works fine:

import urllib2
opener = urllib2.build_opener(
    urllib2.HTTPHandler(),
    urllib2.HTTPSHandler(),
    urllib2.ProxyHandler({'http': 'http://user:pass@proxy:3128'}))
urllib2.install_opener(opener)
print urllib2.urlopen('http://www.google.com').read()

But if http is changed to https:

... print urllib2.urlopen('https://www.google.com').read()

there are errors:

Traceback (most recent call last):
  File "D:\Temp\6\tmp.py", line 13, in <module>
    print urllib2.urlopen('https://www.google.com').read()
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
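The usual first step is to register the proxy for the https scheme as well, assuming the proxy allows CONNECT tunnelling; very old urllib2 releases reportedly handled HTTPS over a proxy poorly regardless, so this is a sketch rather than a certain fix:

import urllib2

proxy = urllib2.ProxyHandler({
    'http':  'http://user:pass@proxy:3128',
    'https': 'http://user:pass@proxy:3128',   # same proxy, registered for HTTPS too
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
print urllib2.urlopen('https://www.google.com').read()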

How can I force urllib2 to time out?

旧城冷巷雨未停 Submitted on 2019-11-29 03:50:55
Question: I want to test my application's handling of timeouts when grabbing data via urllib2, and I want some way to force the request to time out. Short of finding a very, very slow internet connection, what method can I use? I seem to remember an interesting application/suite for simulating these sorts of things. Maybe someone knows the link? Answer 1: I usually use netcat to listen on port 80 of my local machine: nc -l 80. Then I use http://localhost/ as the request URL in my application. Netcat accepts the connection but never sends a response, so the request is guaranteed to time out.
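The same trick can be done in pure Python when netcat isn't handy: a throwaway server that accepts connections but never answers. Port 8080 is arbitrary, and the timeout argument to urlopen needs Python 2.6+; this is a sketch only:

import socket
import threading
import time
import urllib2

def silent_server(port=8080):
    # Accept connections but never send a byte, so clients hang until they time out.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(('127.0.0.1', port))
    s.listen(1)
    conns = []
    while True:
        conn, _ = s.accept()
        conns.append(conn)   # keep a reference so the connection stays open and silent

t = threading.Thread(target=silent_server)
t.daemon = True
t.start()
time.sleep(0.5)   # give the server a moment to start listening

try:
    urllib2.urlopen('http://127.0.0.1:8080/', timeout=3)
except Exception, e:
    print 'request failed as expected:', e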

timeout for urllib2.urlopen() in pre Python 2.6 versions

眉间皱痕 Submitted on 2019-11-29 01:45:50
Question: The urllib2 documentation says the timeout parameter was added in Python 2.6. Unfortunately my code base runs on Python 2.5 and 2.4 platforms. Is there any alternate way to simulate the timeout? All I want is to let the code talk to the remote server for a fixed amount of time. Perhaps an alternative built-in library? (I don't want to install anything 3rd-party, like pycurl.) Answer 1: You can set a global timeout for all socket operations (including HTTP requests) by using socket.setdefaulttimeout().
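For Python 2.4/2.5 the standard workaround is a process-wide socket timeout. A minimal sketch; the URL is a placeholder, and depending on the version the failure may surface as socket.timeout or as urllib2.URLError:

import socket
import urllib2

socket.setdefaulttimeout(10)   # seconds; applies to every new socket, including urllib2's

try:
    data = urllib2.urlopen('http://example.com/slow-endpoint').read()
except (socket.timeout, urllib2.URLError), e:
    data = None
    print 'request gave up:', e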

Windows Authentication with Python and urllib2

心已入冬 Submitted on 2019-11-29 01:23:52
Question: I want to grab some data off a webpage that requires my Windows username and password. So far, I've got:

opener = build_opener()
try:
    page = opener.open("http://somepagewhichneedsmywindowsusernameandpassword/")
    print page
except URLError:
    print "Oh noes."

Is this supported by urllib2? I've found Python NTLM, but that requires me to put my username and password in. Is there any way to just grab the authentication information somehow (e.g. like IE does, or Firefox, if I changed the network settings)?
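With the third-party python-ntlm package (an assumption that NTLM matches the server's auth scheme), the handler plugs into urllib2 like any other, though it still needs the credentials spelled out; picking them up silently from the logged-in Windows session, the way IE does, would instead require SSPI (e.g. via pywin32). A sketch with placeholder credentials:

import urllib2
from ntlm import HTTPNtlmAuthHandler   # third-party: python-ntlm

url = "http://somepagewhichneedsmywindowsusernameandpassword/"

# Placeholder credentials; NTLM usernames are usually written DOMAIN\user.
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, r'DOMAIN\username', 'password')

opener = urllib2.build_opener(HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman))
print opener.open(url).read()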