urllib2

Does urllib2.urlopen() cache stuff?

删除回忆录丶 Submitted on 2019-11-29 06:00:42
The Python documentation doesn't mention this. I have been testing a website by simply refreshing it with urllib2.urlopen() to extract certain content, and I notice that sometimes, after I update the site, urllib2.urlopen() does not seem to pick up the newly added content. So I wonder whether it caches things somewhere, right? It doesn't. If you don't see new data, this could have many reasons. Most bigger web services use server-side caching for performance reasons, for example caching proxies like Varnish and Squid, or application-level caching.
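If the stale content is suspected to come from an intermediate cache rather than from urllib2 itself, one common workaround is to send cache-busting request headers. A minimal sketch (the URL is a placeholder):

import urllib2

# Hypothetical URL; ask intermediate caches (proxies, CDNs) not to serve a stored copy.
req = urllib2.Request('http://example.com/page')
req.add_header('Cache-Control', 'no-cache')
req.add_header('Pragma', 'no-cache')   # for older HTTP/1.0 caches
print urllib2.urlopen(req).read()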

Python urllib2 HTTPBasicAuthHandler

旧巷老猫 Submitted on 2019-11-29 05:19:16
Here is the code:

import urllib2 as URL

def get_unread_msgs(user, passwd):
    auth = URL.HTTPBasicAuthHandler()
    auth.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s' % user,
        passwd=passwd
    )
    opener = URL.build_opener(auth)
    URL.install_opener(opener)
    try:
        feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
        return feed.read()
    except:
        return None

It works just fine. The only problem is that when a wrong username or password is used, it takes forever to open the URL at feed = URL.urlopen('https://mail.google.com/mail/feed/atom'). It doesn't throw any errors; it just keeps waiting.
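One way to keep a bad username/password from hanging indefinitely is to put a hard timeout on the socket and treat a 401 as "no data". This is only a sketch, reusing the same feed URL; whether the wait is client- or server-side depends on the setup:

import socket
import urllib2 as URL

socket.setdefaulttimeout(15)   # give up after 15 seconds instead of blocking forever

def get_unread_msgs(user, passwd):
    auth = URL.HTTPBasicAuthHandler()
    auth.add_password(realm='New mail feed',
                      uri='https://mail.google.com',
                      user=user, passwd=passwd)
    opener = URL.build_opener(auth)
    try:
        feed = opener.open('https://mail.google.com/mail/feed/atom')
        return feed.read()
    except URL.HTTPError, e:
        if e.code == 401:          # wrong username/password
            return None
        raise
    except socket.timeout:
        return None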

Python script to translate via google translate

岁酱吖の Submitted on 2019-11-29 05:13:29
I'm trying to learn Python, so I decided to write a script that translates text using Google Translate. So far I have written this:

import sys
from BeautifulSoup import BeautifulSoup
import urllib2
import urllib

data = {'sl': 'en', 'tl': 'it', 'text': 'word'}
request = urllib2.Request('http://www.translate.google.com', urllib.urlencode(data))
request.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11')
opener = urllib2.build_opener()
feeddata = opener.open(request).read()
#print feeddata
soup = BeautifulSoup(feeddata)
print soup
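To pull just the translation out of the returned page instead of printing the whole soup, something along these lines could work; note that the 'result_box' id is only a guess at the markup Google used at the time and may well have changed:

import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

data = {'sl': 'en', 'tl': 'it', 'text': 'word'}
request = urllib2.Request('http://translate.google.com', urllib.urlencode(data))
request.add_header('User-Agent', 'Mozilla/5.0')   # some services reject the default urllib2 agent

soup = BeautifulSoup(urllib2.urlopen(request).read())

# Assumption: the translated text sits in an element with id 'result_box';
# inspect the real HTML and adjust the filter accordingly.
box = soup.find(id='result_box')
if box is not None:
    print ''.join(box.findAll(text=True))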

Download file using urllib in Python with the wget -c feature

谁说胖子不能爱 Submitted on 2019-11-29 04:25:29
I am writing a program in Python to download PDFs over HTTP from a database. Sometimes the download stops with this message: retrieval incomplete: got only 3617232 out of 10689634 bytes. How can I make the download restart where it stopped, using the HTTP 206 Partial Content feature? I can do it with wget -c and it works pretty well, but I would like to implement it directly in my Python program. Any ideas? Thank you. You can request a partial download by sending a GET with the Range header:

import urllib2
req = urllib2.Request('http://www.python.org/')
#
# Here we request that bytes 18000-
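A fuller sketch of the wget -c behaviour, assuming the server honours Range requests and that resp.getcode() is available (Python 2.6+); the URL and filename are placeholders:

import os
import urllib2

def resume_download(url, filename, chunk_size=8192):
    # How much we already have on disk; resume from the first missing byte.
    existing = os.path.getsize(filename) if os.path.exists(filename) else 0
    req = urllib2.Request(url)
    if existing:
        req.add_header('Range', 'bytes=%d-' % existing)
    resp = urllib2.urlopen(req)
    # 206 Partial Content means the server honoured the range;
    # a plain 200 means it ignored it and re-sent the whole file.
    mode = 'ab' if resp.getcode() == 206 else 'wb'
    out = open(filename, mode)
    try:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
    finally:
        out.close()

resume_download('http://example.com/big.pdf', 'big.pdf')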

How do I draw out specific data from an opened url in Python using urllib2?

陌路散爱 Submitted on 2019-11-29 04:17:07
Question: I'm new to Python and am playing around with making a very basic web crawler. For instance, I have made a simple function to load a page that shows the high scores for an online game. I am able to get the source code of the HTML page, but I need to draw specific numbers from that page. For instance, the page looks like this: http://hiscore.runescape.com/hiscorepersonal.ws?user1=bigdrizzle13 where 'bigdrizzle13' is the unique part of the link. The numbers on that page need to be drawn out.
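One hedged way to pull the numbers out once the source has been fetched; the table-cell assumption is a guess, so check the page's actual markup first:

import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://hiscore.runescape.com/hiscorepersonal.ws?user1=bigdrizzle13'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)

# Assumption: each score sits in its own <td>; adjust the tag/attribute
# filter after looking at the real page source.
numbers = []
for cell in soup.findAll('td'):
    text = ''.join(cell.findAll(text=True)).strip().replace(',', '')
    if text.isdigit():
        numbers.append(int(text))
print numbers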

urllib2 returns 404 for a website which displays fine in browsers

我的梦境 Submitted on 2019-11-29 04:13:40
I am not able to open one particular URL using urllib2. The same approach works well with other websites such as "http://www.google.com", but not this site (which displays fine in the browser). My simple code:

from BeautifulSoup import BeautifulSoup
import urllib2

url = "http://www.experts.scival.com/einstein/"
response = urllib2.urlopen(url)
html = response.read()
soup = BeautifulSoup(html)
print soup

Can anyone help me make it work? This is the error I got: Traceback (most recent call last): File "/Users/jontaotao/Documents/workspace/MedicalSchoolInfo/src/AlbertEinsteinCollegeOfMedicine_SciValExperts
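A common cause of a 404 or 403 that only urllib2 sees is the server rejecting the default Python-urllib User-Agent. Sending a browser-like header is a cheap first thing to try; a sketch, not a guaranteed fix:

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.experts.scival.com/einstein/"
# Pretend to be a browser in case the server filters the default urllib2 agent.
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib2.urlopen(req).read()
print BeautifulSoup(html)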

Python urllib2 > HTTP Proxy > HTTPS request

醉酒当歌 Submitted on 2019-11-29 03:57:46
This works fine:

import urllib2
opener = urllib2.build_opener(
    urllib2.HTTPHandler(),
    urllib2.HTTPSHandler(),
    urllib2.ProxyHandler({'http': 'http://user:pass@proxy:3128'}))
urllib2.install_opener(opener)
print urllib2.urlopen('http://www.google.com').read()

But if http is changed to https:

... print urllib2.urlopen('https://www.google.com').read()

there are errors:

Traceback (most recent call last):
  File "D:\Temp\6\tmp.py", line 13, in <module>
    print urllib2.urlopen('https://www.google.com').read()
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
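The usual first step is to register the proxy for the https scheme as well, assuming the proxy allows CONNECT tunnelling; very old urllib2 releases reportedly handled HTTPS over a proxy poorly regardless, so this is a sketch rather than a certain fix:

import urllib2

proxy = urllib2.ProxyHandler({
    'http':  'http://user:pass@proxy:3128',
    'https': 'http://user:pass@proxy:3128',   # same proxy, registered for HTTPS too
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
print urllib2.urlopen('https://www.google.com').read()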

How can I force urllib2 to time out?

旧城冷巷雨未停 Submitted on 2019-11-29 03:50:55
Question: I want to test my application's handling of timeouts when grabbing data via urllib2, and I want some way to force the request to time out. Short of finding a very, very slow internet connection, what method can I use? I seem to remember an interesting application/suite for simulating these sorts of things. Maybe someone knows the link? Answer 1: I usually use netcat to listen on port 80 of my local machine: nc -l 80. Then I use http://localhost/ as the request URL in my application. Netcat accepts the connection but never sends a response, so the request is guaranteed to time out.
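The same trick can be done in pure Python when netcat isn't handy: a throwaway server that accepts connections but never answers. Port 8080 is arbitrary, and the timeout argument to urlopen needs Python 2.6+; this is a sketch only:

import socket
import threading
import time
import urllib2

def silent_server(port=8080):
    # Accept connections but never send a byte, so clients hang until they time out.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(('127.0.0.1', port))
    s.listen(1)
    conns = []
    while True:
        conn, _ = s.accept()
        conns.append(conn)   # keep a reference so the connection stays open and silent

t = threading.Thread(target=silent_server)
t.daemon = True
t.start()
time.sleep(0.5)   # give the server a moment to start listening

try:
    urllib2.urlopen('http://127.0.0.1:8080/', timeout=3)
except Exception, e:
    print 'request failed as expected:', e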

timeout for urllib2.urlopen() in pre Python 2.6 versions

眉间皱痕 Submitted on 2019-11-29 01:45:50
Question: The urllib2 documentation says the timeout parameter was added in Python 2.6. Unfortunately my code base runs on Python 2.5 and 2.4 platforms. Is there any alternate way to simulate the timeout? All I want is to let the code talk to the remote server for a fixed amount of time. Perhaps an alternative built-in library? (I don't want to install anything 3rd-party, like pycurl.) Answer 1: You can set a global timeout for all socket operations (including HTTP requests) by using socket.setdefaulttimeout().
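For Python 2.4/2.5 the standard workaround is a process-wide socket timeout. A minimal sketch; the URL is a placeholder, and depending on the version the failure may surface as socket.timeout or as urllib2.URLError:

import socket
import urllib2

socket.setdefaulttimeout(10)   # seconds; applies to every new socket, including urllib2's

try:
    data = urllib2.urlopen('http://example.com/slow-endpoint').read()
except (socket.timeout, urllib2.URLError), e:
    data = None
    print 'request gave up:', e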

Windows Authentication with Python and urllib2

心已入冬 Submitted on 2019-11-29 01:23:52
Question: I want to grab some data off a webpage that requires my Windows username and password. So far, I've got:

opener = build_opener()
try:
    page = opener.open("http://somepagewhichneedsmywindowsusernameandpassword/")
    print page
except URLError:
    print "Oh noes."

Is this supported by urllib2? I've found Python NTLM, but that requires me to put my username and password in. Is there any way to just grab the authentication information somehow (e.g. like IE does, or Firefox, if I changed the network settings)?
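With the third-party python-ntlm package (an assumption that NTLM matches the server's auth scheme), the handler plugs into urllib2 like any other, though it still needs the credentials spelled out; picking them up silently from the logged-in Windows session, the way IE does, would instead require SSPI (e.g. via pywin32). A sketch with placeholder credentials:

import urllib2
from ntlm import HTTPNtlmAuthHandler   # third-party: python-ntlm

url = "http://somepagewhichneedsmywindowsusernameandpassword/"

# Placeholder credentials; NTLM usernames are usually written DOMAIN\user.
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, r'DOMAIN\username', 'password')

opener = urllib2.build_opener(HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman))
print opener.open(url).read()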