urllib2

urllib2 urlopen works very randomly

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-07 16:29:07
Question: For some reason, this part where I fetch JSON data from the following URL only works sometimes; other times it returns a 404 error and complains about a missing header attribute. It works 100% of the time if I paste the URL into a web browser, so I'm sure the link is not broken. I get the following error in Python: AttributeError: 'HTTPError' object has no attribute 'header' What's the reason for this, and can it be fixed? By the way, I removed the API key since it is private. try: url =
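The attribute is `headers` (plural), not `header` — that alone explains the AttributeError. A minimal sketch using Python 3's `urllib` (the modern equivalent of `urllib2`); the URL and header values are placeholders, and the `HTTPError` is constructed by hand so no network is needed:

```python
from urllib.error import HTTPError
from email.message import Message

def describe_http_error(err):
    """Summarize an HTTPError using its `headers` attribute (not `header`)."""
    content_type = err.headers.get("Content-Type", "unknown")
    return "HTTP %d: %s (Content-Type: %s)" % (err.code, err.reason, content_type)

# Build an HTTPError directly to demonstrate the attribute names.
hdrs = Message()
hdrs["Content-Type"] = "application/json"
err = HTTPError("http://example.com/api", 404, "Not Found", hdrs, None)
print(describe_http_error(err))
```

In a real fetch you would wrap `urlopen(url)` in `try/except HTTPError as err:` and inspect `err.code` and `err.headers` the same way.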

Basic fetching of a URL's HTML body with Python 3.x

Submitted by 不羁岁月 on 2019-12-07 15:34:25
Question: I'm a Python newbie. I have been a little confused by the differences between the old urllib and urllib2 in Python 2.x and the new urllib in Python 3, and among other things I'm not sure when data needs to be encoded before being sent to urlopen. I have been trying to fetch the HTML body of a URL using a POST so that I can send parameters. The web page displays sunshine data for a country over a particular hour of a given day. I have tried without encoding/decoding and the printout is a
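In Python 3, POST parameters must be URL-encoded with `urllib.parse.urlencode` and then encoded from `str` to `bytes` before being passed as `data`; `urlopen()` issues a POST whenever `data` is not None. A sketch with placeholder URL and parameter names:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_post_request(url, params):
    data = urlencode(params).encode("utf-8")  # str -> bytes, as urlopen requires
    return Request(url, data=data)            # data present => POST

req = build_post_request("http://example.com/sunshine", {"country": "UK", "hour": 12})
print(req.get_method())  # POST, because data was supplied
print(req.data)
# To actually send it (requires network):
#   body = urlopen(req).read().decode("utf-8")
```

The decode on `read()` is the mirror step: the response arrives as bytes and must be decoded back to `str` for printing.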

Python urllib2 giving “network unreachable error” if the URL is https

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-07 14:35:29
Question: I am trying to fetch some URLs using the urllib2 library. a = urllib2.urlopen("http://www.google.com") ret = a.read() The code above works fine and gives the expected result. But when I make the URL https, it gives a "network unreachable" error: a = urllib2.urlopen("https://www.google.com") urllib2.URLError: <urlopen error [Errno 101] Network is unreachable> Is there a problem with SSL? My Python version is 2.6.5. I am also behind an academic proxy server; I have the proxy settings in my bash file.

Unknown url type error in urllib2

Submitted by 雨燕双飞 on 2019-12-07 12:11:35
Question: I have searched a lot of similar questions on SO, but did not find an exact match for my case. I am trying to download a video using Python 2.7. Here is my code for downloading the video: import urllib2 from bs4 import BeautifulSoup as bs with open('video.txt','r') as f: last_downloaded_video = f.read() webpage = urllib2.urlopen('http://*.net/watch/**-'+last_downloaded_video) soup = bs(webpage) a = [] for link in soup.find_all('a'): if link.has_attr('data-video-id'): a.append(link) #try just with
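An "unknown url type" error from `urlopen` usually means the string it received has no scheme — typically a relative or scheme-relative `href` scraped out of the page. Resolving the scraped value against the page URL before opening it fixes that. A sketch with placeholder URLs, using Python 3's `urllib.parse` (the same functions live in `urlparse` on Python 2):

```python
from urllib.parse import urljoin, urlparse

page_url = "http://example.net/watch/abc-123"  # placeholder for the page being scraped

def absolutize(href, base=page_url):
    """Resolve a scraped href against the page URL so urlopen gets a full URL."""
    full = urljoin(base, href)
    if urlparse(full).scheme not in ("http", "https"):
        raise ValueError("still no usable scheme: %r" % full)
    return full

print(absolutize("/video/456.mp4"))           # relative path -> absolute URL
print(absolutize("//cdn.example.net/v.mp4"))  # scheme-relative -> inherits http
```

Printing the URL right before `urlopen` is the quickest way to confirm whether this is the failure mode.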

unbuffered urllib2.urlopen

Submitted by 匆匆过客 on 2019-12-07 11:17:29
I have a client for a web interface to a long-running process. I'd like the output from that process to be displayed as it comes. This works great with urllib.urlopen(), but it doesn't have a timeout parameter. On the other hand, with urllib2.urlopen() the output is buffered. Is there an easy way to disable that buffering? A quick hack that has occurred to me is to use urllib.urlopen() with threading.Timer() to emulate a timeout, but that's only a quick and dirty hack. Answer: urllib2 buffers when you just call read(); you can define a size to read and thereby disable the buffering. For example: import urllib2
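The suggestion above can be sketched generically: read in fixed-size chunks instead of one big `read()`, so output can be processed as it arrives. Any file-like response object works; here a `BytesIO` stands in for the `urlopen()` response so the sketch runs offline:

```python
import io

def stream_chunks(resp, chunk_size=1024):
    """Yield the response body in chunk_size pieces instead of one read()."""
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:          # empty bytes means EOF
            break
        yield chunk

# Stand-in for urlopen(url): 9 bytes of header text plus 3000 filler bytes.
fake_response = io.BytesIO(b"line one\n" + b"x" * 3000)
chunks = list(stream_chunks(fake_response, chunk_size=1024))
print(len(chunks))  # 3009 bytes arrive as 3 chunks of <= 1024 bytes
```

With a real `urllib2.urlopen()` response, each chunk can be written to the display as soon as it is read, which gives the incremental behavior the question asks for.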

Fetch first n bytes from the URL

Submitted by 孤者浪人 on 2019-12-07 08:21:48
Question: Is it possible to fetch only a number of bytes from some URL and then close the connection with urllib/urllib2? Or perhaps even a part from the n-th byte to the k-th? There is a page on that site and I don't need to load the whole page, only a piece of it. Answer 1: You can set the Range header to request a certain range of bytes, but you are dependent on the server to honor the request: import urllib2 req = urllib2.Request('http://www.python.org/') # Here we request that bytes 18000--19000 be
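The same Range-header request in Python 3's `urllib.request`, built but not sent, since whether the server honors the range (by answering 206 Partial Content) is up to the server:

```python
import urllib.request

req = urllib.request.Request("http://www.python.org/")
req.add_header("Range", "bytes=18000-19000")  # inclusive byte range, n-th to k-th
print(req.get_header("Range"))

# Sending it requires network; if the server honors the range, the body
# is at most 1001 bytes:
#   data = urllib.request.urlopen(req).read()
```

If the server ignores the header it simply returns 200 with the full body, so checking the response status (or the returned length) tells you which case you got.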

“Out of Memory” error with mechanize

Submitted by 最后都变了- on 2019-12-07 06:43:33
Question: I was trying to scrape some information from a website page by page. Basically, here's what I did: import mechanize MechBrowser = mechanize.Browser() Counter = 0 while Counter < 5000: Response = MechBrowser.open("http://example.com/page" + str(Counter)) Html = Response.read() Response.close() OutputFile = open("Output.txt", "a") OutputFile.write(Html) OutputFile.close() Counter = Counter + 1 The code above ended up throwing an "Out of Memory" error, and in Task Manager it shows that the
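Memory growth in loops like this is commonly attributed to mechanize keeping a history of visited responses inside the `Browser` object, so nothing is ever freed across the 5000 iterations. Independent of that, the loop itself can be made leaner: open the output file once, and make sure no reference to an old page survives an iteration. A sketch in which `fetch` is a stub standing in for `MechBrowser.open(...).read()` so it runs offline; the URL is a placeholder:

```python
import os
import tempfile

def fetch(url):
    """Stub for MechBrowser.open(url).read(); returns the page as bytes."""
    return ("<html>content of %s</html>" % url).encode("utf-8")

def scrape(n_pages, out_path):
    with open(out_path, "wb") as out:      # opened once, not once per page
        for counter in range(n_pages):
            html = fetch("http://example.com/page%d" % counter)
            out.write(html)
            del html                       # nothing accumulates across iterations

path = os.path.join(tempfile.gettempdir(), "Output.txt")
scrape(5, path)
print(os.path.getsize(path) > 0)
```

With mechanize itself, the fix usually discussed is disabling or clearing the browser's response history; check the mechanize documentation for the exact API on your version, as it has changed between releases.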

Using urllib2 in Python. How do I get the name of the file I am downloading?

Submitted by 浪尽此生 on 2019-12-07 05:51:42
Question: I am a Python beginner. I am using urllib2 to download files. When I download a file, I specify a filename with which to save the downloaded file on my hard drive. However, if I download the file using my browser, a default filename is automatically provided. Here is a simplified version of my code: def downloadmp3(url): webFile = urllib2.urlopen(url) filename = 'temp.zip' localFile = open(filename, 'w') localFile.write(webFile.read()) The file downloads just fine, but if I type the string
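Browsers pick the default filename from the Content-Disposition response header when the server sends one, and otherwise fall back to the last segment of the URL path. The URL-path half can be computed offline; the response object in the comment is hypothetical:

```python
import posixpath
from urllib.parse import urlparse, unquote

def filename_from_url(url):
    """Default filename a browser would derive from the URL path alone."""
    path = urlparse(url).path
    return unquote(posixpath.basename(path)) or "download"

print(filename_from_url("http://example.com/music/My%20Song.mp3"))

# With a live response, check the header first; it wins when present:
#   cd = response.headers.get("Content-Disposition", "")
#   e.g. 'attachment; filename="My Song.mp3"'
```

Also note the original code opens the local file in text mode `'w'`; binary downloads like MP3s or ZIPs should use `'wb'`.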

Download A Single File Using Multiple Threads

Submitted by 喜夏-厌秋 on 2019-12-07 03:50:49
Question: I'm trying to create a 'Download Manager' for Linux that lets me download one single file using multiple threads. This is what I'm trying to do: 1. Divide the file to be downloaded into different parts by specifying an offset. 2. Download the different parts into a temporary location. 3. Merge them into a single file. Steps 2 and 3 are solvable, and it is at step 1 that I'm stuck. How do I specify an offset while downloading a file? Using something along the lines of open("/path/to/file", "wb").write
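Step 1 combines two pieces: HTTP Range requests (server permitting) to download each part, and `seek()` to write each part at its offset in the output file. A sketch of the range arithmetic plus the offset write; an in-memory buffer stands in for the downloaded parts so it runs offline:

```python
import io

def split_ranges(total_size, n_parts):
    """Return (start, end) inclusive byte ranges covering total_size bytes."""
    base = total_size // n_parts
    ranges, start = [], 0
    for i in range(n_parts):
        end = total_size - 1 if i == n_parts - 1 else start + base - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

data = b"abcdefghij"                     # stand-in for the remote file
out = io.BytesIO(b"\x00" * len(data))    # stand-in for open(path, "wb")
for start, end in split_ranges(len(data), 3):
    part = data[start:end + 1]           # what a 'Range: bytes=start-end' fetch returns
    out.seek(start)                      # the "offset" the question asks about
    out.write(part)
print(out.getvalue() == data)
```

Each worker thread would send its own `Range: bytes=start-end` header, then `seek(start)` in a shared (or per-part temporary) file before writing, which makes step 3's merge trivial.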

Python, Detect if a URL needs to be HTTPS vs HTTP

Submitted by 偶尔善良 on 2019-12-07 02:07:41
Question: Using the Python standard library, is there a way to determine whether a given web address should use HTTP or HTTPS? If you hit a site using HTTP://.com, is there a standard error code that says "hey dummy, it should be HTTPS, not HTTP"? Thank you. Answer 1: Did you do any testing? The short, premature answer to your question is: there is no "should use"; it's your preference, or entirely a server decision, because of redirects. Some servers allow only HTTPS, and when you call HTTP it
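There is no standard error code meaning "use HTTPS"; servers typically answer an http:// request with a redirect (301/308) to the https:// URL, or refuse the connection outright. A common approach is to try HTTPS first and fall back. The scheme-swapping half is pure string work and runs offline; the live probe in the comment requires network:

```python
from urllib.parse import urlparse, urlunparse

def with_scheme(url, scheme):
    """Return the same URL with its scheme replaced."""
    return urlunparse(urlparse(url)._replace(scheme=scheme))

print(with_scheme("http://example.com/page?q=1", "https"))

# Live probe: urlopen follows redirects, so the final scheme can be read
# back from the response:
#   resp = urllib.request.urlopen(with_scheme(url, "https"))
#   resp.geturl()  # the URL that was actually served
```

Comparing `resp.geturl()` against the URL you requested tells you whether the server redirected you, which is as close to a "should use HTTPS" signal as HTTP provides.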