urllib2

urllib2 urlopen works very randomly

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-07 16:29:07
Question: For some reason, this part where I fetch JSON data from the following URL only works sometimes; other times it returns a 404 error and complains about a missing header attribute. It works 100% of the time if I paste the URL into a web browser, so I'm sure the link is not broken. I get the following error in Python: AttributeError: 'HTTPError' object has no attribute 'header' What's the reason for this, and can it be fixed? By the way, I removed the API key since it is private. try: url =
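The attribute is `headers` (plural), not `header` — that alone explains the AttributeError. A minimal sketch using Python 3's `urllib` (the modern equivalent of `urllib2`); the URL and header values are placeholders, and the `HTTPError` is constructed by hand so no network is needed:

```python
from urllib.error import HTTPError
from email.message import Message

def describe_http_error(err):
    """Summarize an HTTPError using its `headers` attribute (not `header`)."""
    content_type = err.headers.get("Content-Type", "unknown")
    return "HTTP %d: %s (Content-Type: %s)" % (err.code, err.reason, content_type)

# Build an HTTPError directly to demonstrate the attribute names.
hdrs = Message()
hdrs["Content-Type"] = "application/json"
err = HTTPError("http://example.com/api", 404, "Not Found", hdrs, None)
print(describe_http_error(err))
```

In a real fetch you would wrap `urlopen(url)` in `try/except HTTPError as err:` and inspect `err.code` and `err.headers` the same way.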

Basic fetching of a URL's HTML body with Python 3.x

Submitted by 不羁岁月 on 2019-12-07 15:34:25
Question: I'm a Python newbie. I have been a little confused by the differences between the old urllib and urllib2 in Python 2.x and the new urllib in Python 3, and among other things I'm not sure when data needs to be encoded before being sent to urlopen. I have been trying to fetch the HTML body of a URL using a POST so that I can send parameters. The web page displays sunshine data for a country over a particular hour of a given day. I have tried without encoding/decoding and the printout is a
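In Python 3, POST parameters must be URL-encoded with `urllib.parse.urlencode` and then encoded from `str` to `bytes` before being passed as `data`; `urlopen()` issues a POST whenever `data` is not None. A sketch with placeholder URL and parameter names:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_post_request(url, params):
    data = urlencode(params).encode("utf-8")  # str -> bytes, as urlopen requires
    return Request(url, data=data)            # data present => POST

req = build_post_request("http://example.com/sunshine", {"country": "UK", "hour": 12})
print(req.get_method())  # POST, because data was supplied
print(req.data)
# To actually send it (requires network):
#   body = urlopen(req).read().decode("utf-8")
```

The decode on `read()` is the mirror step: the response arrives as bytes and must be decoded back to `str` for printing.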

Python urllib2 giving “network unreachable error” if the URL is https

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-07 14:35:29
Question: I am trying to fetch some URLs using the urllib2 library. a = urllib2.urlopen("http://www.google.com") ret = a.read() The code above works fine and gives the expected result. But when I make the URL https, it gives a "network unreachable" error: a = urllib2.urlopen("https://www.google.com") urllib2.URLError: <urlopen error [Errno 101] Network is unreachable> Is there a problem with SSL? My Python version is 2.6.5. I am also behind an academic proxy server; I have the proxy settings in my bash file.

Unknown url type error in urllib2

Submitted by 雨燕双飞 on 2019-12-07 12:11:35
Question: I have searched a lot of similar questions on SO, but did not find an exact match for my case. I am trying to download a video using Python 2.7. Here is my code for downloading the video: import urllib2 from bs4 import BeautifulSoup as bs with open('video.txt','r') as f: last_downloaded_video = f.read() webpage = urllib2.urlopen('http://*.net/watch/**-'+last_downloaded_video) soup = bs(webpage) a = [] for link in soup.find_all('a'): if link.has_attr('data-video-id'): a.append(link) #try just with
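An "unknown url type" error from `urlopen` usually means the string it received has no scheme — typically a relative or scheme-relative `href` scraped out of the page. Resolving the scraped value against the page URL before opening it fixes that. A sketch with placeholder URLs, using Python 3's `urllib.parse` (the same functions live in `urlparse` on Python 2):

```python
from urllib.parse import urljoin, urlparse

page_url = "http://example.net/watch/abc-123"  # placeholder for the page being scraped

def absolutize(href, base=page_url):
    """Resolve a scraped href against the page URL so urlopen gets a full URL."""
    full = urljoin(base, href)
    if urlparse(full).scheme not in ("http", "https"):
        raise ValueError("still no usable scheme: %r" % full)
    return full

print(absolutize("/video/456.mp4"))           # relative path -> absolute URL
print(absolutize("//cdn.example.net/v.mp4"))  # scheme-relative -> inherits http
```

Printing the URL right before `urlopen` is the quickest way to confirm whether this is the failure mode.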

unbuffered urllib2.urlopen

Submitted by 匆匆过客 on 2019-12-07 11:17:29
I have a client for a web interface to a long-running process. I'd like the output from that process to be displayed as it comes. This works great with urllib.urlopen(), but it doesn't have a timeout parameter. On the other hand, with urllib2.urlopen() the output is buffered. Is there an easy way to disable that buffering? A quick hack that has occurred to me is to use urllib.urlopen() with threading.Timer() to emulate a timeout, but that's only a quick and dirty hack. Answer: urllib2 buffers when you just call read(); you can define a size to read and thereby disable the buffering. For example: import urllib2
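The suggestion above can be sketched generically: read in fixed-size chunks instead of one big `read()`, so output can be processed as it arrives. Any file-like response object works; here a `BytesIO` stands in for the `urlopen()` response so the sketch runs offline:

```python
import io

def stream_chunks(resp, chunk_size=1024):
    """Yield the response body in chunk_size pieces instead of one read()."""
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:          # empty bytes means EOF
            break
        yield chunk

# Stand-in for urlopen(url): 9 bytes of header text plus 3000 filler bytes.
fake_response = io.BytesIO(b"line one\n" + b"x" * 3000)
chunks = list(stream_chunks(fake_response, chunk_size=1024))
print(len(chunks))  # 3009 bytes arrive as 3 chunks of <= 1024 bytes
```

With a real `urllib2.urlopen()` response, each chunk can be written to the display as soon as it is read, which gives the incremental behavior the question asks for.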

Fetch first n bytes from the URL

Submitted by 孤者浪人 on 2019-12-07 08:21:48
Question: Is it possible to fetch only a number of bytes from some URL and then close the connection with urllib/urllib2? Or perhaps even a part from the n-th byte to the k-th? There is a page on that site and I don't need to load the whole page, only a piece of it. Answer 1: You can set the Range header to request a certain range of bytes, but you are dependent on the server to honor the request: import urllib2 req = urllib2.Request('http://www.python.org/') # Here we request that bytes 18000--19000 be
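The same Range-header request in Python 3's `urllib.request`, built but not sent, since whether the server honors the range (by answering 206 Partial Content) is up to the server:

```python
import urllib.request

req = urllib.request.Request("http://www.python.org/")
req.add_header("Range", "bytes=18000-19000")  # inclusive byte range, n-th to k-th
print(req.get_header("Range"))

# Sending it requires network; if the server honors the range, the body
# is at most 1001 bytes:
#   data = urllib.request.urlopen(req).read()
```

If the server ignores the header it simply returns 200 with the full body, so checking the response status (or the returned length) tells you which case you got.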

“Out of Memory” error with mechanize

Submitted by 最后都变了- on 2019-12-07 06:43:33
Question: I was trying to scrape some information from a website page by page. Basically, here's what I did: import mechanize MechBrowser = mechanize.Browser() Counter = 0 while Counter < 5000: Response = MechBrowser.open("http://example.com/page" + str(Counter)) Html = Response.read() Response.close() OutputFile = open("Output.txt", "a") OutputFile.write(Html) OutputFile.close() Counter = Counter + 1 The code above ended up throwing an "Out of Memory" error, and in Task Manager it shows that the
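Memory growth in loops like this is commonly attributed to mechanize keeping a history of visited responses inside the `Browser` object, so nothing is ever freed across the 5000 iterations. Independent of that, the loop itself can be made leaner: open the output file once, and make sure no reference to an old page survives an iteration. A sketch in which `fetch` is a stub standing in for `MechBrowser.open(...).read()` so it runs offline; the URL is a placeholder:

```python
import os
import tempfile

def fetch(url):
    """Stub for MechBrowser.open(url).read(); returns the page as bytes."""
    return ("<html>content of %s</html>" % url).encode("utf-8")

def scrape(n_pages, out_path):
    with open(out_path, "wb") as out:      # opened once, not once per page
        for counter in range(n_pages):
            html = fetch("http://example.com/page%d" % counter)
            out.write(html)
            del html                       # nothing accumulates across iterations

path = os.path.join(tempfile.gettempdir(), "Output.txt")
scrape(5, path)
print(os.path.getsize(path) > 0)
```

With mechanize itself, the fix usually discussed is disabling or clearing the browser's response history; check the mechanize documentation for the exact API on your version, as it has changed between releases.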

Using urllib2 in Python. How do I get the name of the file I am downloading?

Submitted by 浪尽此生 on 2019-12-07 05:51:42
Question: I am a Python beginner. I am using urllib2 to download files. When I download a file, I specify a filename with which to save the downloaded file on my hard drive. However, if I download the file using my browser, a default filename is automatically provided. Here is a simplified version of my code: def downloadmp3(url): webFile = urllib2.urlopen(url) filename = 'temp.zip' localFile = open(filename, 'w') localFile.write(webFile.read()) The file downloads just fine, but if I type the string
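Browsers pick the default filename from the Content-Disposition response header when the server sends one, and otherwise fall back to the last segment of the URL path. The URL-path half can be computed offline; the response object in the comment is hypothetical:

```python
import posixpath
from urllib.parse import urlparse, unquote

def filename_from_url(url):
    """Default filename a browser would derive from the URL path alone."""
    path = urlparse(url).path
    return unquote(posixpath.basename(path)) or "download"

print(filename_from_url("http://example.com/music/My%20Song.mp3"))

# With a live response, check the header first; it wins when present:
#   cd = response.headers.get("Content-Disposition", "")
#   e.g. 'attachment; filename="My Song.mp3"'
```

Also note the original code opens the local file in text mode `'w'`; binary downloads like MP3s or ZIPs should use `'wb'`.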

Download A Single File Using Multiple Threads

Submitted by 喜夏-厌秋 on 2019-12-07 03:50:49
Question: I'm trying to create a 'Download Manager' for Linux that lets me download one single file using multiple threads. This is what I'm trying to do: 1. Divide the file to be downloaded into different parts by specifying an offset. 2. Download the different parts into a temporary location. 3. Merge them into a single file. Steps 2 and 3 are solvable, and it is at step 1 that I'm stuck. How do I specify an offset while downloading a file? Using something along the lines of open("/path/to/file", "wb").write
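Step 1 combines two pieces: HTTP Range requests (server permitting) to download each part, and `seek()` to write each part at its offset in the output file. A sketch of the range arithmetic plus the offset write; an in-memory buffer stands in for the downloaded parts so it runs offline:

```python
import io

def split_ranges(total_size, n_parts):
    """Return (start, end) inclusive byte ranges covering total_size bytes."""
    base = total_size // n_parts
    ranges, start = [], 0
    for i in range(n_parts):
        end = total_size - 1 if i == n_parts - 1 else start + base - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

data = b"abcdefghij"                     # stand-in for the remote file
out = io.BytesIO(b"\x00" * len(data))    # stand-in for open(path, "wb")
for start, end in split_ranges(len(data), 3):
    part = data[start:end + 1]           # what a 'Range: bytes=start-end' fetch returns
    out.seek(start)                      # the "offset" the question asks about
    out.write(part)
print(out.getvalue() == data)
```

Each worker thread would send its own `Range: bytes=start-end` header, then `seek(start)` in a shared (or per-part temporary) file before writing, which makes step 3's merge trivial.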

Python, Detect if a URL needs to be HTTPS vs HTTP

Submitted by 偶尔善良 on 2019-12-07 02:07:41
Question: Using the Python standard library, is there a way to determine whether a given web address should use HTTP or HTTPS? If you hit a site using HTTP://.com, is there a standard error code that says "hey dummy, it should be HTTPS, not HTTP"? Thank you. Answer 1: Did you do any testing? The short, premature answer to your question is: there is no "should use"; it's your preference, or entirely a server decision, because of redirects. Some servers allow only HTTPS, and when you call HTTP it
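There is no standard error code meaning "use HTTPS"; servers typically answer an http:// request with a redirect (301/308) to the https:// URL, or refuse the connection outright. A common approach is to try HTTPS first and fall back. The scheme-swapping half is pure string work and runs offline; the live probe in the comment requires network:

```python
from urllib.parse import urlparse, urlunparse

def with_scheme(url, scheme):
    """Return the same URL with its scheme replaced."""
    return urlunparse(urlparse(url)._replace(scheme=scheme))

print(with_scheme("http://example.com/page?q=1", "https"))

# Live probe: urlopen follows redirects, so the final scheme can be read
# back from the response:
#   resp = urllib.request.urlopen(with_scheme(url, "https"))
#   resp.geturl()  # the URL that was actually served
```

Comparing `resp.geturl()` against the URL you requested tells you whether the server redirected you, which is as close to a "should use HTTPS" signal as HTTP provides.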