urllib2

Python urllib2 file upload problems

本秂侑毒 · Submitted on 2019-12-18 12:00:26
Question: I'm currently trying to initiate a file upload with urllib2 and the urllib2_file library. Here's my code: import sys import urllib2_file import urllib2 URL='http://aquate.us/upload.php' d = [('uploaded', open(sys.argv[1:]))] req = urllib2.Request(URL, d) u = urllib2.urlopen(req) print u.read() I've placed this .py file in my My Documents directory and placed a shortcut to it in my Send To folder (the shortcut URL is ). When I right-click a file, choose Send To, and select Aquate (my Python script),
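The urllib2_file library patched urllib2 to send multipart bodies. On Python 3 (where urllib2 became urllib.request), the same upload can be sketched by building a minimal multipart/form-data body by hand. The field name "uploaded" and the upload URL come from the question; the filename and file contents here are placeholders:

```python
import uuid
import urllib.request

def build_multipart(field_name, filename, file_bytes):
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = uuid.uuid4().hex
    head = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + file_bytes + tail, f"multipart/form-data; boundary={boundary}"

body, ctype = build_multipart("uploaded", "report.txt", b"hello")
req = urllib.request.Request("http://aquate.us/upload.php", data=body,
                             headers={"Content-Type": ctype})
# urllib.request.urlopen(req) would perform the actual upload.
```

Note that `open(sys.argv[1:])` in the question passes a *list* to `open()`; the script likely wants `open(sys.argv[1], 'rb')` for the single file Windows hands to a Send To target.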

Python urllib2 automatic form filling and retrieval of results

久未见 · Submitted on 2019-12-18 11:56:15
Question: I want to query a site for warranty information on the machine this script runs on. It should be able to fill out a form if needed (as on, say, HP's service site) and then retrieve the resulting web page. I already have the pieces in place to parse the resulting HTML that is reported back; I'm just having trouble with what needs to be done to POST the data that goes in the fields and then being able to retrieve
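A minimal sketch of the POST step with Python 3's urllib.request (the successor to urllib2). The URL and field names here are hypothetical; the real ones must be read from the target form's HTML:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical field names -- take the real ones from the form's HTML.
fields = {"serial_number": "ABC123", "country": "US"}

# Supplying data= makes urllib issue a POST instead of a GET.
req = Request("https://example.com/warranty/lookup",
              data=urlencode(fields).encode("utf-8"))

# urllib.request.urlopen(req).read() would return the result page,
# ready to be fed into the existing HTML parser.
```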

How to read image from in memory buffer (StringIO) or from url with opencv python library

▼魔方 西西 · Submitted on 2019-12-18 11:23:17
Question: Just sharing a way to create an OpenCV image object from an in-memory buffer or from a URL, to improve performance. Sometimes we get image binary data from a URL; to avoid additional file I/O, we want to imread this image from an in-memory buffer or from the URL, but imread only supports reading an image from the file system by path. Answer 1: To create an OpenCV image object from an in-memory buffer (StringIO), we can use the OpenCV API imdecode; see the code below: import cv2 import numpy as np from urllib2 import urlopen from cStringIO

wget Vs urlretrieve of python

我只是一个虾纸丫 · Submitted on 2019-12-18 11:12:19
Question: I have a task to download GBs of data from a website. The data is in the form of .gz files, each 45 MB in size. The easy way to get the files is to use "wget -r -np -A files url". This downloads the data recursively and mirrors the website. The download rate is very high, 4 MB/s. But, just to play around, I was also using Python to build my own URL parser. Downloading via Python's urlretrieve is damn slow, possibly 4 times as slow as wget. The download rate is 500 KB/s. I use
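One common culprit for the gap is read size: urlretrieve historically pulled the socket in small (8 KB) chunks. A sketch of copying a response with a larger chunk size (256 KB is an arbitrary choice, not a tuned value), which cuts per-read overhead:

```python
def copy_in_chunks(src, dst, chunk_size=256 * 1024):
    """Copy a file-like response to dst in large chunks; return bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# Usage against a real URL would look like:
# with urllib.request.urlopen(url) as resp, open(path, "wb") as f:
#     copy_in_chunks(resp, f)
```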

Python: not getting full response

和自甴很熟 · Submitted on 2019-12-18 08:58:50
Question: When I try to get a page using urllib2, I don't get the full page. Here is the Python code: import urllib2 import urllib import socket from bs4 import BeautifulSoup # define the frequency for http requests socket.setdefaulttimeout(5) # getting the page def get_page(url): """ loads a webpage into a string """ src = '' req = urllib2.Request(url) try: response = urllib2.urlopen(req) src = response.read() response.close() except IOError: print 'can\'t open',url return src return src def
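A single .read() normally returns the whole body, but with a 5-second default timeout a slow connection can surface as a short or aborted read. A defensive sketch is to loop until an empty chunk signals EOF, so nothing is silently left behind:

```python
def read_all(response, chunk_size=8192):
    """Read a file-like HTTP response to EOF, accumulating chunks."""
    parts = []
    while True:
        chunk = response.read(chunk_size)
        if not chunk:          # empty bytes means the server is done
            break
        parts.append(chunk)
    return b"".join(parts)
```

(If the page still comes back truncated, the server may be varying its output on headers such as User-Agent, which is a separate issue from the read loop.)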

Getting the final destination of a javascript redirect on a website

霸气de小男生 · Submitted on 2019-12-18 07:10:18
Question: I parse a website with Python. They use a lot of redirects, and they do them by calling JavaScript functions. So just using urllib to parse the site doesn't help me, because I can't find the destination URL in the returned HTML code. Is there a way to access the DOM and call the correct JavaScript function from my Python code? All I need is the URL where the redirect takes me. Answer 1: I looked into Selenium. And if you are not running a pure script (meaning you don't have a display and
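Selenium executes the JavaScript for real. When the redirect is just a plain window.location assignment, a lighter (and deliberately brittle) alternative is to pull the URL out of the raw HTML with a regex; this sketch only handles that simplest pattern, not computed or function-wrapped redirects:

```python
import re

# Matches assignments like window.location = "http://..." or
# window.location.href = '...'. Anything fancier needs a real browser.
_REDIRECT_RE = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*['"]([^'"]+)['"]""")

def find_js_redirect(html):
    """Return the redirect target from a window.location assignment, or None."""
    m = _REDIRECT_RE.search(html)
    return m.group(1) if m else None

html = '<script>window.location.href = "http://example.com/final";</script>'
```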

Python script to translate via google translate

前提是你 · Submitted on 2019-12-18 04:21:05
Question: I'm trying to learn Python, so I decided to write a script that could translate something using Google Translate. So far I've written this: import sys from BeautifulSoup import BeautifulSoup import urllib2 import urllib data = {'sl':'en','tl':'it','text':'word'} request = urllib2.Request('http://www.translate.google.com', urllib.urlencode(data)) request.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11') opener = urllib2.build
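The excerpt cuts off mid-call (presumably urllib2.build_opener). The same request setup in Python 3, using the exact data dict and spoofed User-Agent idea from the question; note Google actively blocks this kind of scraping today, so this only illustrates the urlencode-plus-header mechanics:

```python
from urllib.parse import urlencode
from urllib.request import Request

data = {'sl': 'en', 'tl': 'it', 'text': 'word'}
req = Request("http://www.translate.google.com",
              data=urlencode(data).encode("utf-8"))
# Some sites reject the default "Python-urllib" User-Agent string.
req.add_header("User-Agent", "Mozilla/5.0")
# urllib.request.urlopen(req) would send the POST.
```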

Download file using urllib in Python with the wget -c feature

早过忘川 · Submitted on 2019-12-18 04:14:50
Question: I am writing software in Python to download PDFs over HTTP from a database. Sometimes the download stops with this message: retrieval incomplete: got only 3617232 out of 10689634 bytes How can I make the download restart where it stopped, using the 206 Partial Content HTTP feature? I can do it using wget -c and it works pretty well, but I would like to implement it directly in my Python software. Any ideas? Thank you. Answer 1: You can request a partial download by sending a GET with the Range
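The answer is truncated at the Range header. A Python 3 sketch of building such a resume request; 3617232 is the byte count from the error message above, and in practice you would take the offset from the size of the partial file on disk:

```python
from urllib.request import Request

def make_resume_request(url, have_bytes):
    """Build a GET asking the server to skip bytes we already hold."""
    req = Request(url)
    if have_bytes:
        # A server that supports ranges answers "206 Partial Content"
        # with only the remainder; append it to the partial file.
        req.add_header("Range", "bytes=%d-" % have_bytes)
    return req

req = make_resume_request("http://example.com/doc.pdf", 3617232)
```

If the server ignores the header it replies 200 with the full body, so the client should check the status code before appending rather than overwriting.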

How do I scrape pages with dynamically generated URLs using Python?

ぃ、小莉子 · Submitted on 2019-12-17 23:21:47
Question: I am trying to scrape http://www.dailyfinance.com/quote/NYSE/international-business-machines/IBM/financial-ratios, but the traditional URL-string-building technique doesn't work because of the "full-company-name-is-inserted-in-the-path" string, and the exact "full-company-name" isn't known in advance. Only the company symbol, "IBM", is known. Essentially, the way I scrape is by looping through an array of company symbols and building the URL string before sending it to urllib2.urlopen(url). But in
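The question is cut off before any answer. One workaround (a sketch, not necessarily the accepted approach) is to fetch an index or listing page once and map each ticker symbol to its full path with a regex, since the link layout in the example URL is /quote/&lt;exchange&gt;/&lt;full-company-name&gt;/&lt;SYMBOL&gt;/...; the sample HTML below is hypothetical:

```python
import re

def symbol_to_path(listing_html, symbol):
    """Find the quote link whose path contains the given ticker symbol.

    Assumes links shaped like /quote/<exchange>/<company-name>/<SYMBOL>/...,
    matching the example URL in the question.
    """
    pattern = r'href="(/quote/[^"]+/%s/[^"]*)"' % re.escape(symbol)
    m = re.search(pattern, listing_html)
    return m.group(1) if m else None

sample = ('<a href="/quote/NYSE/international-business-machines'
          '/IBM/financial-ratios">IBM</a>')
```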