mechanize

Click on a javascript link within python?

不羁岁月 submitted on 2019-11-27 11:43:38
I am navigating a site using Python's mechanize module and am having trouble clicking a JavaScript link for the next page. I did a bit of reading, and people suggested I need python-spidermonkey and DOMforms. I managed to get them installed, but I am not sure of the syntax to actually click the link. I can identify the code on the page as: <a href="javascript:__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')">2</a> Does anyone know how to click on it? Or is there perhaps another tool? Thanks.

I mainly use HtmlUnit under Jython for these use cases. Also I published a simple article on the
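ASP.NET `__doPostBack` links work by filling two hidden form fields (`__EVENTTARGET` and `__EVENTARGUMENT`) and submitting the surrounding form, so the JavaScript can usually be replayed without executing it. A minimal sketch: the regex helper below extracts the two arguments from the href; the mechanize calls in the trailing comment are illustrative only and assume the page follows the usual ASP.NET single-form convention.

```python
import re

def parse_dopostback(href):
    """Extract (__EVENTTARGET, __EVENTARGUMENT) from a __doPostBack href."""
    m = re.search(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)", href)
    return (m.group(1), m.group(2)) if m else None

target, argument = parse_dopostback(
    "javascript:__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')"
)

# With the values in hand, the postback can be replayed through mechanize
# (illustrative -- the hidden-field names are ASP.NET conventions, not
# verified against this particular page):
#   br.select_form(nr=0)
#   br.form.set_all_readonly(False)
#   br["__EVENTTARGET"] = target
#   br["__EVENTARGUMENT"] = argument
#   br.submit()
```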

How can I use Perl to scrape a website that reveals its content with JavaScript?

為{幸葍}努か submitted on 2019-11-27 07:19:08
Question: I need to write a Perl script to scrape a website. The website can only be scraped with JavaScript, and the user is on Windows. I got some way with Win32::IE::Mechanize on my work machine, which has IE6, but then I moved to my netbook, which has IE8, and can't even get as far as fetching a simple page. Is Win32::IE::Mechanize up to date with the latest versions of IE? But, more to the point, given a recent WinXP machine, what's the quickest, easiest way to scrape a site which only reveals its

Python mechanize login to website

蓝咒 submitted on 2019-11-27 07:04:07
Question: I'm trying to log into a website using Python and mechanize; however, I'm running into trouble when trying to get the POST data to behave as I want. Essentially I want to replicate this using mechanize and Python:

wget --quiet --save-cookies cookiejar --keep-session-cookies --post-data "action=login&login_nick=USERNAME&login_pwd=PASSWORD" -O outfile.htm http://domain.com/index.php

The form looks like this:

<login POST http://domain.com/index.php application/x-www-form-urlencoded
  <TextControl
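Before wiring up mechanize, it can help to confirm what POST body the wget command actually sends, so there is a known-good target to compare against. A small sketch (field names are taken from the `--post-data` string above; `domain.com` is the question's placeholder):

```python
from urllib.parse import urlencode  # Python 2 era: urllib.urlencode

def login_post_data(username, password):
    # A list of tuples keeps field order, so the result can be compared
    # byte-for-byte against the --post-data string.
    return urlencode([
        ("action", "login"),
        ("login_nick", username),
        ("login_pwd", password),
    ])

print(login_post_data("USERNAME", "PASSWORD"))
# action=login&login_nick=USERNAME&login_pwd=PASSWORD
```

With mechanize the usual pattern is `br.select_form(...)`, then `br["login_nick"] = username`, `br["login_pwd"] = password`, `br.submit()` — mechanize builds the same urlencoded body from the form's controls, including any hidden fields wget's hand-built string would miss.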

Cannot Login to Amazon with Ruby Mechanize

旧巷老猫 submitted on 2019-11-27 06:06:03
Question: I am attempting to log in to Amazon using the Ruby gem Mechanize. I always get kicked back to the sign-in page without any sort of error message. I am wondering if this is a bug in Mechanize or if Amazon blocks this sort of access. I have code below that you can run in irb to test:

@mechanizer = Mechanize.new
@mechanizer.user_agent_alias = 'Mac Safari'
@page = @mechanizer.get("https://www.amazon.com/ap/signin?_encoding=UTF8&openid.assoc_handle=usflex&openid.return_to=https%3A%2F%2Fwww.amazon.com

Python's mechanize proxy support

♀尐吖头ヾ submitted on 2019-11-27 05:41:10
Question: I have a question about Python mechanize's proxy support. I'm writing a web client script, and I would like to add proxy support to it. For example, if I have:

params = urllib.urlencode({'id': id, 'passwd': pw})
rq = mechanize.Request('http://www.example.com', params)
rs = mechanize.urlopen(rq)

How can I add proxy support to my mechanize script? Whenever I open the www.example.com website, I would like it to go through the proxy.

Answer 1: You use mechanize.Request.set
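mechanize's Browser also exposes a `set_proxies` method that takes a scheme-to-proxy mapping. Since mechanize itself may not be installed everywhere, the sketch below feeds the same kind of mapping to the stdlib equivalent, `urllib.request.ProxyHandler` (the proxy address is a placeholder):

```python
import urllib.request

# The same {"scheme": "proxy"} mapping that mechanize accepts, e.g.:
#   br = mechanize.Browser()
#   br.set_proxies({"http": "127.0.0.1:3128"})
proxies = {"http": "http://127.0.0.1:3128", "https": "http://127.0.0.1:3128"}

# Stdlib version of the same idea: install a proxy handler in an opener.
handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(handler)
# opener.open("http://www.example.com") would now route through the proxy.
```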

Python mechanize, following link by url and what is the nr parameter?

两盒软妹~` submitted on 2019-11-27 05:07:44
Question: I'm sorry to have to ask something like this, but Python mechanize's documentation seems to really be lacking and I can't figure this out. They only give one example that I can find for following a link:

response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)

But I don't want to use a regex; I just want to follow a link based on its URL. How would I do this? Also, what is the "nr" parameter that is sometimes used for following links? Thanks for any info.

Answer 1: br.follow_link takes either a Link object
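On the `nr` question: follow_link filters the page's links by whatever predicates you pass (`url=`, `url_regex=`, `text=`, ...), and `nr` is the zero-based index among the links that matched, defaulting to 0. The helper below is a toy model of that selection over stand-in data, purely to illustrate the semantics (it is hypothetical, not part of mechanize):

```python
def nth_match(links, url=None, nr=0):
    """Toy model of follow_link's selection: filter the links by URL,
    then take match number nr (zero-based). Hypothetical helper."""
    matches = [link for link in links if url is None or link[1] == url]
    return matches[nr]

# Stand-in (name, url) pairs for the links on a page.
links = [("link0", "/a"), ("link1", "/b"), ("link2", "/a")]
nth_match(links, url="/a", nr=1)  # the second link whose URL is "/a"
```

With mechanize itself, following a link by exact URL is just `br.follow_link(url="/some/path")`; `nr` only matters when more than one link matches.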

Using WWW:Mechanize to download a file to disk without loading it all in memory first

跟風遠走 submitted on 2019-11-27 04:38:08
I'm using Mechanize to facilitate downloading some files. At the moment my script uses the following line to actually download the files:

agent.get('http://example.com/foo').save_as 'a_file_name'

However, this downloads the complete file into memory before dumping it to disk. How do you bypass this behavior and download straight to disk? If I need to use something other than WWW::Mechanize, how would I go about using WWW::Mechanize's cookies with it?

What you really want is Mechanize::Download (http://mechanize.rubyforge.org/Mechanize/Download.html), which you can use this way:

Using Python and Mechanize to submit form data and authenticate

北战南征 submitted on 2019-11-27 03:52:15
I want to submit a login to the website Reddit.com, navigate to a particular area of the page, and submit a comment. I don't see what's wrong with this code, but it is not working, in that no change is reflected on the Reddit site.

import mechanize
import cookielib

def main():
    # Browser
    br = mechanize.Browser()

    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    # Follows refresh 0 but not hangs on refresh > 0
    br.set_handle_refresh
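A side note on the cookie jar: `cookielib` is the Python 2 name; on Python 3 the same `LWPCookieJar` lives in `http.cookiejar`. When a login "silently" fails, it is worth confirming the jar itself round-trips to disk, since a jar that never persists can look exactly like a rejected login. A minimal save/load sketch:

```python
import http.cookiejar  # Python 2: import cookielib
import os
import tempfile

# Create a jar, save it, and load it back into a fresh jar.
cj = http.cookiejar.LWPCookieJar()
path = os.path.join(tempfile.mkdtemp(), "cookiejar")
cj.save(path, ignore_discard=True, ignore_expires=True)

cj2 = http.cookiejar.LWPCookieJar()
cj2.load(path, ignore_discard=True, ignore_expires=True)
# If the round trip succeeds, the jar file itself is well-formed; after a
# real br.open() login you would expect len(cj) > 0 (session cookies set).
```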

Submitting Forms with Mechanize (Python)

痞子三分冷 submitted on 2019-11-27 03:28:23
Question: Well, I am trying to log in to a site using Python and mechanize. I've got the site opened:

site = br.open("http://example.com/login.php")

And I've got a list of the forms (with br.forms):

<GET http://example.com/search.php application/x-www-form-urlencoded
  <HiddenControl(search=1) (readonly)>
  ...
<POST http://example.com/login.php application/x-www-form-urlencoded
  <TextControl(username=)>
  <PasswordControl(password=)>
  <CheckboxControl(stay=[1])>
  <SubmitControl(<None>=Log in) (readonly)>>

I've
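Given that listing, the login form can be selected by index (`br.select_form(nr=1)`, since the POST form is the second one) or by a predicate over the form's properties. The sketch below models the predicate idea over stand-in dicts; `find_login_form` is a hypothetical helper, not a mechanize API, though real mechanize forms expose their method and control names in a similar way:

```python
def find_login_form(forms):
    """Pick the index of the first POST form that has a 'password' control.
    Hypothetical helper over stand-in dicts, mirroring what a
    br.select_form(predicate=...) call does with real mechanize forms."""
    for i, form in enumerate(forms):
        if form["method"] == "POST" and "password" in form["controls"]:
            return i
    return None

# Stand-ins for the two forms shown in the br.forms listing above.
forms = [
    {"method": "GET", "controls": ["search"]},
    {"method": "POST", "controls": ["username", "password", "stay"]},
]
find_login_form(forms)  # -> 1, the POST login form
```

With real mechanize, the rest of the flow is `br.select_form(nr=1)`, then `br["username"] = ...`, `br["password"] = ...`, and `br.submit()`.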

Is it possible to hook up a more robust HTML parser to Python mechanize?

天大地大妈咪最大 submitted on 2019-11-27 01:58:09
Question: I am trying to parse and submit a form on a website using mechanize, but it appears that the built-in form parser cannot detect the form and its elements. I suspect that it is choking on poorly formed HTML, and I'd like to try pre-parsing it with a parser better designed to handle bad HTML (say, lxml or BeautifulSoup) and then feeding the prettified, cleaned-up output to the form parser. I need mechanize not only for submitting the form but also for maintaining sessions (I'm working this form
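A common workaround is exactly the two-step pattern described: fetch the page with mechanize (so cookies and the session stay intact), clean the HTML with a tolerant parser, then hand the cleaned bytes back to the browser. The sketch below only exercises a stand-in cleaner; the mechanize side in the trailing comment uses the `get_data`/`set_data`/`set_response` pattern, which should be verified against your mechanize version:

```python
import re

def clean_html(html):
    """Stand-in for a real cleaner such as lxml.html or BeautifulSoup's
    prettify(); here it only closes unterminated <br> tags, to keep the
    sketch self-contained and dependency-free."""
    return re.sub(r"<br(?![^>]*/)>", "<br/>", html)

# The mechanize side of the pattern (illustrative, not executed here):
#   response = br.open(url)
#   response.set_data(clean_html(response.get_data()))
#   br.set_response(response)
#   br.select_form(nr=0)   # the rebuilt page is now what mechanize parses
```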