mechanize

Click on a javascript link within python?

不羁岁月 submitted on 2019-11-27 11:43:38
I am navigating a site using Python's mechanize module and am having trouble clicking a JavaScript link for the next page. I did a bit of reading, and people suggested I need python-spidermonkey and DOMforms. I managed to get them installed, but I am not sure of the syntax to actually click the link. I can identify the code on the page as: <a href="javascript:__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')">2</a> Does anyone know how to click on it? Or is there perhaps another tool? Thanks.

I mainly use HtmlUnit under Jython for these use cases. Also I published a simple article on the
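ASP.NET `__doPostBack` links work by filling two hidden form fields (`__EVENTTARGET` and `__EVENTARGUMENT`) and submitting the surrounding form, so the JavaScript can usually be replayed without executing it. A minimal sketch: the regex helper below extracts the two arguments from the href; the mechanize calls in the trailing comment are illustrative only and assume the page follows the usual ASP.NET single-form convention.

```python
import re

def parse_dopostback(href):
    """Extract (__EVENTTARGET, __EVENTARGUMENT) from a __doPostBack href."""
    m = re.search(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)", href)
    return (m.group(1), m.group(2)) if m else None

target, argument = parse_dopostback(
    "javascript:__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')"
)

# With the values in hand, the postback can be replayed through mechanize
# (illustrative -- the hidden-field names are ASP.NET conventions, not
# verified against this particular page):
#   br.select_form(nr=0)
#   br.form.set_all_readonly(False)
#   br["__EVENTTARGET"] = target
#   br["__EVENTARGUMENT"] = argument
#   br.submit()
```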

How can I use Perl to scrape a website that reveals its content with JavaScript?

為{幸葍}努か submitted on 2019-11-27 07:19:08
Question: I need to write a Perl script to scrape a website. The website can only be scraped with JavaScript, and the user is on Windows. I got some way with Win32::IE::Mechanize on my work machine, which has IE6, but then I moved to my netbook, which has IE8, and can't even get as far as fetching a simple page. Is Win32::IE::Mechanize up to date with the latest versions of IE? But, more to the point, given a recent WinXP machine, what's the quickest, easiest way to scrape a site which only reveals its

Python mechanize login to website

蓝咒 submitted on 2019-11-27 07:04:07
Question: I'm trying to log into a website using Python and mechanize; however, I'm running into trouble when trying to get the POST data to behave as I want. Essentially I want to replicate this using mechanize and Python:

wget --quiet --save-cookies cookiejar --keep-session-cookies --post-data "action=login&login_nick=USERNAME&login_pwd=PASSWORD" -O outfile.htm http://domain.com/index.php

The form looks like this:

<login POST http://domain.com/index.php application/x-www-form-urlencoded
  <TextControl
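Before wiring up mechanize, it can help to confirm what POST body the wget command actually sends, so there is a known-good target to compare against. A small sketch (field names are taken from the `--post-data` string above; `domain.com` is the question's placeholder):

```python
from urllib.parse import urlencode  # Python 2 era: urllib.urlencode

def login_post_data(username, password):
    # A list of tuples keeps field order, so the result can be compared
    # byte-for-byte against the --post-data string.
    return urlencode([
        ("action", "login"),
        ("login_nick", username),
        ("login_pwd", password),
    ])

print(login_post_data("USERNAME", "PASSWORD"))
# action=login&login_nick=USERNAME&login_pwd=PASSWORD
```

With mechanize the usual pattern is `br.select_form(...)`, then `br["login_nick"] = username`, `br["login_pwd"] = password`, `br.submit()` — mechanize builds the same urlencoded body from the form's controls, including any hidden fields wget's hand-built string would miss.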

Cannot Login to Amazon with Ruby Mechanize

旧巷老猫 submitted on 2019-11-27 06:06:03
Question: I am attempting to log in to Amazon using the Ruby gem Mechanize. I always get kicked back to the sign-in page without any sort of error message. I am wondering if this is a bug in Mechanize or if Amazon blocks this sort of access. I have code below that you can run in irb to test:

@mechanizer = Mechanize.new
@mechanizer.user_agent_alias = 'Mac Safari'
@page = @mechanizer.get("https://www.amazon.com/ap/signin?_encoding=UTF8&openid.assoc_handle=usflex&openid.return_to=https%3A%2F%2Fwww.amazon.com

Python's mechanize proxy support

♀尐吖头ヾ submitted on 2019-11-27 05:41:10
Question: I have a question about Python mechanize's proxy support. I'm writing a web client script, and I would like to add proxy support to it. For example, if I have:

params = urllib.urlencode({'id': id, 'passwd': pw})
rq = mechanize.Request('http://www.example.com', params)
rs = mechanize.urlopen(rq)

How can I add proxy support to my mechanize script? Whenever I open the www.example.com website, I would like it to go through the proxy.

Answer 1: You use mechanize.Request.set
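mechanize's Browser also exposes a `set_proxies` method that takes a scheme-to-proxy mapping. Since mechanize itself may not be installed everywhere, the sketch below feeds the same kind of mapping to the stdlib equivalent, `urllib.request.ProxyHandler` (the proxy address is a placeholder):

```python
import urllib.request

# The same {"scheme": "proxy"} mapping that mechanize accepts, e.g.:
#   br = mechanize.Browser()
#   br.set_proxies({"http": "127.0.0.1:3128"})
proxies = {"http": "http://127.0.0.1:3128", "https": "http://127.0.0.1:3128"}

# Stdlib version of the same idea: install a proxy handler in an opener.
handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(handler)
# opener.open("http://www.example.com") would now route through the proxy.
```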

Python mechanize, following link by url and what is the nr parameter?

两盒软妹~` submitted on 2019-11-27 05:07:44
Question: I'm sorry to have to ask something like this, but Python mechanize's documentation seems to really be lacking and I can't figure this out. They only give one example that I can find for following a link:

response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)

But I don't want to use a regex; I just want to follow a link based on its URL. How would I do this? Also, what is the "nr" parameter that is sometimes used for following links? Thanks for any info.

Answer 1: br.follow_link takes either a Link object
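On the `nr` question: follow_link filters the page's links by whatever predicates you pass (`url=`, `url_regex=`, `text=`, ...), and `nr` is the zero-based index among the links that matched, defaulting to 0. The helper below is a toy model of that selection over stand-in data, purely to illustrate the semantics (it is hypothetical, not part of mechanize):

```python
def nth_match(links, url=None, nr=0):
    """Toy model of follow_link's selection: filter the links by URL,
    then take match number nr (zero-based). Hypothetical helper."""
    matches = [link for link in links if url is None or link[1] == url]
    return matches[nr]

# Stand-in (name, url) pairs for the links on a page.
links = [("link0", "/a"), ("link1", "/b"), ("link2", "/a")]
nth_match(links, url="/a", nr=1)  # the second link whose URL is "/a"
```

With mechanize itself, following a link by exact URL is just `br.follow_link(url="/some/path")`; `nr` only matters when more than one link matches.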

Using WWW:Mechanize to download a file to disk without loading it all in memory first

跟風遠走 submitted on 2019-11-27 04:38:08
I'm using Mechanize to facilitate downloading some files. At the moment my script uses the following line to actually download the files:

agent.get('http://example.com/foo').save_as 'a_file_name'

However, this downloads the complete file into memory before dumping it to disk. How do you bypass this behavior and download straight to disk? If I need to use something other than WWW::Mechanize, how would I go about using WWW::Mechanize's cookies with it?

What you really want is Mechanize::Download (http://mechanize.rubyforge.org/Mechanize/Download.html), which you can use this way:

Using Python and Mechanize to submit form data and authenticate

北战南征 submitted on 2019-11-27 03:52:15
I want to submit a login to the website Reddit.com, navigate to a particular area of the page, and submit a comment. I don't see what's wrong with this code, but it is not working, in that no change is reflected on the Reddit site.

import mechanize
import cookielib

def main():
    # Browser
    br = mechanize.Browser()

    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    # Follows refresh 0 but not hangs on refresh > 0
    br.set_handle_refresh
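A side note on the cookie jar: `cookielib` is the Python 2 name; on Python 3 the same `LWPCookieJar` lives in `http.cookiejar`. When a login "silently" fails, it is worth confirming the jar itself round-trips to disk, since a jar that never persists can look exactly like a rejected login. A minimal save/load sketch:

```python
import http.cookiejar  # Python 2: import cookielib
import os
import tempfile

# Create a jar, save it, and load it back into a fresh jar.
cj = http.cookiejar.LWPCookieJar()
path = os.path.join(tempfile.mkdtemp(), "cookiejar")
cj.save(path, ignore_discard=True, ignore_expires=True)

cj2 = http.cookiejar.LWPCookieJar()
cj2.load(path, ignore_discard=True, ignore_expires=True)
# If the round trip succeeds, the jar file itself is well-formed; after a
# real br.open() login you would expect len(cj) > 0 (session cookies set).
```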

Submitting Forms with Mechanize (Python)

痞子三分冷 submitted on 2019-11-27 03:28:23
Question: Well, I am trying to log in to a site using Python and mechanize. I've got the site opened:

site = br.open("http://example.com/login.php")

And I've got a list of the forms (with br.forms):

<GET http://example.com/search.php application/x-www-form-urlencoded
  <HiddenControl(search=1) (readonly)>
  ...
<POST http://example.com/login.php application/x-www-form-urlencoded
  <TextControl(username=)>
  <PasswordControl(password=)>
  <CheckboxControl(stay=[1])>
  <SubmitControl(<None>=Log in) (readonly)>>

I've
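Given that listing, the login form can be selected by index (`br.select_form(nr=1)`, since the POST form is the second one) or by a predicate over the form's properties. The sketch below models the predicate idea over stand-in dicts; `find_login_form` is a hypothetical helper, not a mechanize API, though real mechanize forms expose their method and control names in a similar way:

```python
def find_login_form(forms):
    """Pick the index of the first POST form that has a 'password' control.
    Hypothetical helper over stand-in dicts, mirroring what a
    br.select_form(predicate=...) call does with real mechanize forms."""
    for i, form in enumerate(forms):
        if form["method"] == "POST" and "password" in form["controls"]:
            return i
    return None

# Stand-ins for the two forms shown in the br.forms listing above.
forms = [
    {"method": "GET", "controls": ["search"]},
    {"method": "POST", "controls": ["username", "password", "stay"]},
]
find_login_form(forms)  # -> 1, the POST login form
```

With real mechanize, the rest of the flow is `br.select_form(nr=1)`, then `br["username"] = ...`, `br["password"] = ...`, and `br.submit()`.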

Is it possible to hook up a more robust HTML parser to Python mechanize?

天大地大妈咪最大 submitted on 2019-11-27 01:58:09
Question: I am trying to parse and submit a form on a website using mechanize, but it appears that the built-in form parser cannot detect the form and its elements. I suspect that it is choking on poorly formed HTML, and I'd like to try pre-parsing it with a parser better designed to handle bad HTML (say, lxml or BeautifulSoup) and then feeding the prettified, cleaned-up output to the form parser. I need mechanize not only for submitting the form but also for maintaining sessions (I'm working this form
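A common workaround is exactly the two-step pattern described: fetch the page with mechanize (so cookies and the session stay intact), clean the HTML with a tolerant parser, then hand the cleaned bytes back to the browser. The sketch below only exercises a stand-in cleaner; the mechanize side in the trailing comment uses the `get_data`/`set_data`/`set_response` pattern, which should be verified against your mechanize version:

```python
import re

def clean_html(html):
    """Stand-in for a real cleaner such as lxml.html or BeautifulSoup's
    prettify(); here it only closes unterminated <br> tags, to keep the
    sketch self-contained and dependency-free."""
    return re.sub(r"<br(?![^>]*/)>", "<br/>", html)

# The mechanize side of the pattern (illustrative, not executed here):
#   response = br.open(url)
#   response.set_data(clean_html(response.get_data()))
#   br.set_response(response)
#   br.select_form(nr=0)   # the rebuilt page is now what mechanize parses
```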