mechanize | 易学教程

Mechanize Javascript

Question: I am trying to submit a form with Mechanize, but I am not sure how to add the necessary form values that are set by JavaScript. Since Mechanize does not support JavaScript yet, I am trying to add the variables manually. The form source:
    <form name="aspnetForm" method="post" action="list.aspx" language="javascript" onkeypress="javascript:return WebForm_FireDefaultButton(event, '_ctl0_ContentPlaceHolder1_cmdSearch')" id="aspnetForm">
    <input type="hidden" name="__EVENTTARGET" id="_

Mechanize/Ruby read source code of 404 page

Question: All I'm doing is loading Mechanize and getting a page that returns 404, which is exactly what I want: the 404 page has plenty of HTML I'd like to use in my example.
    a = Mechanize.new
    a.get('http://www.youtube.com/watch?v=e4g8jriw4rg')
    a.page # => nil
I can't seem to find any further info on this.
Answer 1: You need to handle the exception:
    begin
      page = a.get 'http://www.youtube.com/watch?v=e4g8jriw4rg'
    rescue Mechanize::ResponseCodeError => e
      puts e.response_code # the status code as a string, e

Raw HTML vs. DOM scraping in python using mechanize and beautiful soup

Question: I am attempting to write a program that, as an example, will scrape the top price from this web page: http://www.kayak.com/#/flights/JFK-PAR/2012-06-01/2012-07-01/1adults First, I can easily retrieve the HTML by doing the following:
    from urllib import urlopen
    from BeautifulSoup import BeautifulSoup
    import mechanize
    webpage = 'http://www.kayak.com/#/flights/JFK-PAR/2012-06-01/2012-07-01/1adults'
    br = mechanize.Browser()
    data = br.open(webpage).get_data()
    soup = BeautifulSoup(data)
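One detail worth checking in the excerpt above: everything after `#` in a URL is a fragment, and HTTP clients do not send fragments to the server. The following is a minimal sketch using only the Python 3 standard library (not mechanize or BeautifulSoup, and not the asker's Python 2 code); the helper name `request_target` is hypothetical. It shows which part of the Kayak URL would actually be requested, which suggests the search encoded in the fragment is presumably handled by JavaScript in the browser.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return (host, path-and-query) that a client would request.

    The fragment (the part after '#') is intentionally left out,
    because it is never transmitted in an HTTP request.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target


webpage = "http://www.kayak.com/#/flights/JFK-PAR/2012-06-01/2012-07-01/1adults"
host, target = request_target(webpage)
fragment = urlsplit(webpage).fragment

print("host:    ", host)
print("target:  ", target)
print("fragment:", fragment)
```

Under this reading, the raw HTML fetched by a non-JavaScript client would be the site's front page rather than the search results, which is one plausible reason raw-HTML scraping and DOM scraping can disagree.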

Regulating / rate limiting ruby mechanize

Question: I need to regulate how often a Mechanize instance connects to an API (at most once every 2 seconds). So I tried this:
    instance.pre_connect_hooks << Proc.new { sleep 2 }
I had thought this would work, and it sort of does, BUT now every method in that class sleeps for 2 seconds, as if the Mechanize instance is touched and told to hold for 2 seconds. I'm going to try a post-connect hook, but it is obvious I need something a bit more elaborate, but what I don't know what at this
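The unconditional `sleep 2` in the excerpt delays every connection, even when the previous one happened long ago. A common alternative is to sleep only for the time still remaining since the last request. The sketch below illustrates that idea in Python with the standard library only; it is not Mechanize's API, and the class name `Throttle` and its parameters are hypothetical. The same logic could, in principle, live inside a hook like the one in the question.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the logic can be tested without real delays
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        """Block only as long as needed to respect min_interval."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Tiny demo with a short interval so it finishes quickly.
throttle = Throttle(0.01)
for _ in range(3):
    throttle.wait()  # call this immediately before each request
```

Passing the clock and sleep functions in as arguments is a design choice that keeps the timing logic testable; with the defaults, the first call never sleeps and later calls sleep only when they arrive too early.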

trying to POST with ruby mechanize

Question: I've captured the login HTTP headers using the Firefox plugin LiveHTTPheaders, and found the following URL and variables:
    POST /login
    email=myemail%40gmail.com&password=something&remember=1&loginSubmit=Login
And here's the code I am running:
    require 'rubygems'
    require 'mechanize'
    browser = Mechanize.new
    browser.post('http://www.mysite.com/login', [ ["email","myemail%40gmail.com"], ["password","something"], ["remember","1"], ["loginSubmit","Login"], ["url"=>""] ] ) do |page|
      puts page.body
    end
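Two things stand out in the excerpt above. The captured body shows the email already percent-encoded (`%40` is `@`), and `["url"=>""]` is a one-element Ruby array holding a hash rather than a name/value pair like the other entries. On the first point, the sketch below uses Python's standard library (not Mechanize) to show what happens when a client that form-encodes its parameters is handed a value that is already encoded. That Mechanize encodes the values passed to `post` is an assumption here, not something the excerpt confirms.

```python
from urllib.parse import quote, urlencode

# Raw (unencoded) field values, as a user would type them.
raw_fields = [
    ("email", "myemail@gmail.com"),
    ("password", "something"),
    ("remember", "1"),
    ("loginSubmit", "Login"),
]

# Encoding once reproduces the body captured from the browser.
body = urlencode(raw_fields)
print(body)

# Encoding a value that is already encoded escapes the '%' itself.
pre_encoded = quote("myemail@gmail.com", safe="")
double_encoded = urlencode([("email", pre_encoded)])
print(double_encoded)
```

If the library does encode parameters, passing the raw address (`myemail@gmail.com`) instead of the captured `myemail%40gmail.com` would avoid sending `%2540` to the server.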

CertificateError: hostname doesn't match

Question: I'm using a proxy (behind a corporate firewall) to log in to an HTTPS domain. The SSL handshake doesn't seem to be going well: CertificateError: hostname 'ats.finra.org:443' doesn't match 'ats.finra.org' I'm using Python 2.7.9 with Mechanize, and I've gotten past all of the login, password, and security question screens, but it is getting hung up on the certificate. Any help would be amazing. I've tried the monkeywrench found here: Forcing Mechanize to use SSLv3 It doesn't work for my code, though. If

How do you view the request headers that mechanize is using?

Question: I am attempting to submit some data to a form programmatically. I'm having a small issue whereby the server is "not liking" what I'm sending it. Frustratingly, there are no error messages or anything else that could help diagnose the issue; all it does is send me back to the same page I started on when I call br.submit(). When I click the submit button manually in the browser, the resulting page shows a small "success!" message. No such message appears when submitting via the script. Additionally,

mechanize how to get current url

Question: I have this code:
    require 'mechanize'
    @agent = Mechanize.new
    page = @agent.get('http://something.com/?page=1')
    next_page = page.link_with(:href=>/^?page=2/).click
As you can see, this code should go to the next page, and next_page should have the URL http://something.com/?page=2 How do I get the current URL for next_page?
Answer 1: next_page.uri.to_s See http://www.rubydoc.info/gems/mechanize/Mechanize/Page/Link#uri-instance_method and http://ruby-doc.org/stdlib-2.4.1/libdoc/uri/rdoc/URI.html For testing
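To illustrate why the next page's URL comes out as http://something.com/?page=2, here is a small sketch of relative-URL resolution using Python's standard library (not Mechanize). It also shows that `?` is a regular-expression metacharacter, so a pattern meant to match a literal `?page=2` needs the `?` escaped. The `href` value is an assumed example of what the link on the page might contain.

```python
import re
from urllib.parse import urljoin

current = "http://something.com/?page=1"
href = "?page=2"  # assumed value of the link's href attribute

# A query-only reference keeps the base URL's path and replaces its query.
next_url = urljoin(current, href)
print(next_url)

# Escape the '?' so it is matched literally instead of acting as a quantifier.
pattern = re.compile(re.escape("?page=2"))
matches = pattern.search(href) is not None
print(matches)
```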

Undefined method 'click' for nil:NilClass (Mechanize) [closed]

Question: I am building a script using Mechanize to scrape data from a website. The script is supposed to click on the "Read biography" link and then scrape the biography of the member on the next page. Here is the script in the Rake file:
    require 'mechanize'
    require 'date'
    require 'json'
    task :testing2 do

Submitting nested form with python mechanize

Question: I am trying to submit a login form on a web page that looks something like this. I have also tried submitting the nested form, as well as submitting both forms, with the same error every time.
    <form method="post" name="loginform">
    <input type='hidden' name='login' value='1'>
    <form action="#" method="post" id="login">
    Username <input type="text" name="username" id="username" />
    Password <input type="password" name="password" id="password" />
    <input type="submit" value='Login' class="submit" />
here is my the