mechanize

Mechanize Javascript

喜欢而已 submitted on 2019-12-21 21:29:26
Question: I am trying to submit a form with Mechanize, but I am not sure how to add the form variables that are normally set by JavaScript. Mechanize does not support JavaScript yet, so I am trying to add those variables manually. The form source: <form name="aspnetForm" method="post" action="list.aspx" language="javascript" onkeypress="javascript:return WebForm_FireDefaultButton(event, '_ctl0_ContentPlaceHolder1_cmdSearch')" id="aspnetForm"> <input type="hidden" name="__EVENTTARGET" id="_
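Because Mechanize does not run the page's JavaScript, the script's work on the form has to be reproduced by hand. On ASP.NET pages, the client-side `__doPostBack(target, argument)` function fills the hidden `__EVENTTARGET` and `__EVENTARGUMENT` inputs and then submits the form. The minimal plain-Ruby sketch below builds the urlencoded POST body such a postback would send. The helper name `postback_body`, the control name `_ctl0:ContentPlaceHolder1:cmdSearch`, and the `__VIEWSTATE` value are hypothetical placeholders; the real names and values must be copied from the page source.

```ruby
require 'uri'

# Hypothetical helper: reproduces what ASP.NET's __doPostBack does in the
# browser. It sets __EVENTTARGET / __EVENTARGUMENT, keeps the other hidden
# fields, and encodes everything as application/x-www-form-urlencoded.
def postback_body(hidden_fields, event_target, event_argument = '', extra_fields = {})
  fields = hidden_fields.merge(
    '__EVENTTARGET' => event_target,
    '__EVENTARGUMENT' => event_argument
  )
  URI.encode_www_form(fields.merge(extra_fields))
end

# Hypothetical values: copy the real ones from the page's hidden inputs.
hidden = {
  '__EVENTTARGET' => '',
  '__EVENTARGUMENT' => '',
  '__VIEWSTATE' => 'hypothetical-viewstate'
}
body = postback_body(hidden, '_ctl0:ContentPlaceHolder1:cmdSearch')
puts body
```

With Ruby Mechanize, hidden inputs are parsed as ordinary form fields. The same values can therefore be assigned on the form object before submitting, e.g. `form['__EVENTTARGET'] = target`, where `target` is the control name found in the page.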

Mechanize/Ruby read source code of 404 page

ぐ巨炮叔叔 submitted on 2019-12-21 20:54:40
Question: All I'm doing is loading Mechanize and getting a page that returns 404, but that's exactly what I want: the 404 page has plenty of HTML I'd like to use in my example.
a = Mechanize.new
a.get('http://www.youtube.com/watch?v=e4g8jriw4rg')
a.page # => nil
I can't seem to find any further info on this.
Answer 1: You need to handle the exception:
begin
  page = a.get 'http://www.youtube.com/watch?v=e4g8jriw4rg'
rescue Mechanize::ResponseCodeError => e
  puts e.response_code # the status code as a string, e
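Mechanize raises `Mechanize::ResponseCodeError` for error statuses, which is presumably why `a.page` stays nil. The rescued exception also carries the error page itself (`e.page` in Mechanize 2.x), so its HTML is not lost. For comparison, the stdlib-only sketch below shows that plain `Net::HTTP` does not raise on a 404 and still exposes the body. To stay self-contained, it serves a hypothetical 404 page from a throwaway local socket instead of contacting YouTube.

```ruby
require 'net/http'
require 'socket'

# Throwaway HTTP server on a random free local port that answers with a 404.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
html = '<html><body><h1>Not Found</h1><p>Useful markup</p></body></html>'

thread = Thread.new do
  client = server.accept
  # Consume the request line and headers, which end with an empty line.
  while (line = client.gets) && line != "\r\n"
  end
  headers = [
    'HTTP/1.1 404 Not Found',
    'Content-Type: text/html',
    "Content-Length: #{html.bytesize}",
    'Connection: close'
  ]
  client.write(headers.join("\r\n") + "\r\n\r\n" + html)
  client.close
end

# get_response returns the response object for any status; it does not raise on 404.
response = Net::HTTP.get_response(URI("http://127.0.0.1:#{port}/watch"))
thread.join
server.close

puts response.code  # status code as a string
puts response.body  # the 404 page's HTML is still available
```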

Raw HTML vs. DOM scraping in python using mechanize and beautiful soup

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-21 20:29:25
Question: I am attempting to write a program that, as an example, will scrape the top price off this web page: http://www.kayak.com/#/flights/JFK-PAR/2012-06-01/2012-07-01/1adults
First, I can easily retrieve the HTML by doing the following:
from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import mechanize
webpage = 'http://www.kayak.com/#/flights/JFK-PAR/2012-06-01/2012-07-01/1adults'
br = mechanize.Browser()
data = br.open(webpage).get_data()
soup = BeautifulSoup(data)
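One likely reason the fetched HTML lacks the prices is visible in the URL itself. Everything after `#` is a fragment, and browsers and HTTP libraries do not send the fragment to the server. The request made for this URL is therefore just for the site's root path. The flight results are presumably filled in afterwards by the page's JavaScript, which mechanize does not run. The short illustration below uses Ruby's stdlib `URI` class.

```ruby
require 'uri'

url = URI('http://www.kayak.com/#/flights/JFK-PAR/2012-06-01/2012-07-01/1adults')

# request_uri is what goes on the HTTP request line; the fragment is not part of it.
puts url.request_uri  # the path the server actually sees
puts url.fragment     # stays on the client, available only to the page's scripts
```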

Regulating / rate limiting ruby mechanize

主宰稳场 submitted on 2019-12-21 17:15:06
Question: I need to regulate how often a Mechanize instance connects to an API (once every 2 seconds, so at least 2 seconds between connections). So this:
instance.pre_connect_hooks << Proc.new { sleep 2 }
I had thought this would work, and it sort of does, BUT now every method in that class sleeps for 2 seconds, as if the Mechanize instance were being told to hold for 2 seconds whenever it is touched. I'm going to try a post-connect hook, but it is obvious I need something a bit more elaborate; what, I don't know at this
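The hook in the question sleeps a fixed 2 seconds before every connection, however long ago the previous one was. The minimal sketch below takes a different approach: it remembers when the last request went out and sleeps only for whatever remains of the interval, so requests that are already spaced far enough apart are not delayed at all. The class name `MinIntervalLimiter` and its `clock:`/`sleeper:` parameters are hypothetical; they exist so the timing can be tested without real sleeps.

```ruby
# Hypothetical helper: enforces a minimum gap between successive calls to #wait.
class MinIntervalLimiter
  def initialize(interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @interval = interval
    @clock = clock
    @sleeper = sleeper
    @last = nil
    @mutex = Mutex.new
  end

  # Sleeps only for the part of the interval that has not yet elapsed
  # since the previous call, then records the current time.
  def wait
    @mutex.synchronize do
      now = @clock.call
      if @last
        remaining = @interval - (now - @last)
        if remaining.positive?
          @sleeper.call(remaining)
          now = @clock.call
        end
      end
      @last = now
    end
  end
end
```

With Mechanize, the limiter could be called from the same kind of hook, e.g. `limiter = MinIntervalLimiter.new(2.0)` followed by `instance.pre_connect_hooks << proc { limiter.wait }`. A `proc` ignores any arguments the hook is called with.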

trying to POST with ruby mechanize

不问归期 submitted on 2019-12-21 08:05:19
Question: I've captured the login HTTP headers using the Firefox plugin LiveHTTPheaders and found the following URL and variables:
POST /login
email=myemail%40gmail.com&password=something&remember=1&loginSubmit=Login
And here's the code I am running:
require 'rubygems'
require 'mechanize'
browser = Mechanize.new
browser.post('http://www.mysite.com/login', [
  ["email","myemail%40gmail.com"],
  ["password","something"],
  ["remember","1"],
  ["loginSubmit","Login"],
  ["url"=>""]
]) do |page|
  puts page.body
end
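The captured body shows `myemail%40gmail.com` because the browser percent-encoded the `@` itself. Assuming Mechanize form-encodes the values it is given, as browsers do, passing the already-encoded string would encode it a second time. Ruby's stdlib `URI.encode_www_form` shows the effect.

```ruby
require 'uri'

# Raw values, exactly as typed into the form (note the literal "@").
raw = [
  ['email', 'myemail@gmail.com'],
  ['password', 'something'],
  ['remember', '1'],
  ['loginSubmit', 'Login']
]
# Value copied from the captured, already-encoded request body.
pre_encoded = [['email', 'myemail%40gmail.com']]

puts URI.encode_www_form(raw)          # reproduces the body seen in the capture
puts URI.encode_www_form(pre_encoded)  # the "%" itself becomes "%25": double encoding
```

The unencoded pairs can then be passed as-is, e.g. `browser.post('http://www.mysite.com/login', raw)`, since `Mechanize#post` accepts a hash or an array of name/value pairs. Note also that `["url"=>""]` in the question is a one-element array containing a hash, not a name/value pair.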

CertificateError: hostname doesn't match

放肆的年华 submitted on 2019-12-21 04:03:07
Question: I'm using a proxy (behind a corporate firewall) to log in to an HTTPS domain. The SSL handshake doesn't seem to be going well:
CertificateError: hostname 'ats.finra.org:443' doesn't match 'ats.finra.org'
I'm using Python 2.7.9 with Mechanize, and I've gotten past the login, password, and security question screens, but it is getting hung up on the certificate check. Any help would be amazing. I've tried the monkeypatch found here: Forcing Mechanize to use SSLv3. It doesn't work for my code, though. If
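Hostname verification compares the name the client believes it is connecting to against the names in the server's certificate. The error shows that the name being checked was `ats.finra.org:443`, host and port together. That fails even when the certificate is for the right host, which suggests the port is leaking into the hostname somewhere in the proxy setup rather than the certificate being bad. The Ruby sketch below reproduces the comparison using the stdlib `openssl` library and a throwaway self-signed certificate. Python's `ssl.match_hostname`, which raises the `CertificateError` above, performs the analogous check.

```ruby
require 'openssl'
require 'uri'

# Throwaway self-signed certificate for ats.finra.org (test fixture only).
key = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version = 2
cert.serial = 1
cert.subject = OpenSSL::X509::Name.parse('/CN=ats.finra.org')
cert.issuer = cert.subject
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after = Time.now + 3600
factory = OpenSSL::X509::ExtensionFactory.new(cert, cert)
cert.add_extension(factory.create_extension('subjectAltName', 'DNS:ats.finra.org', false))
cert.sign(key, OpenSSL::Digest.new('SHA256'))

# The bare host matches the certificate; "host:port" does not.
puts OpenSSL::SSL.verify_certificate_identity(cert, 'ats.finra.org')
puts OpenSSL::SSL.verify_certificate_identity(cert, 'ats.finra.org:443')

# Parsing the address first separates the host from the port.
host = URI('https://ats.finra.org:443/').host
puts OpenSSL::SSL.verify_certificate_identity(cert, host)
```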

How do you view the request headers that mechanize is using?

走远了吗. submitted on 2019-12-21 03:13:08
Question: I am attempting to submit some data to a form programmatically. I'm having a small issue whereby the server is "not liking" what I'm sending it. Frustratingly, there are no error messages or anything else that could help diagnose the issue; all it does is send me back to the same page I started on when I call br.submit(). When I click the submit button manually in the browser, the resulting page shows a small "success!" message. No such message appears when submitting via the script. Additionally,
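One way to find out what differs from the browser is to print the exact request the script is about to send and compare it line by line with the request the browser sends, as shown in the browser's developer tools. Python's mechanize can log its traffic with `br.set_debug_http(True)`. The sketch below shows the same idea with Ruby's stdlib `Net::HTTP`: it builds a form POST without sending it and lists the request line, headers, and body. The helper name `describe_request`, the URL, the field, and the `Referer` value are all hypothetical.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a form POST and returns
# the request line, headers, and body as printable lines.
def describe_request(uri, form, headers = {})
  req = Net::HTTP::Post.new(uri)
  headers.each { |name, value| req[name] = value }
  req.set_form_data(form)  # urlencodes the body and sets Content-Type
  lines = ["#{req.method} #{req.path}"]
  req.each_capitalized { |name, value| lines << "#{name}: #{value}" }
  lines << ''
  lines << req.body
  lines
end

lines = describe_request(URI('http://www.example.com/form'),
                         { 'q' => 'test' },
                         { 'Referer' => 'http://www.example.com/' })
puts lines
```

For a real connection, `Net::HTTP#set_debug_output($stderr)` dumps the raw traffic in the same spirit, including any headers added at send time.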

mechanize how to get current url

喜欢而已 submitted on 2019-12-20 12:28:16
Question: I have this code:
require 'mechanize'
@agent = Mechanize.new
page = @agent.get('http://something.com/?page=1')
next_page = page.link_with(:href=>/^?page=2/).click
As you can see, this code should go to the next page, so next_page should have the URL http://something.com/?page=2. How do I get the current URL for next_page?
Answer 1: next_page.uri.to_s
See http://www.rubydoc.info/gems/mechanize/Mechanize/Page/Link#uri-instance_method and http://ruby-doc.org/stdlib-2.4.1/libdoc/uri/rdoc/URI.html For testing
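Mechanize returns a page's address as a stdlib `URI` object (hence the `.to_s` in the answer), so the usual `URI` methods apply to it. The small sketch below uses a literal `URI` as a stand-in for `next_page.uri`. It reads the `page` parameter and builds the following page's address, assuming the site paginates with a plain `?page=N` query as in the question.

```ruby
require 'uri'

# Stands in for next_page.uri, which Mechanize returns as a URI object.
current = URI('http://something.com/?page=2')
puts current.to_s

# Read the query string into a hash and bump the page number.
params = URI.decode_www_form(current.query).to_h
following = current.dup
following.query = URI.encode_www_form(params.merge('page' => (params['page'].to_i + 1).to_s))
puts following.to_s
```

A computed address like `following` can be fetched directly with `@agent.get(following)` instead of searching for and clicking the link on each page.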

Undefined method 'click' for nil:NilClass (Mechanize) [closed]

假装没事ソ submitted on 2019-12-20 06:39:24
Question: Closed. This question needs details or clarity and is not currently accepting answers (closed 4 years ago). I am building a script using Mechanize to scrape data from a website. The script is supposed to click the "Read biography" link and then scrape the member's biography from the next page. Here is the script in the Rakefile:
require 'mechanize'
require 'date'
require 'json'
task :testing2 do
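The error means that the object `click` was called on was nil. This typically happens when a lookup such as `link_with` finds no matching link, for example because the link text differs slightly or the link is inserted by JavaScript. A small pattern that helps here is to fail loudly at the lookup and report what *was* found. The sketch below uses a hypothetical `Link` struct and `find_link!` helper in place of Mechanize's own objects.

```ruby
# Hypothetical stand-in for the links on a scraped page.
Link = Struct.new(:text, :href)

# Like a lookup that returns nil when nothing matches, but raises instead,
# listing the link texts that were present on the page.
def find_link!(links, text_pattern)
  link = links.find { |l| l.text.to_s.match?(text_pattern) }
  return link if link

  found = links.map(&:text).inspect
  raise ArgumentError, "no link matching #{text_pattern.inspect}; page has: #{found}"
end

links = [Link.new('Home', '/'), Link.new('Read biography', '/members/1/bio')]
puts find_link!(links, /Read biography/i).href
```

In Mechanize itself, `page.link_with(text: /Read biography/)` returns nil when nothing matches. Recent versions also provide `page.link_with!`, which raises `Mechanize::ElementNotFoundError` instead.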

Submitting nested form with python mechanize

依然范特西╮ submitted on 2019-12-20 04:56:19
Question: I am trying to submit a login form on a web page that looks something like this. I have also tried submitting the nested form, as well as submitting both forms, and I get the same error every time.
<form method="post" name="loginform">
<input type='hidden' name='login' value='1'>
<form action="#" method="post" id="login">
Username <input type="text" name="username" id="username" />
Password <input type="password" name="password" id="password" />
<input type="submit" value='Login' class="submit" />
here is my the
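HTML does not allow one `<form>` inside another. An HTML5 parser ignores the inner `<form>` start tag, so in a browser all of these inputs belong to the outer `loginform`, which has no `action` and therefore posts back to the page's own URL. The submit button has no `name` attribute, so it contributes nothing to the body. mechanize's own parser may not resolve the nesting the same way. The sketch below builds the body a browser would presumably send for the markup shown, using hypothetical credentials.

```ruby
require 'uri'

# Fields of the outer "loginform" once the nested <form> tag is ignored,
# in document order. The credentials are hypothetical.
fields = [
  ['login', '1'],         # hidden input
  ['username', 'alice'],
  ['password', 's3cret']
  # The submit button has no name attribute, so it is not submitted.
]
body = URI.encode_www_form(fields)
puts body
```

If the library cannot select a usable form from this markup, posting such a body directly to the page's URL is one possible fallback.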