mechanize


How to login and crawl a site using Mechanize

不想你离开。 submitted on 2020-02-08 05:23:09
Question: I'm trying to use Mechanize to log in and crawl a site. For some reason, I can't get the login to work. Any ideas? This is my code:

    require 'nokogiri'
    require 'open-uri'
    require 'mechanize'

    a = Mechanize.new
    a.get('https://jackthreads.com/')
    form = a.page.form_with(:class => 'jt-form')
    form.field_with(:name => "email").value = "email"
    form.field_with(:name => "password21").value = "password"
    page = a.submit(form, form.buttons.first)

Answer 1: The action on the form is set to "#",

Perl Mechanize timeout not working with https

三世轮回 submitted on 2020-02-08 02:40:07
Question: I've been using Perl's Mechanize library, but for some reason the timeout parameter doesn't work with https (I'm using Crypt::SSLeay for SSL).

    my $browser = WWW::Mechanize->new(autocheck=>0, timeout=>3);

Has anyone encountered this before and knows how to fix it? Thanks!

Answer 1: For HTTPS/SSL you have to use a workaround:

    my $html = `wget -q -t 1 -T $timeout -O - $url`;
    $mech->get(0);
    $mech->update_html($html);

Answer 2: In just testing it now against https://www.sourceforge.net/, I get the impression that the

python mechanize handle two parameters with same name

自闭症网瘾萝莉.ら submitted on 2020-02-02 07:00:59
Question: I'm logging into a page that, oddly, has one form input called login_email and two form inputs called login_password. I need to set the value of both, but the straightforward call form['login_password'] throws an error:

    File "/Library/Python/2.7/site-packages/mechanize/_form.py", line 3101, in find_control
      return self._find_control(name, type, kind, id, label, predicate, nr)
    File "/Library/Python/2.7/site-packages/mechanize/_form.py", line 3183, in _find_control
      raise AmbiguityError("more
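As a library-independent illustration of what such a submission has to carry, here is a minimal sketch using only the Python 3 standard library (the question itself uses Python 2.7 and mechanize). It shows that a request body can hold two fields with the same name when it is built from a list of pairs. The field values are placeholders, and this is not mechanize's form API. Within mechanize, the `nr` argument visible in the `find_control` call in the traceback above is what tells apart controls that share a name.

```python
from urllib.parse import urlencode, parse_qs

# Placeholder values; only the field names come from the question.
fields = [
    ("login_email", "user@example.com"),
    ("login_password", "secret"),
    ("login_password", "secret"),
]

# A list of pairs (rather than a dict) keeps both same-named fields.
body = urlencode(fields)
print(body)

# Decoding shows that both values survive under the one name.
decoded = parse_qs(body)
print(decoded["login_password"])
```

A dict would silently keep only one `login_password` entry, which is why the sketch passes a list of tuples.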

MechanicalSoup action difficulty with forms

天大地大妈咪最大 submitted on 2020-01-24 21:56:46
Question: First, I am French, so I'm sorry if there are mistakes in my English. Here is my problem: I'm having a hard time with MechanicalSoup. Here is my HTML page:

    <form class="XFYOY" method="post"><h2 class="vvzhL ">Inscrivez-vous pour voir les photos et vidéos de vos amis.</h2>

This is just the first line. I want to automate the form, but there is no action attribute and I don't know what to pass to browser.select_form():

    browser.select_form('form[action=/post]')
    browser["emailOrPhone"] = "0689754327"

Mechanize not being installed by easy_install?

妖精的绣舞 submitted on 2020-01-24 12:47:44
Question: I am in the process of migrating from an old Win2K machine to a new and much more powerful 64-bit Vista PC. Most of the migration has gone fairly smoothly, but I did find that I needed to reinstall ALL of my Python-related tools. I downloaded the mechanize-0.1.11.tar.gz file and ran easy_install to install it. This produced C:\Python25\Lib\site-packages\mechanize-0.1.11-py2.5.egg. I then ran a Python script to test it, and it worked fine under the interpreter. But, when I ran py2exe to

Checkbox input using python mechanize

倖福魔咒の submitted on 2020-01-21 01:49:18
Question: I want to fill in a form using Python mechanize. The form looks like this:

    <POST https://10.20.254.39/cloud_computing/vmuser/migrate_vm/cli multipart/form-data
      <TextControl(vm=cli)>
      <TextControl(chost=10.20.14.39)>
      <SelectControl(dhost=[*, 28, 27])>
      <CheckboxControl(live=[on])>
      <CheckboxControl(undefinesource=[on])>
      <CheckboxControl(suspend=[on])>
      <SubmitControl(<None>=Submit) (readonly)>
      <HiddenControl(_formkey=85819e5a-02bb-42c8-891f-3ddac485438b) (readonly)>
      <HiddenControl(_formname=migrate_create)

How to parse only part of a string-value from an element using Nokogiri? RUBY, Mechanize

只愿长相守 submitted on 2020-01-17 05:17:05
Question: How do I extract numbers from a string, if the XPath is 'td[5]p/@title'? HTML:

    <td valign="top" align="center">
      <p title="6 en su sucursal" style="margin-top: 0px; margin-bottom:0px; cursor:hand">
        <b>10</b>
      </p>
    </td>

From the title attribute's string value "6 en su sucursal", I need to extract only the number 6.

Answer 1: Given some HTML inside html, you'd do something like this:

    doc = Nokogiri::HTML(html)
    numbers = doc.xpath('//p[@title]').collect { |p| p[:title].gsub(/[^\d]/, '') }

Then you'll have the
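The string handling in that answer can also be sketched in Python (the answer itself is Ruby with Nokogiri). The snippet below only covers the digit extraction, assuming the title text has already been read from the element. The variable names are illustrative.

```python
import re

# The title attribute value from the question's HTML.
title = "6 en su sucursal"

# Same idea as the Ruby gsub(/[^\d]/, ''): drop every non-digit character.
digits_only = re.sub(r"\D", "", title)

# Alternative: take only the first run of digits, which behaves more
# predictably if a title ever contains more than one number.
match = re.search(r"\d+", title)
first_number = int(match.group()) if match else None

print(digits_only, first_number)
```

Stripping non-digits joins all digits in the string into one value, so the `re.search` variant is the safer choice when a title might hold several separate numbers.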

Are cookies kept in a Mechanize browser between opening URLs?

放肆的年华 submitted on 2020-01-16 01:20:29
Question: I have code similar to this:

    br = mechanize.Browser()
    br.open("https://mysite.com/")
    br.select_form(nr=0)
    #do stuff here
    response = br.submit()
    html = response.read()
    #now that i have the login cookie i can do this...
    br.open("https://mysite.com/")
    html = response.read()

However, my script is responding as if it's not logged in for the second request. I checked the first request and yes, it logs in successfully. My question is: do cookies in Mechanize browsers need to be managed or do I need
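One thing visible in the snippet is that the second `html = response.read()` still reads from the response returned by `br.submit()`, because the return value of the second `br.open(...)` is never assigned. The sketch below illustrates both points with the Python 3 standard library instead of mechanize: a cookie jar shared by one opener carries the login cookie to later requests, and each read uses the response of the request just made. The toy server, its paths, and the cookie value are all made up for the demonstration.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    """Toy server: /login sets a cookie, every other path checks for it."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            body = b"logged in"
        else:
            has_cookie = "session=abc123" in (self.headers.get("Cookie") or "")
            body = b"member page" if has_cookie else b"please log in"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


# Start the hypothetical server on a free local port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# One jar shared by one opener: cookies set by a response are sent back
# automatically on later requests made through the same opener.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

opener.open(base + "/login").read()

# Read from the response returned by *this* call, not an earlier one.
response = opener.open(base + "/")
html = response.read()
print(html)

server.shutdown()
```

A `mechanize.Browser` likewise keeps its own cookie jar between calls, so the corresponding change in the question's code would be to assign the result of the second `br.open(...)` to `response` before reading it.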

Too many connection resets Exception Error - Mechanize in Ruby

廉价感情. submitted on 2020-01-15 05:22:09
Question: I'm using Mechanize on Ruby and keep getting this exception:

    C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:158:in `rescue in rbuf_fill': too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout) after 0 requests on 37920120, last used 1457465950.371121 seconds ago (Net::HTTP::Persistent::Error)
        from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:152:in `rbuf_fill'
        from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:134:in `readuntil'
        from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:144:in

Python Mechanize select_form() - ParseError: OPTION outside of SELECT

会有一股神秘感。 submitted on 2020-01-14 10:07:13
Question: I am using Python 2.7 and Mechanize 2.5. I am trying to use the select_form() method, but I am getting the following error:

    File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 499, in select_form
      global_form = self._factory.global_form
    File "C:\Python27\lib\site-packages\mechanize\_html.py", line 544, in __getattr__
      self.forms()
    File "C:\Python27\lib\site-packages\mechanize\_html.py", line 557, in forms
      self._forms_factory.forms())
    File "C:\Python27\lib\site-packages\mechanize\
