Scraping a site that requires login username and password on two separate pages

Submitted by 房东的猫 on 2019-12-04 12:02:21
rleigh

I was able to get logged in with the following example. Thanks to everyone that helped me with all the resources and examples to learn from!

require 'nokogiri'
require 'mechanize'

agent = Mechanize.new

# Open the page that asks for the username, fill in the username field of the first form, and submit it.

login = agent.get('http://www.website_here.com')
login_form = login.forms.first
username_field = login_form.field_with(:name => "user_session[username]")
# Set the field's value; assigning a string to the variable itself would leave the form unchanged.
username_field.value = "YOUR USERNAME HERE"
page = agent.submit login_form

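# Open the page that asks for the password, fill in the password field of the first form, and submit it.
# If submitting the username already returns the password page, page.forms.first can be used instead of fetching again.

login = agent.get('http://www.website_here.com')
login_form = login.forms.first
password_field = login_form.field_with(:name => "user_session[password]")
# As above, set the value on the field object rather than reassigning the variable.
password_field.value = "YOUR PASSWORD HERE"
page = agent.submit login_form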

# Below will print page showing information confirming that you have logged in.

pp page

I found the example above in a post by user Senthess. I'm still not 100% sure what each line of the code does, so if anyone would like to take the time to break it down, please do. It would help me and others understand it better.

Thanks!
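As a small part of the breakdown asked for above, one step that is easy to misread is filling in the field. The object returned by field_with carries the text through its value, so the text has to be set with value=; assigning a string to the variable only makes the variable point at a new string. The sketch below illustrates this in plain Ruby. The Field struct is a hypothetical stand-in for a form field, not Mechanize's own class.

```ruby
# Hypothetical stand-in for a form field object; this is not Mechanize's class.
Field = Struct.new(:name, :value)

field = Field.new("user_session[username]", "")

# Rebinding: other_name first points at the field, then at a plain String.
# The field object itself is never touched.
other_name = field
other_name = "YOUR USERNAME HERE"
rebinding_changed_field = (field.value == "YOUR USERNAME HERE")

# Setter call: value= changes the object that a form would submit.
field.value = "YOUR USERNAME HERE"
setter_changed_field = (field.value == "YOUR USERNAME HERE")

puts "rebinding changed the field: #{rebinding_changed_field}"
puts "setter changed the field:    #{setter_changed_field}"
```

The same distinction applies to the object that login_form.field_with returns, since it exposes its content through value and value=.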

I just read up on the Mechanize gem and found a relevant solution. You must use the proper 'name' of each input field; otherwise its value won't be picked up. See this article:

http://crabonature.pl/posts/23-automation-with-mechanize-and-ruby

Not sure if you found these, but Mechanize has excellent docs: http://docs.seattlerb.org/mechanize/GUIDE_rdoc.html

From these, I played around in the irb REPL to create this simple scraper that logs into GitHub: https://gist.github.com/tylermauthe/781f68add24819e207c4
