mechanize

Basic and Form Authentication with Mechanize (Ruby)

↘锁芯ラ, submitted 2019-12-10 13:33:27
Question: I'm trying to log into a site on the company intranet that presents both a basic-authentication popup dialog and form-based authentication. This is the code I'm using, which results in a 401 => Net::HTTPUnauthorized error:

```ruby
require 'rubygems'
require 'mechanize'
require 'logger'

agent = WWW::Mechanize.new { |a| a.log = Logger.new("mech.log") }
agent.user_agent_alias = 'Windows Mozilla'
agent.basic_auth('username', 'password')

agent.get('http://example.com') do |page|
  puts page.title
end
```

Mechanize and Beautiful soup python

你。, submitted 2019-12-10 12:02:01
Question: I'm trying to submit a form to a site using Beautiful Soup and mechanize. Mechanize on its own throws an error on nested forms, so I tried following the suggestion of using another parser. Here's the code:

```python
browser = mechanize.Browser()
browser.addheaders = [('User-agent',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
browser.set_handle_robots(False)
response = browser.open('URL')
soup = BeautifulSoup(response.get_data())
```
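When mechanize's form parser trips over nested forms, one workaround (not from the question itself) is to bypass form handling entirely: pull the input fields out with a parser and build the POST body yourself. A minimal sketch using only the standard library's `html.parser` in place of BeautifulSoup, so it runs with no extra installs; the sample HTML and field names are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from every <input>, even inside nested forms."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if "name" in a:
                self.fields[a["name"]] = a.get("value", "")

# Hypothetical page with one form nested inside another
html = """
<form action="/outer">
  <form action="/search">
    <input type="text" name="q" value="" />
    <input type="hidden" name="token" value="abc123" />
  </form>
</form>
"""

collector = FormFieldCollector()
collector.feed(html)
collector.fields["q"] = "my search terms"
payload = urlencode(collector.fields)
print(payload)  # request body you could POST with urllib (or hand to mechanize)
```

The same `payload` can then be POSTed to the form's `action` URL directly, which sidesteps mechanize's nested-form error altogether.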

Using Python to Automatically Login to a Website with a JavaScript Form

家住魔仙堡, submitted 2019-12-10 11:58:32
Question: I'm attempting to write a script that logs into a website. This specific website uses a JavaScript form, so I had little to no luck with mechanize. Are there other solutions I may be unaware of that would help in my situation? If this question or a close variant has been asked here before, please excuse me and point me to it. Otherwise, what are some common techniques/approaches for dealing
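When a login form is driven by JavaScript, mechanize cannot execute the script, but you can usually replicate the HTTP request the script ultimately sends (discoverable in the browser's network tab). A hedged stdlib sketch; the endpoint URL and field names below are made up for illustration:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_login_request(url, username, password):
    """Build the POST the page's JavaScript would have sent."""
    body = urlencode({"username": username, "password": password}).encode()
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

if __name__ == "__main__":
    # Hypothetical endpoint found by watching the browser's network tab
    req = build_login_request("https://example.com/ajax/login", "me", "secret")
    # urlopen(req) would perform the actual login; pair it with
    # http.cookiejar + build_opener to keep the session cookie afterwards.
```

An alternative when the JavaScript is too entangled to replicate is to drive a real browser (e.g. Selenium), at the cost of a much heavier dependency.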

Mechanize br.submit() limitations?

巧了我就是萌, submitted 2019-12-10 11:54:42
Question: My intention is to submit a search query to a website using mechanize and to analyse the results using BeautifulSoup. This will be used on the same website each time, so form names etc. can be hardcoded. I was having issues with my initial query, shown below:

```python
import mechanize
import urllib2
#from bs4 import BeautifulSoup

def inspect_page(url):
    br = mechanize.Browser(factory=mechanize.RobustFactory())
    br.set_handle_robots(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U;
```
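For reference, many search forms submit via GET, in which case the whole mechanize round-trip can be reduced to building a query URL and parsing the result links. A stdlib sketch, hedged: the site, the `q` field name, and the result markup are all assumptions, and `html.parser` stands in for the BeautifulSoup analysis step:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

def search_url(base, query):
    """Build the GET URL a simple search form would submit to."""
    return base + "?" + urlencode({"q": query})

class LinkExtractor(HTMLParser):
    """Collect hrefs from result anchors."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

url = search_url("http://example.com/search", "mechanize tutorial")
# A real run would fetch `url` (urllib.request.urlopen) and feed the HTML in:
extractor = LinkExtractor()
extractor.feed('<a href="/result/1">one</a><a href="/result/2">two</a>')
print(url, extractor.links)
```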

Python mechanize form submitting doesn't work

我只是一个虾纸丫, submitted 2019-12-10 10:29:34
Question: I am trying to write a simple bot that logs into my account on a page and then comments on other users' images. However, I am not able to get the comment form to submit correctly. The comment form looks like this:

```html
<form id="comment-form" action="#"
      onsubmit="postComment($(this).serialize(),'image',117885,229227); return false;">
  <input class="comment" type="text" size="40" name="comment" id="comment" />
  <input type="hidden" name="commentObj" value="9234785" />
  <input type="hidden" name=
```
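Note that the `onsubmit` handler calls `postComment(...)` and then returns `false`, so the form is never submitted normally; the data goes out as an AJAX POST that mechanize cannot trigger. The usual fix is to serialize the fields yourself and POST them to whatever endpoint `postComment` targets. A sketch with a made-up endpoint and the field names visible in the snippet:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_comment_post(endpoint, comment, comment_obj):
    """Serialize the comment form's fields the way $(this).serialize() would."""
    body = urlencode({"comment": comment, "commentObj": comment_obj}).encode()
    return Request(endpoint, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

# Hypothetical AJAX endpoint; the real one must be read from postComment's source
req = build_comment_post("http://example.com/ajax/comment", "Nice shot!", "9234785")
print(req.get_method(), req.data)
```

Passing a logged-in session's cookies along with this request (e.g. via `http.cookiejar`) is usually also required.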

How to set the Referer header before loading a page with Ruby mechanize?

孤人, submitted 2019-12-10 10:13:04
Question: Is there a straightforward way to set custom headers with Mechanize 2.3? I tried a former solution but get:

```ruby
$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p|
  p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main'
}
# ./mech.rb:30:in `<main>': undefined method `pre_connect_hooks' for nil:NilClass (NoMethodError)
```

Answer 1: The docs say:

```ruby
get(uri, parameters = [], referer = nil, headers = {}) { |page| ... }
```

so for example:

```ruby
agent.get 'http://www.google.com/', [], agent.page
```

How to make mechanize wait for web-page 'full' load?

筅森魡賤, submitted 2019-12-10 02:19:15
Question: I want to scrape a web page that loads its components dynamically. The page has an onload script, and I see the complete page only 3-5 seconds after typing the URL into my browser. The problem is that when I call br.open('URL'), the response is the web page at 0 seconds; the HTML I want only exists 3-5 seconds later and differs from the result of br.open('URL').

Answer 1: Working with a web page rich in JavaScript content is not easy in mechanize, but there are ways to get what
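Since mechanize never executes JavaScript, waiting alone cannot help when the content is built client-side; the usual options are driving a real browser (e.g. Selenium) or calling the page's data endpoints directly. Where the server really is just slow to produce the final HTML, a generic poll-until-ready loop works; a sketch in which the fetch function and readiness marker are stand-ins:

```python
import time

def wait_for(fetch, is_ready, timeout=15.0, interval=1.0):
    """Re-fetch until is_ready(html) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        html = fetch()
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("page never finished loading")
        time.sleep(interval)

# Demo with a fake fetcher that "finishes loading" on the third attempt;
# in real use, fetch would be something like lambda: br.open(url).read()
attempts = iter(["<p>loading</p>", "<p>loading</p>", "<div id='results'>done</div>"])
html = wait_for(lambda: next(attempts),
                lambda h: "id='results'" in h,
                timeout=5.0, interval=0.01)
print(html)
```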

How to make a script wait within an iteration until the Internet connection is reestablished?

孤者浪人, submitted 2019-12-09 23:43:44
Question: I have scraping code inside a for loop, but it would take several hours to complete, and the program stops when my Internet connection breaks. What I (think I) need is a condition at the beginning of the scraper that tells Python to keep trying at that point. I tried to use the answer from here:

```python
for w in wordlist:
    #some text processing, works fine, returns 'textresult'
    if textresult == '___': #if there's nothing in the offline resources
        bufferlist = list()
        str1 = str()
        mlist = list()
        # I use
```
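A common way to express "keep trying until the connection is back" is a small retry wrapper around the network call that sleeps between attempts; a sketch (the backoff values and the flaky demo function are arbitrary, not from the question):

```python
import time
from urllib.error import URLError

def retry_until_online(func, *args, delay=5.0, max_delay=300.0, **kwargs):
    """Call func; on connection errors, wait and retry until it succeeds."""
    while True:
        try:
            return func(*args, **kwargs)
        except (URLError, ConnectionError, OSError) as exc:
            print(f"connection problem ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped

# Demo with a fake scraper that fails twice, then succeeds
calls = {"n": 0}
def flaky_fetch(word):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network down")
    return f"result for {word}"

result = retry_until_online(flaky_fetch, "hello", delay=0.01)
print(result)
```

Inside the original loop, each scraping call would simply become `retry_until_online(scrape, w)`, so a dropped connection pauses the iteration instead of killing the program.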

Ruby/Mechanize “failed to allocate memory”. Erasing instantiation of 'agent.get' method?

走远了吗., submitted 2019-12-09 20:06:30
Question: I have a problem with leaking memory in a Mechanize Ruby script. I access multiple web pages forever in a while loop, and memory grows substantially on each iteration; after a few minutes this produces a "failed to allocate memory" error and the script exits. In fact, it seems that the agent.get method instantiates and holds the result even if I assign the result to the same local variable, or even a global variable. So I tried to assign nil to the variable after its last use and before reusing the same name

Mechanize not working for automating gmail login in Google Appengine

别等时光非礼了梦想., submitted 2019-12-09 18:29:22
Question: I have used mechanize in an app deployed on GAE and it works fine. But for an app that I am making, I am trying to automate login to Gmail through mechanize. It doesn't work in the local development environment or after deploying to App Engine, although I have been able to run the same script on my own server through mod_python using PSP. I found a lot of solutions here, but none of them seem to work for me. Here is a snippet of my code:

```python
<snip>
br = mechanize.Browser()
```