mechanize

Basic and Form Authentication with Mechanize (Ruby)

↘锁芯ラ, submitted 2019-12-10 13:33:27
Question: I'm trying to log into a site on the company intranet that presents both a basic-authentication popup dialog and form-based authentication. This is the code I'm using, which results in a 401 => Net::HTTPUnauthorized error:

```ruby
require 'rubygems'
require 'mechanize'
require 'logger'

agent = WWW::Mechanize.new { |a| a.log = Logger.new("mech.log") }
agent.user_agent_alias = 'Windows Mozilla'
agent.basic_auth('username', 'password')

agent.get('http://example.com') do |page|
  puts page.title
end
```

Mechanize and Beautiful soup python

你。, submitted 2019-12-10 12:02:01
Question: I'm trying to submit a form to a site using Beautiful Soup and mechanize. Mechanize on its own throws an error on nested forms, so I tried following the suggestion of using another parser. Here's the code:

```python
browser = mechanize.Browser()
browser.addheaders = [('User-agent',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
browser.set_handle_robots(False)
response = browser.open('URL')
soup = BeautifulSoup(response.get_data())
```
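When mechanize's form parser trips over nested forms, one workaround (not from the question itself) is to bypass form handling entirely: pull the input fields out with a parser and build the POST body yourself. A minimal sketch using only the standard library's `html.parser` in place of BeautifulSoup, so it runs with no extra installs; the sample HTML and field names are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from every <input>, even inside nested forms."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if "name" in a:
                self.fields[a["name"]] = a.get("value", "")

# Hypothetical page with one form nested inside another
html = """
<form action="/outer">
  <form action="/search">
    <input type="text" name="q" value="" />
    <input type="hidden" name="token" value="abc123" />
  </form>
</form>
"""

collector = FormFieldCollector()
collector.feed(html)
collector.fields["q"] = "my search terms"
payload = urlencode(collector.fields)
print(payload)  # request body you could POST with urllib (or hand to mechanize)
```

The same `payload` can then be POSTed to the form's `action` URL directly, which sidesteps mechanize's nested-form error altogether.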

Using Python to Automatically Login to a Website with a JavaScript Form

家住魔仙堡, submitted 2019-12-10 11:58:32
Question: I'm attempting to write a script that logs into a website. This specific website uses a JavaScript form, so I had little to no luck with mechanize. Are there other solutions I may be unaware of that would help in my situation? If this question or a close variant has been asked here before, please excuse me and point me to it. Otherwise, what are some common techniques/approaches for dealing
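When a login form is driven by JavaScript, mechanize cannot execute the script, but you can usually replicate the HTTP request the script ultimately sends (discoverable in the browser's network tab). A hedged stdlib sketch; the endpoint URL and field names below are made up for illustration:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_login_request(url, username, password):
    """Build the POST the page's JavaScript would have sent."""
    body = urlencode({"username": username, "password": password}).encode()
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

if __name__ == "__main__":
    # Hypothetical endpoint found by watching the browser's network tab
    req = build_login_request("https://example.com/ajax/login", "me", "secret")
    # urlopen(req) would perform the actual login; pair it with
    # http.cookiejar + build_opener to keep the session cookie afterwards.
```

An alternative when the JavaScript is too entangled to replicate is to drive a real browser (e.g. Selenium), at the cost of a much heavier dependency.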

Mechanize br.submit() limitations?

巧了我就是萌, submitted 2019-12-10 11:54:42
Question: My intention is to submit a search query to a website using mechanize and to analyse the results using BeautifulSoup. This will be used on the same website each time, so form names etc. can be hardcoded. I was having issues with my initial query, shown below:

```python
import mechanize
import urllib2
#from bs4 import BeautifulSoup

def inspect_page(url):
    br = mechanize.Browser(factory=mechanize.RobustFactory())
    br.set_handle_robots(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U;
```
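For reference, many search forms submit via GET, in which case the whole mechanize round-trip can be reduced to building a query URL and parsing the result links. A stdlib sketch, hedged: the site, the `q` field name, and the result markup are all assumptions, and `html.parser` stands in for the BeautifulSoup analysis step:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

def search_url(base, query):
    """Build the GET URL a simple search form would submit to."""
    return base + "?" + urlencode({"q": query})

class LinkExtractor(HTMLParser):
    """Collect hrefs from result anchors."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

url = search_url("http://example.com/search", "mechanize tutorial")
# A real run would fetch `url` (urllib.request.urlopen) and feed the HTML in:
extractor = LinkExtractor()
extractor.feed('<a href="/result/1">one</a><a href="/result/2">two</a>')
print(url, extractor.links)
```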

Python mechanize form submitting doesn't work

我只是一个虾纸丫, submitted 2019-12-10 10:29:34
Question: I am trying to write a simple bot that logs into my account on a page and then comments on other users' images. However, I am not able to get the comment form to submit correctly. The comment form looks like this:

```html
<form id="comment-form" action="#"
      onsubmit="postComment($(this).serialize(),'image',117885,229227); return false;">
  <input class="comment" type="text" size="40" name="comment" id="comment" />
  <input type="hidden" name="commentObj" value="9234785" />
  <input type="hidden" name=
```
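Note that the `onsubmit` handler calls `postComment(...)` and then returns `false`, so the form is never submitted normally; the data goes out as an AJAX POST that mechanize cannot trigger. The usual fix is to serialize the fields yourself and POST them to whatever endpoint `postComment` targets. A sketch with a made-up endpoint and the field names visible in the snippet:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_comment_post(endpoint, comment, comment_obj):
    """Serialize the comment form's fields the way $(this).serialize() would."""
    body = urlencode({"comment": comment, "commentObj": comment_obj}).encode()
    return Request(endpoint, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

# Hypothetical AJAX endpoint; the real one must be read from postComment's source
req = build_comment_post("http://example.com/ajax/comment", "Nice shot!", "9234785")
print(req.get_method(), req.data)
```

Passing a logged-in session's cookies along with this request (e.g. via `http.cookiejar`) is usually also required.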

How to set the Referer header before loading a page with Ruby mechanize?

孤人, submitted 2019-12-10 10:13:04
Question: Is there a straightforward way to set custom headers with Mechanize 2.3? I tried a former solution but get:

```ruby
$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p|
  p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main'
}
# ./mech.rb:30:in `<main>': undefined method `pre_connect_hooks' for nil:NilClass (NoMethodError)
```

Answer 1: The docs say:

```ruby
get(uri, parameters = [], referer = nil, headers = {}) { |page| ... }
```

so for example:

```ruby
agent.get 'http://www.google.com/', [], agent.page
```

How to make mechanize wait for web-page 'full' load?

筅森魡賤, submitted 2019-12-10 02:19:15
Question: I want to scrape a web page that loads its components dynamically. The page has an onload script, and I see the complete page only 3-5 seconds after typing the URL into my browser. The problem is that when I call br.open('URL'), the response is the web page at 0 seconds; the HTML I want only exists 3-5 seconds later and differs from the result of br.open('URL').

Answer 1: Working with a web page rich in JavaScript content is not easy in mechanize, but there are ways to get what
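Since mechanize never executes JavaScript, waiting alone cannot help when the content is built client-side; the usual options are driving a real browser (e.g. Selenium) or calling the page's data endpoints directly. Where the server really is just slow to produce the final HTML, a generic poll-until-ready loop works; a sketch in which the fetch function and readiness marker are stand-ins:

```python
import time

def wait_for(fetch, is_ready, timeout=15.0, interval=1.0):
    """Re-fetch until is_ready(html) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        html = fetch()
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("page never finished loading")
        time.sleep(interval)

# Demo with a fake fetcher that "finishes loading" on the third attempt;
# in real use, fetch would be something like lambda: br.open(url).read()
attempts = iter(["<p>loading</p>", "<p>loading</p>", "<div id='results'>done</div>"])
html = wait_for(lambda: next(attempts),
                lambda h: "id='results'" in h,
                timeout=5.0, interval=0.01)
print(html)
```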

How to make a script wait within an iteration until the Internet connection is reestablished?

孤者浪人, submitted 2019-12-09 23:43:44
Question: I have scraping code inside a for loop, but it would take several hours to complete, and the program stops when my Internet connection breaks. What I (think I) need is a condition at the beginning of the scraper that tells Python to keep trying at that point. I tried to use the answer from here:

```python
for w in wordlist:
    #some text processing, works fine, returns 'textresult'
    if textresult == '___': #if there's nothing in the offline resources
        bufferlist = list()
        str1 = str()
        mlist = list()
        # I use
```
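A common way to express "keep trying until the connection is back" is a small retry wrapper around the network call that sleeps between attempts; a sketch (the backoff values and the flaky demo function are arbitrary, not from the question):

```python
import time
from urllib.error import URLError

def retry_until_online(func, *args, delay=5.0, max_delay=300.0, **kwargs):
    """Call func; on connection errors, wait and retry until it succeeds."""
    while True:
        try:
            return func(*args, **kwargs)
        except (URLError, ConnectionError, OSError) as exc:
            print(f"connection problem ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped

# Demo with a fake scraper that fails twice, then succeeds
calls = {"n": 0}
def flaky_fetch(word):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network down")
    return f"result for {word}"

result = retry_until_online(flaky_fetch, "hello", delay=0.01)
print(result)
```

Inside the original loop, each scraping call would simply become `retry_until_online(scrape, w)`, so a dropped connection pauses the iteration instead of killing the program.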

Ruby/Mechanize “failed to allocate memory”. Erasing instantiation of 'agent.get' method?

走远了吗., submitted 2019-12-09 20:06:30
Question: I have a problem with leaking memory in a Mechanize Ruby script. I access multiple web pages forever in a while loop, and memory grows substantially on each iteration; after a few minutes this produces a "failed to allocate memory" error and the script exits. In fact, it seems that the agent.get method instantiates and holds the result even if I assign the result to the same local variable, or even a global variable. So I tried to assign nil to the variable after its last use and before reusing the same name

Mechanize not working for automating gmail login in Google Appengine

别等时光非礼了梦想., submitted 2019-12-09 18:29:22
Question: I have used mechanize in an app deployed on GAE and it works fine. But for an app that I am making, I am trying to automate login to Gmail through mechanize. It doesn't work in the local development environment or after deploying to App Engine, although I have been able to run the same script on my own server through mod_python using PSP. I found a lot of solutions here, but none of them seem to work for me. Here is a snippet of my code:

```python
<snip>
br = mechanize.Browser()
```