mechanize

Selenium Webdriver vs Mechanize

余生长醉 submitted on 2019-11-29 02:34:55
Question: I am interested in automating repetitive data entry in some forms on a website I frequent. So far, the tools I've found that support this in a headless fashion are Selenium WebDriver and Mechanize. My question is: is there a fundamental technical difference between using one versus the other? Selenium is mostly used for testing, but I've also noticed some people use it for exactly what I'm looking for, automating data entry, with testing becoming a secondary benefit…

Python Mechanize select a form with no name

╄→гoц情女王★ submitted on 2019-11-28 22:38:34
Question: I am attempting to have mechanize select a form from a page, but the form in question has no "name" attribute in the HTML. What should I do? When I try br.select_form(name=""), I get an error saying that no form is declared with that name, and the function requires a name argument. There is only one form on the page; is there some other way I can select it?

Answer: Try br.select_form(nr=0) to select the first form. In the Mechanize source:

    def select_form(self, name=None, predicate=None, nr=None):
        """
        ...
        nr, if supplied, is the sequence number of the form (where 0 is the first).
        """
…
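Index-based selection works because the browser keeps the page's forms in document order. The same idea can be sketched with only the standard library, using a hypothetical single-form page (the HTML and action URL below are made up for illustration):

```python
from html.parser import HTMLParser

class FormCollector(HTMLParser):
    """Collects the attribute dict of every <form> tag, in document order."""
    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms.append(dict(attrs))

# Hypothetical page: a single form with no name attribute.
html = ('<html><body><form action="/search" method="get">'
        '<input name="q"></form></body></html>')

collector = FormCollector()
collector.feed(html)

# Index 0 is the first form -- the same idea as br.select_form(nr=0).
first_form = collector.forms[0]
print(first_form["action"])  # -> /search
```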

Python: Clicking a button with urllib or urllib2

我是研究僧i submitted on 2019-11-28 21:40:48
Question: I want to click a button with Python; the form's fields are filled in automatically by the web page. The HTML for the button is:

    <INPUT type="submit" value="Place a Bid">

How would I go about doing this? Is it possible to click the button with just urllib or urllib2, or will I need something like mechanize or twill?

Answer: Use the form target and send any input as post data, like this:

    <form target="http://mysite.com/blah.php" method="GET">
        ......
        <input type="text" name="in1" value="abc">
        <INPUT type="submit" value="Place a Bid">
    </form>

Python:

    # parse…
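At the HTTP level there is no click event: "pressing" the button just means sending the request the form describes. Since this form uses method="GET", the whole submission collapses to the action URL plus a query string. A minimal sketch with the Python 3 stdlib (the urllib2-era equivalent is urllib.urlencode), using the field name and value from the form above:

```python
from urllib.parse import urlencode

# Field taken from the form above; the submit button itself contributes
# no data because it has no name attribute.
fields = [("in1", "abc")]

# method="GET": submitting the form is just fetching this URL.
url = "http://mysite.com/blah.php?" + urlencode(fields)
print(url)  # -> http://mysite.com/blah.php?in1=abc
```

Fetching that URL with urllib is then the entire "click"; mechanize's br.submit() builds the same request automatically.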

Maintaining cookies between Mechanize requests

巧了我就是萌 submitted on 2019-11-28 20:39:39
Question: I'm trying to use the Ruby version of Mechanize to extract my employer's tickets from a ticket management system we're moving away from, which does not supply an API. The problem is that Mechanize doesn't seem to be keeping the cookies between the post call and the get call shown below:

    require 'rubygems'
    require 'nokogiri'
    require 'mechanize'

    @agent = Mechanize.new
    page = @agent.post('http://<url>.com/user_session', {
      'authenticity_token' => '<token>',
      'user_session[login]' => '<login>',
      'user_session[password]' => '<password>',
      'user_session[remember_me]' => '0',
      'commit' => 'Login'
    })
    page = @agent…
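For reference, the cookie handoff the question expects can be spelled out with Python's stdlib http.cookiejar (Ruby's Mechanize keeps an equivalent jar on the agent). The URLs and cookie value here are hypothetical, and the login response is faked so the sketch runs offline:

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request

class FakeLoginResponse:
    """Minimal stand-in for the HTTP response to the login POST;
    the cookie jar only calls .info() on it."""
    def __init__(self, set_cookie):
        self._msg = Message()
        self._msg["Set-Cookie"] = set_cookie

    def info(self):
        return self._msg

jar = CookieJar()

# 1. The login POST's response delivers a session cookie ...
login = Request("http://example.com/user_session")
jar.extract_cookies(FakeLoginResponse("session_id=abc123; Path=/"), login)

# 2. ... which the jar then replays on the follow-up GET.
follow_up = Request("http://example.com/tickets")
jar.add_cookie_header(follow_up)
print(follow_up.get_header("Cookie"))  # -> session_id=abc123
```

If the second request in the real script carries no Cookie header, the usual culprits are a redirect to a different host, a Domain/Path attribute on the cookie that doesn't match the follow-up URL, or the cookie never being set because the login POST failed.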

Force python mechanize/urllib2 to only use A requests?

二次信任 submitted on 2019-11-28 19:42:38
Question: Here is a related question, but I could not figure out how to apply its answer to mechanize/urllib2: "how to force python httplib library to use only A requests". Basically, given this simple code:

    #!/usr/bin/python
    import urllib2
    print urllib2.urlopen('http://python.org/').read(100)

Wireshark reports the following:

    0.000000 10.102.0.79 -> 8.8.8.8 DNS Standard query A python.org
    0.000023 10.102.0.79 -> 8.8.8.8 DNS Standard query AAAA python.org
    0.005369 8.8.8.8 -> 10.102.0.79 DNS Standard query response A 82.94.164.162
    5.004494 10.102.0.79 -> 8.8.8.8 DNS Standard query A python.org…
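One common workaround (a sketch, not the only approach) is to monkey-patch socket.getaddrinfo to pin the address family to AF_INET; urllib2 and mechanize both resolve host names through it, so the patch covers them as well. Whether this actually suppresses the AAAA query depends on the platform resolver, but with glibc an AF_INET hint issues only the A lookup:

```python
import socket

# Keep a reference to the real resolver, then install an IPv4-only wrapper.
_real_getaddrinfo = socket.getaddrinfo

def getaddrinfo_ipv4_only(host, port, family=0, *args, **kwargs):
    # Ignore the caller's family hint and force AF_INET (IPv4 / A records).
    return _real_getaddrinfo(host, port, socket.AF_INET, *args, **kwargs)

socket.getaddrinfo = getaddrinfo_ipv4_only

# Every lookup from here on returns IPv4 results only.
results = socket.getaddrinfo("localhost", 80)
print(all(family == socket.AF_INET for family, *_ in results))  # -> True
```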

WebBrowsing in C# - Libraries, Tools etc. - Anything like Mechanize in Perl? [closed]

有些话、适合烂在心里 submitted on 2019-11-28 19:11:15
Question: Looking for something similar to Mechanize for .NET... If you don't know what Mechanize is: http://search.cpan.org/dist/WWW-Mechanize/

I will maintain a list of suggestions here. Anything for browsing/posting/screen scraping (other than WebRequest and the WebBrowser control):

Parsing:
- HTMLAgilityPack - http://www.codeplex.com/htmlagilitypack

Web app testing:
- WatiN - Web Application Testing Framework (.NET) - http://watin.sourceforge.net/
- Selenium - http://seleniumhq.org/
- Art of Test Design Canvas…

Screen scraping: getting around “HTTP Error 403: request disallowed by robots.txt”

旧城冷巷雨未停 submitted on 2019-11-28 16:05:19
Question: Is there a way to get around the following?

    httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Is the only way around this to contact the site owner (barnesandnoble.com)? I'm building a site that would bring them more sales, and I'm not sure why they would deny access at a certain depth. I'm using mechanize and BeautifulSoup on Python 2.6 and hoping for a work-around.

Answer: You can try lying about your user agent (e.g., by trying to make believe you're a human being and not a robot), if you want to get in possible legal trouble with Barnes & Noble. Why not instead get in touch with their…
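For context, mechanize raises this 403 itself, before the request is ever sent, because it fetches and enforces robots.txt client-side (br.set_handle_robots(False) turns the check off, with the caveats the answer raises). The stdlib's urllib.robotparser shows what that check does, using a hypothetical rules file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules of the kind that trigger the 403.
rules = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A disallowed path is refused before any HTTP request happens.
print(rp.can_fetch("*", "http://example.com/private/page"))  # -> False
print(rp.can_fetch("*", "http://example.com/public/page"))   # -> True
```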

Python re - escape coincidental parentheses in regex pattern

流过昼夜 submitted on 2019-11-28 12:31:31
Question: I am having trouble with the regex in the following code:

    import mechanize
    import re

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    response = br.open("http://www.gfsc.gg/The-Commission/Pages/Regulated-Entities.aspx?auto_click=1")
    html = response.read()
    br.select_form(nr=0)
    #print br.form
    br.set_all_readonly(False)
    next = re.search(r"""<a href=…
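When the text being matched contains parentheses (or other regex metacharacters) that are meant literally, re.escape is the usual fix: it backslash-escapes every metacharacter in the string so it matches itself. A small self-contained sketch, using a made-up link label rather than the page from the question:

```python
import re

# A link label that happens to contain regex metacharacters.
label = "Regulated Entities (Banking)"

# re.escape turns "(" into "\(" and ")" into "\)", so the
# parentheses match literally instead of opening a group.
pattern = re.escape(label)

html = '<a href="#">Regulated Entities (Banking)</a>'
match = re.search(pattern, html)
print(match.group(0))  # -> Regulated Entities (Banking)
```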

Python mechanize login to website

て烟熏妆下的殇ゞ submitted on 2019-11-28 12:28:52
Question: I'm trying to log into a website using Python and Mechanize; however, I'm running into trouble getting the POST data to behave the way I want. Essentially, I want to replicate this using mechanize and Python:

    wget --quiet --save-cookies cookiejar --keep-session-cookies \
         --post-data "action=login&login_nick=USERNAME&login_pwd=PASSWORD" \
         -O outfile.htm http://domain.com/index.php

The form looks like this:

    <login POST http://domain.com/index.php application/x-www-form-urlencoded
      <TextControl(login_nick=USERNAME)>
      <PasswordControl(login_pwd=PASSWORD)>
      <CheckboxControl(login_auto=[1])>…
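The --post-data string wget sends is just the URL-encoded form fields, which mechanize assembles for you on br.submit(). For comparison, here is that same body built by hand with the Python 3 stdlib; an unticked checkbox like login_auto is simply omitted from the body, matching the wget command above:

```python
from urllib.parse import urlencode

# Field order matters for matching the wget string byte-for-byte,
# so use a list of pairs rather than a dict.
fields = [
    ("action", "login"),
    ("login_nick", "USERNAME"),
    ("login_pwd", "PASSWORD"),
]

body = urlencode(fields)
print(body)  # -> action=login&login_nick=USERNAME&login_pwd=PASSWORD
```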

Ruby SSL error - sslv3 alert unexpected message

匆匆过客 submitted on 2019-11-28 11:30:06
Question: I'm trying to connect to the server https://www.xpiron.com/schedule in a Ruby script. However, when I try connecting:

    require 'open-uri'
    doc = open('https://www.xpiron.com/schedule')

I get the following error message:

    OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: sslv3 alert unexpected message
        from /usr/local/lib/ruby/1.9.1/net/http.rb:678:in `connect'
        from /usr/local/lib/ruby/1.9.1/net/http.rb:678:in `block in connect'
        from /usr/local/lib/ruby/1.9.1/timeout.rb:44:in `timeout'
        from /usr/local/lib/ruby/1.9.1/timeout.rb:87:in `timeout'
        from /usr/local…
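Handshake errors like "sslv3 alert unexpected message" usually mean the client offered a protocol version or hello format the server rejects, so the usual remedy is to pin the client to a version the server accepts. As a hedged illustration of that idea using Python's stdlib ssl module (an analog, not a fix for the Ruby stack trace itself, where the equivalent knob lives on the SSL context):

```python
import ssl

# Start from the library's recommended client settings ...
ctx = ssl.create_default_context()

# ... then pin the lowest protocol version this client will offer
# in its hello, ruling out the legacy SSLv2/v3-style negotiation.
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

print(ctx.minimum_version == ssl.TLSVersion.TLSv1_2)  # -> True
```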