mechanize

Use mechanize to log into megaupload

Submitted by 你说的曾经没有我的故事 on 2019-12-04 01:58:20
Question: I am attempting to use the following code to log into Megaupload. My question is: how do I know that it successfully logged in? I print the current URL at the end of the code, but when I run the script it just returns www.megaupload.com.

    import mechanize
    import cookielib
    from BeautifulSoup import BeautifulSoup
    import html2text

    # Browser
    br = mechanize.Browser()

    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
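The excerpt ends mid-setup, but a minimal sketch of one way to verify the login after submitting might look like the following. The form index, field names, and the 'logout' marker are assumptions, not taken from megaupload:

    # Minimal sketch for checking whether a mechanize login succeeded.
    # The form index, field names, and 'logout' marker are assumptions.
    br.select_form(nr=0)               # pick the login form by index
    br['username'] = 'myuser'          # hypothetical field name
    br['password'] = 'mypassword'      # hypothetical field name
    response = br.submit()

    print br.geturl()                  # final URL after any redirects
    html = response.read()
    if 'logout' in html.lower():       # post-login pages often link to logout
        print 'Login appears successful'
    else:
        print 'Login probably failed'

Checking br.geturl() after submit() shows whether the site redirected you away from the login page, which is usually a stronger signal than the URL you originally opened.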

Mechanize form submission causes 'Assertion Error' in response when .read() is attempted

Submitted by 故事扮演 on 2019-12-03 22:42:37
I am writing a web-crawling program in Python and am unable to log in using mechanize. The form on the site looks like:

    <form method="post" action="PATLogon">
    <h2 align="center"><img src="/myaladin/images/aladin_logo_rd.gif"></h2>
    <!-- ALADIN Request parameters -->
    <input type=hidden name=req value="db">
    <input type=hidden name=key value="PROXYAUTH">
    <input type=hidden name=url value="http://eebo.chadwyck.com/search">
    <input type=hidden name=lib value="8">
    <table>
    <tr><td><b>Last Name:</b></td>
    <td><input name=LN size=20 maxlength=26></td>
    <tr><td><b>University ID or Library Barcode:</b></td>
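The question is cut off before the traceback, but a sketch of driving this form with mechanize might look like the following. The login URL and the barcode field's name (its input tag is truncated out of the HTML above) are placeholders:

    # Sketch for driving the ALADIN form above.  The URL and the barcode
    # field name are placeholders; the real input name is truncated in the
    # excerpt.  The form has no name attribute, so select it by its action.
    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.open('http://example.edu/PATLogon')   # placeholder URL

    br.select_form(predicate=lambda f: f.action and 'PATLogon' in f.action)
    br['LN'] = 'Smith'                       # Last Name, from the HTML above
    # br['ID'] = '12345678901234'            # barcode field name is a guess
    response = br.submit()
    print response.read()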

How to read someone else's forum

Submitted by 落花浮王杯 on 2019-12-03 22:09:12
My friend has a forum which is full of posts containing information. Sometimes she wants to review the posts in her forum and draw conclusions. At the moment she reviews posts by clicking through the forum, building a not-necessarily-accurate picture of the data in her head before drawing conclusions. My thought today was that I could probably bang out a quick Ruby script to parse the necessary HTML and give her a real idea of what the data is saying. I am using Ruby's net/http library for the first time today, and I have encountered a problem. While my browser has no

Using Ruby with Mechanize to log into a website

Submitted by 匆匆过客 on 2019-12-03 17:03:09
I need to scrape data from a site, but it requires logging in first. I've been using Hpricot to successfully scrape other sites, but I'm new to mechanize and I'm truly baffled by how to work it. I see this example commonly quoted:

    require 'rubygems'
    require 'mechanize'

    a = Mechanize.new
    a.get('http://rubyforge.org/') do |page|
      # Click the login link
      login_page = a.click(page.link_with(:text => /Log In/))

      # Submit the login form
      my_page = login_page.form_with(:action => '/account/login.php') do |f|
        f.form_loginname = ARGV[0]
        f.form_pw = ARGV[1]
      end.click_button

      my_page.links.each do |link|

beautifulsoup and mechanize to get ajax call result

Submitted by ≯℡__Kan透↙ on 2019-12-03 16:59:24
Hi, I'm building a scraper using Python 2.5 and BeautifulSoup, but I've stumbled upon a problem: part of the web page is generated after the user clicks a button, which starts an AJAX request by calling a specific JavaScript function with the proper parameters. Is there a way to simulate that user interaction and get the result? I came across the mechanize module, but it seems to me that it is mostly used to work with forms. I would appreciate any links or code samples. Thanks. OK, so I have figured it out; it was quite simple once I realised that I could use a combination of urllib, urllib2 and
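The excerpt cuts off, but the approach it describes — calling the AJAX endpoint directly instead of simulating the click — might look like this sketch. The endpoint URL, parameters, and header are hypothetical; the real ones would come from reading the page's JavaScript or a browser's network log:

    # Sketch: hit the AJAX endpoint directly instead of simulating the click.
    # The endpoint URL, parameters, and header below are hypothetical.
    import urllib
    import urllib2
    from BeautifulSoup import BeautifulSoup

    params = urllib.urlencode({'page': 1, 'section': 'comments'})
    request = urllib2.Request('http://example.com/ajax/load', params)
    request.add_header('X-Requested-With', 'XMLHttpRequest')

    html_fragment = urllib2.urlopen(request).read()
    soup = BeautifulSoup(html_fragment)
    print soup.prettify()

Since the server returns just the HTML fragment (or JSON) that the JavaScript would have injected, it can be parsed with BeautifulSoup like any other response.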

Recovering from HTTPError in Mechanize

Submitted by 爷,独闯天下 on 2019-12-03 16:52:24
Question: I am writing a function for some existing Python code that will be passed a mechanize browser object as a parameter. I fill in some details in a form in the browser and use response = browser.submit() to move the browser to a new page and collect some information from it. Unfortunately, I occasionally get the following error:

    httperror_seek_wrapper: HTTP Error 500: Internal Server Error

I've navigated to the page in my own browser, and sure enough, I occasionally see this error directly, so
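The excerpt ends before an answer, but since the 500 is intermittent on the server side, a common way to handle it is to catch the HTTPError and retry. A minimal sketch; the retry count and delay are arbitrary choices, not from the source:

    # Minimal retry sketch for an intermittent HTTP 500 from mechanize.
    # Retry count and delay are arbitrary choices.
    import time
    import mechanize

    def submit_with_retries(browser, retries=3, delay=2):
        for attempt in range(retries):
            try:
                return browser.submit()
            except mechanize.HTTPError as e:
                if e.code != 500 or attempt == retries - 1:
                    raise              # give up on other codes or last attempt
                time.sleep(delay)      # brief pause before retrying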

Web Crawler - Ignore Robots.txt file?

Submitted by 风格不统一 on 2019-12-03 16:32:08
Question: Some servers have a robots.txt file in order to stop web crawlers from crawling their websites. Is there a way to make a web crawler ignore the robots.txt file? I am using mechanize for Python.

Answer 1: The documentation for mechanize has this sample code:

    br = mechanize.Browser()
    ....
    # Ignore robots.txt.  Do not do this without thought and consideration.
    br.set_handle_robots(False)

That does exactly what you want.

Answer 2: This looks like what you need:

    from mechanize import Browser
    br =

Python Mechanize select form FormNotFoundError

Submitted by 强颜欢笑 on 2019-12-03 12:48:40
I want to select a form with mechanize. This is my code:

    br = mechanize.Browser()
    br.open(url)
    br.select_form(name="login_form")

The form's code:

    <form id="login_form" onsubmit="return Index.login_submit();" method="post" action="index.php?action=login&server_list=1">

But I'm getting this error:

    mechanize._mechanize.FormNotFoundError: no form matching name 'login_form'

The problem is that your form does not have a name, only an id, and that id is login_form. You can use a predicate:

    br.select_form(predicate=lambda f: f.attrs.get('id', None) == 'login_form')

(where you check whether f.attrs has the key 'id' and whether its value is 'login_form').

Downloading file with Python mechanize

Submitted by ﹥>﹥吖頭↗ on 2019-12-03 12:32:24
Question: I am trying to download a file from a website using Python and mechanize. My current code successfully logs on to the website and opens the page that contains the download link. The download link is:

    https://www.lendingclub.com/browse/browseNotesRawDataV2.action

The info for the link is:

    Link(base_url='https://www.lendingclub.com/browse/browse.action', url='/browse/browseNotesRawDataV2.action', text='', tag='a', attrs=[('class', 'master_pngfix'), ('id', 'browseDownloadAllLink'), ('href', '
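The excerpt cuts off mid-attribute, but one straightforward way to save the file behind such a link is to follow it and write the response body to disk. A sketch assuming br is the already-logged-in mechanize browser sitting on that page; the output filename is arbitrary:

    # Sketch: assumes br is an already-logged-in mechanize.Browser on the
    # page containing the link; the output filename is arbitrary.
    link = br.find_link(url='/browse/browseNotesRawDataV2.action')
    response = br.follow_link(link)
    with open('browseNotes.csv', 'wb') as f:
        f.write(response.read())

Because the file is served behind the login, following the link through the same browser object matters: it carries the session cookies that a plain urllib2 request would lack.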

CertificateError: hostname doesn't match

Submitted by 点点圈 on 2019-12-03 11:36:21
I'm using a proxy (behind a corporate firewall) to log into an HTTPS domain. The SSL handshake doesn't seem to be going well:

    CertificateError: hostname 'ats.finra.org:443' doesn't match 'ats.finra.org'

I'm using Python 2.7.9 with mechanize, and I've gotten past all of the login, password, and security question screens, but it is getting hung up on the certificate check. Any help would be amazing. I've tried the monkey patch found here: Forcing Mechanize to use SSLv3. It doesn't work for my code, though. If you want the code file I'd be happy to send it.

You can avoid this error by monkey patching ssl:

    import ssl
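The answer is cut off after the import, but the widely quoted completion of this ssl monkey patch simply disables hostname matching. Note this completion is inferred, not present in the excerpt, and it weakens TLS verification, so it is only defensible when you trust the network path (for example, behind a known corporate proxy):

    # Inferred completion of the truncated answer: disable hostname matching.
    # This weakens TLS verification -- only acceptable when you trust the
    # network path, e.g. behind a known corporate proxy.
    import ssl

    ssl.match_hostname = lambda cert, hostname: True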