Question
My intention is to submit a search query to a website using Mechanize and to analyse the results using BeautifulSoup. This will be used for the same website and so form names etc. can be hardcoded. I was having issues with my initial query, which is shown below:
`import mechanize
import urllib2
#from bs4 import BeautifulSoup

def inspect_page(url):
    br = mechanize.Browser(factory=mechanize.RobustFactory())
    br.set_handle_robots(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')]
    br.set_handle_redirect(mechanize.HTTPRedirectHandler)
    try:
        br.open(url)
    except mechanize.HTTPError, e:
        print "HTTP Error", e.code,
    except urllib2.URLError as e:
        print "URL Error", e.reason,
        return
    for form in br.forms():
        print form
    br.select_form(name="dataform")
    br.form['pcode'] = 'WV14 8EW'
    br.form['premise'] = '66'
    response = br.submit()
    print response.read()
    #soup = BeautifulSoup(response.read())

inspect_page('http://www.fensa.co.uk/asp/certificate.asp')`
This did not redirect to the results page, and `print response.read()` displayed the HTML of the page I submitted the query on, so I assumed I had made an error in my code. However, when I tested another site (`inspect_page('https://publicaccess.glasgow.gov.uk/online-applications/search.do?action=simple')`) and changed the form fields to match those on that site:
`br.select_form(name="searchCriteriaForm")
br.form['searchCriteria.simpleSearchString'] = 'Queen Elizabeth Gardens'
response = br.submit()
print response.read()`
I was redirected as I expected. Is there anything that would stop a page being redirected when `br.submit()` is called? I've already checked that the site is not gzipped.
Answer 1:
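One limitation is that `mechanize` doesn't know about JavaScript. On the site your script targets, submitting the search form in a browser triggers a JavaScript function which validates the input and changes the `action` attribute of the `<form>` before actually submitting the form values. Because `mechanize` never runs that function, `br.submit()` posts to the form's original action and you get the search page back.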
Here is the HTML part of the form:
`<a onclick="return validate_required()" name="submit" href="#">
    <input class="button" type="button" value="Search" name="Submit2">
</a>`
And this is the `validate_required()` function defined near the beginning of that HTML document:
`function validate_required() {
    error = "";
    if (document.getElementById("pcode").value == '') { error = error + "Postcode\n"; }
    if (document.getElementById("premise").value == '') { error = error + "Premise\n"; }
    if (error != '') {
        alert("Please enter:\n\n" + error);
        return false;
    }
    else {
        document.dataform.action = "certificate_results.asp";
        document.dataform.submit();
    }
}`
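To make the missing step concrete, the logic of `validate_required()` can be mirrored on the Python side. What follows is only a sketch, written in Python 3 with the standard library rather than the Python 2 of the question; `resolve_action` and `REQUIRED_FIELDS` are hypothetical names made up for this illustration, not part of `mechanize`.

```python
from urllib.parse import urljoin

# Fields that the page's validate_required() checks, paired with the
# labels it shows in its alert. (Names here are illustrative only.)
REQUIRED_FIELDS = (("pcode", "Postcode"), ("premise", "Premise"))


def resolve_action(page_url, fields, action="certificate_results.asp"):
    """Mirror validate_required(): reject empty fields, else return the real target URL."""
    missing = [label for name, label in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("Please enter: " + ", ".join(missing))
    # The script assigns a relative action, so resolve it against the page URL.
    return urljoin(page_url, action)


if __name__ == "__main__":
    url = resolve_action(
        "http://www.fensa.co.uk/asp/certificate.asp",
        {"pcode": "WV14 8EW", "premise": "66"},
    )
    print(url)
```

With `mechanize`, the resolved URL could presumably be assigned to `br.form.action` after `br.select_form(...)` and before `br.submit()`, since the form object carries its action as an attribute; that step has not been tested against this site.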
Answer 2:
The form's `action` is only changed on the page once the JavaScript has validated the form inputs, so I now submit the fields directly to the URL that the script would set:
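`import urllib
import mechanize
params = {'pcode': "WV14 8EW", 'premise': "66"}
data = urllib.urlencode(params)
# "certificate_results.asp" resolved relative to the form page's URL
request = mechanize.Request('http://www.fensa.co.uk/asp/certificate_results.asp')
response = mechanize.urlopen(request, data=data)`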
Thanks @BlackJack for the tips
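For anyone on Python 3 without `mechanize`, roughly the same request can be put together with the standard library. This is a minimal sketch, not the code I actually ran: `build_search_request` is a hypothetical helper name, and it assumes the results page accepts a POST, as the `data=` argument above implies. It only builds the request and does not send it.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_search_request(page_url, pcode, premise):
    """Build (but do not send) the POST request the validated form would make."""
    # Resolve the action that the page's JavaScript would have set.
    target = urljoin(page_url, "certificate_results.asp")
    body = urlencode({"pcode": pcode, "premise": premise}).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(target, data=body)


if __name__ == "__main__":
    req = build_search_request(
        "http://www.fensa.co.uk/asp/certificate.asp", "WV14 8EW", "66"
    )
    print(req.get_method(), req.full_url)
    print(req.data)
```

Passing the returned object to `urllib.request.urlopen` would send it, and the response body could then go to BeautifulSoup as originally planned.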
Source: https://stackoverflow.com/questions/30642462/mechanize-br-submit-limitations