Mechanize br.submit() limitations?

Submitted by 巧了我就是萌 on 2019-12-10 11:54:42

Question


I want to submit a search query to a website using Mechanize and analyse the results with BeautifulSoup. The script will only ever target this one website, so form names and the like can be hardcoded. My initial attempt, shown below, is giving me trouble:

import mechanize
import urllib2
#from bs4 import BeautifulSoup


def inspect_page(url):
    br = mechanize.Browser(factory=mechanize.RobustFactory())
    br.set_handle_robots(False)
    br.addheaders = [('User-agent',
                      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')]
    br.set_handle_redirect(mechanize.HTTPRedirectHandler)

    try:
        br.open(url)
    except mechanize.HTTPError as e:
        print "HTTP Error", e.code
        return
    except urllib2.URLError as e:
        print "URL Error", e.reason
        return

    for form in br.forms():
        print form

    br.select_form(name="dataform")
    br.form['pcode'] = 'WV14 8EW'
    br.form['premise'] = '66'
    response = br.submit()
    print response.read()

    #soup = BeautifulSoup(response.read())

inspect_page('http://www.fensa.co.uk/asp/certificate.asp')

This did not redirect to the results page: print response.read() displayed the HTML of the page I had submitted the query on, so I assumed I had made an error in my code. However, when I tested another site (inspect_page('https://publicaccess.glasgow.gov.uk/online-applications/search.do?action=simple')) and changed the form and field names to match that site:

br.select_form(name="searchCriteriaForm")
br.form['searchCriteria.simpleSearchString'] = 'Queen Elizabeth Gardens'
response = br.submit()
print response.read()

I was redirected as I expected. Is there anything that would stop a page being redirected when br.submit() is called? I've already checked that the site is not GZipped.


Answer 1:


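One limitation is that mechanize does not execute JavaScript. In a browser, clicking the search button on that site triggers a JavaScript function that validates the input and sets the action attribute of the <form> before actually submitting the form values. mechanize skips that step, so br.submit() sends the values to whatever action the form declares in the HTML rather than to the results page.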

Here is the HTML part of the form:

<a onclick="return validate_required()" name="submit" href="#">
  <input class="button" type="button" value="Search" name="Submit2">
</a>

And this is the validate_required() function defined near the beginning of that HTML document:

function validate_required() {

    error = "";
    if (document.getElementById("pcode").value == '') { error = error + "Postcode\n"; }
    if (document.getElementById("premise").value == '') { error = error + "Premise\n"; }

    if (error != '') {
        alert("Please enter:\n\n" + error);
        return false;
    }
    else {
        document.dataform.action = "certificate_results.asp";
        document.dataform.submit();

    }
}
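The onclick attribute in the HTML above is the clue that a form depends on JavaScript. As a rough sketch (the OnclickFinder class is a hypothetical helper, not part of mechanize), the standard-library HTML parser can list such handlers so you know to read the script before relying on br.submit():

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class OnclickFinder(HTMLParser):
    """Hypothetical helper: record (tag, handler) for each onclick attribute."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.handlers = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        onclick = dict(attrs).get('onclick')
        if onclick:
            self.handlers.append((tag, onclick))


# The submit markup quoted in the answer above.
snippet = '''<a onclick="return validate_required()" name="submit" href="#">
  <input class="button" type="button" value="Search" name="Submit2">
</a>'''

finder = OnclickFinder()
finder.feed(snippet)
print(finder.handlers)  # [('a', 'return validate_required()')]
```

Any handler reported this way is code that mechanize will not run, so its effects (here, setting the form action) have to be reproduced by hand.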



Answer 2:


The form's action is only set to certificate_results.asp once the JavaScript has validated the inputs, so I now submit the fields directly to that URL.

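import urllib
import mechanize

params = {'pcode': "WV14 8EW", 'premise': "66"}
data = urllib.urlencode(params)
# The action is relative, so it resolves against the URL of the search page.
request = mechanize.Request('http://www.fensa.co.uk/asp/certificate_results.asp')
response = mechanize.urlopen(request, data=data)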
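For illustration, here is a minimal standard-library sketch of what that workaround amounts to: resolving the relative action set by the JavaScript against the search page URL, and encoding the fields as a POST body. The build_post function is a hypothetical helper named only for this example, and the imports are written to work on either Python 2 or 3.

```python
try:
    from urllib.parse import urljoin, urlencode  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2
    from urllib import urlencode


def build_post(page_url, action, fields):
    """Hypothetical helper: return (target URL, encoded body) for a form POST."""
    # A relative action is resolved against the page the form lives on.
    target = urljoin(page_url, action)
    # Sort the fields so the encoded body is deterministic.
    body = urlencode(sorted(fields.items()))
    return target, body


target, body = build_post(
    'http://www.fensa.co.uk/asp/certificate.asp',
    'certificate_results.asp',
    {'pcode': 'WV14 8EW', 'premise': '66'},
)
print(target)  # http://www.fensa.co.uk/asp/certificate_results.asp
print(body)    # pcode=WV14+8EW&premise=66
```

An alternative that keeps the Browser session might be to assign the resolved URL to br.form.action before calling br.submit(); that variant is an assumption and was not tested against this site.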

Thanks @BlackJack for the tips



Source: https://stackoverflow.com/questions/30642462/mechanize-br-submit-limitations
