mechanize | 易学教程

ControlNotFoundError (ASP, Mechanize, Javascript, Python)

Question: I am receiving the following response from the server:

    ctrlDateTime%24txtSpecifyFromDate=05%2F02%2F2015&
    ctrlDateTime%24rgApplicable=rdoApplicableFor&
    ctrlDateTime%24txtSpecifyToDate=05%2F02%2F2015&

I am trying to set the controls with:

    br["ctrlDateTime%24txtSpecifyFromDate"]="05%2F02%2F2015";
    br["ctrlDateTime%24rgApplicable"]="rdoApplicableFor";
    br["ctrlDateTime%24txtSpecifyToDate"]="05%2F02%2F2015";

How can I fix the ControlNotFoundError? Any idea how to solve it? Here is my code:

    import mechanize
    import re
    br =
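
The names in the captured request are percent-encoded (`%24` is `$`, `%2F` is `/`). One possible cause of the error, offered as an assumption because the form's HTML is not shown, is that the form's controls carry the decoded names, so a lookup by the encoded name finds nothing. A minimal sketch using only the standard library to recover the decoded names and values:

```python
from urllib.parse import parse_qsl

# Query string copied from the question (percent-encoded, as sent over the wire).
raw = (
    "ctrlDateTime%24txtSpecifyFromDate=05%2F02%2F2015&"
    "ctrlDateTime%24rgApplicable=rdoApplicableFor&"
    "ctrlDateTime%24txtSpecifyToDate=05%2F02%2F2015&"
)

# parse_qsl decodes both names and values; the empty pair after the
# trailing "&" is skipped by default.
decoded = parse_qsl(raw)

for name, value in decoded:
    print(name, "=", value)
```

If the assumption holds, the decoded names (for example `ctrlDateTime$txtSpecifyFromDate`) and the decoded values (for example `05/02/2015`) are the ones to try when setting form controls; the browser or library encodes them again on submission.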

mechanize: first form works, then “unknown GET form encoding type 'utf-8'”

Question: I am trying to fill out two forms on the EUR-Lex website in order to record some data from the generated web page. I am stuck at form #2. I get the feeling this should be easy, and I've researched a bit, but no luck.

    import mechanize
    froot = '...'
    f = open(froot + 'text.html', 'w')
    br = mechanize.Browser()
    br.open('http://eur-lex.europa.eu/RECH_legislation.do')
    br.select_form(name='form2')
    br['T1'] = ['V112']
    br['T3'] = ['V2']
    br['T2'] = ['V1']
    first_page = br.submit()
    f.write(first_page.get

Ruby Mechanize screen scraping help

Question: I am trying to scrape a row in a table by its date. I want to scrape only the third row, the one that has today's date. This is my Mechanize code. I am trying to select the row which has today's date, and its columns:

    agent.page.search("//td").map(&:text).map(&:strip)

Output:

    "11-02-2011", "1", "1", "1", "1", "0", "0,00 DKK", "0,00", "0,00 DKK",
    "12-02-2011", "5", "5", "1", "4", "0", "0,00 DKK", "0,00", "0,00 DKK",
    "14-02-2011", "1", "3", "1", "1", "0", "0,00 DKK", ",00", "0,00 DKK"
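
One way to pick out a single row from a flat list of cell texts like the output above is to regroup the cells into fixed-width rows and match on the first cell. The sketch below is in Python rather than the question's Ruby, and it assumes that every row has nine cells and that the date is always the first cell; `row_for` is a hypothetical helper name.

```python
# Cell texts copied from the output in the question.
cells = [
    "11-02-2011", "1", "1", "1", "1", "0", "0,00 DKK", "0,00", "0,00 DKK",
    "12-02-2011", "5", "5", "1", "4", "0", "0,00 DKK", "0,00", "0,00 DKK",
    "14-02-2011", "1", "3", "1", "1", "0", "0,00 DKK", ",00", "0,00 DKK",
]

COLUMNS = 9  # assumption: each table row contributes nine <td> cells

# Regroup the flat list into one list per table row.
rows = [cells[i:i + COLUMNS] for i in range(0, len(cells), COLUMNS)]


def row_for(date_text, rows):
    """Return the first row whose leading cell equals date_text, else None."""
    return next((row for row in rows if row[0] == date_text), None)


# For today's date, datetime.date.today().strftime("%d-%m-%Y") would
# produce text in the same day-month-year format.
wanted = row_for("12-02-2011", rows)
print(wanted)
```

Selecting the `<tr>` elements first and reading the cells per row would avoid the fixed-width assumption, but that depends on the table markup, which the excerpt does not show.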

Can't use perl WWW::Mechanize to tick checkboxes

Question: I am making a web scraper using Perl's WWW::Mechanize. My problem is that the site I am scraping relies on JavaScript a bit too much. I log in using credentials, then navigate to the custom search using $mech->follow_link(url) . The problem starts here: I land on a page where I have to select one checkbox and one radio button from a JavaScript-enabled drop-down list. I am stuck at this point. The relevant part of the HTML is below. When I am using $mech->tick('cs-MajorIndustryGroup') , I am

how to parse a row only if one of its fields is bold? Nokogiri and Ruby

Question: So I have this code that collects all the product info I need:

    # get main page
    page = agent.get "http://www.site.com.mx/tienda/index.php"
    search_form = page.forms.first
    search_result = agent.submit search_form
    doc = Nokogiri::HTML(search_result.body)
    rows = doc.css("table.articulos tr")
    i = 0
    details = rows.collect do |row|
      detail = {}
      [
        [:sku, 'td[3]/text()'],
        [:desc, 'td[4]/text()'],
        [:qty, 'td[5]/text()'],
        [:qty2, 'td[5]/p/b/text()'],
        [:price, 'td[6]/text()']
      ].collect do |name, xpath|
        detail

Is it possible to find the <td> .. </td> text, when any of the <td>..</td> value is known?

Question: I have a web page with HTML similar to the following:

    <form name="test">
    <td> .... </td>
    . . .
    <td> <A HREF="http://www.edu/st/file.html">alo</A> </td>
    <td> <A HREF="http://www.dom/st/file.html">foo</A> </td>
    <td> bla bla </td>
    </form>

Now, I know only the value bla bla . Based on that value, can we find the third-from-last <td>..</td> value (which here is alo )? I can find those with the help of the HREF values, but the HREF values are not fixed; they can change at any time. Answer 1:
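
The lookup described in the question can be sketched without relying on the HREF values: collect the text of every `<td>` in document order, find the cell whose text is known, and step back two positions. The sketch below uses Python's standard `html.parser`; `CellCollector` is a hypothetical class name, the HTML is a shortened copy of the question's sample, and it assumes the known text occurs exactly once and that cells are not nested.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of each <td> element in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TD> and <td> both match.
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


html = """<form name="test">
<td> <A HREF="http://www.edu/st/file.html">alo</A> </td>
<td> <A HREF="http://www.dom/st/file.html">foo</A> </td>
<td> bla bla </td>
</form>"""

parser = CellCollector()
parser.feed(html)
parser.close()

cells = [text.strip() for text in parser.cells]
known = cells.index("bla bla")  # raises ValueError if the text is absent
result = cells[known - 2] if known >= 2 else None
print(result)
```

Indexing relative to the known cell keeps the lookup independent of the link targets, which the question says can change at any time.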

Mechanize select from dropdown

Question: I want mechanize to check whether the currently selected value of a drop-down equals the default value and, if so, choose another value from the list instead. The HTML of the drop-down is as follows:

    <td class="label">List</td>
    <td>
    <select name="list" id="list" onchange="list()">
    <option>---</option>
    <option value='1'>1</option>
    <option value='2'>2</option>
    ---other options---

My code is:

    if br.form["list"] == "---":
        br.form["list"].value = "1"
    r = br.form["list"]
    print(r)

However list value still

Why does this JSON file get filled with 1747 times the last Hash data?

Question: I'm using the following code to generate a JSON file containing all category information for a particular website.

    require 'mechanize'
    @hashes = []
    @categories_hash = {}
    @categories_hash['category'] ||= {}
    @categories_hash['category']['id'] ||= {}
    @categories_hash['category']['name'] ||= {}
    @categories_hash['category']['group'] ||= {}
    # Initialize Mechanize object
    a = Mechanize.new
    # Begin scraping
    a.get('http://www.marktplaats.nl/') do |page|
      groups = page.search('//*[(@id = "navigation
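
The excerpt is cut off before the loop, so the cause cannot be confirmed from it. A common reason for a list ending up with many copies of the last record, and a plausible guess here given that a single `@categories_hash` is created once before scraping starts, is that the same mutable object is appended on every iteration. The sketch below illustrates that effect in Python rather than Ruby; the category names are hypothetical.

```python
import json

names = ["Cars", "Books", "Music"]  # hypothetical category names

# Aliasing: one dict is created once, mutated, and appended repeatedly.
# Every list entry refers to the same object, so all of them end up
# showing the last value written.
shared = {}
aliased = []
for name in names:
    shared["name"] = name
    aliased.append(shared)

# Fix: build a fresh dict inside the loop for every record.
fresh = []
for name in names:
    fresh.append({"name": name})

print(json.dumps(aliased))
print(json.dumps(fresh))
```

Ruby hashes are also stored by reference, so if the truncated code does push the same hash each time, creating a new hash inside the loop (or pushing a copy) would be the analogous change.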

Python submit post data using mechanize

Question: The URL that I have to submit to the server looks like this:

    www.mysite.com/manager.php?checkbox%5B%5D=5&checkbox%5B%5D=4&checkbox%5B%5D=57&self=19&submit=Go%21

I set up the POST data like this:

    data = {'checkbox%5B%5D': '4', ....and so on... 'self': '19', 'submit': 'Go%21'}

I encode it:

    data = urllib.urlencode(orbs)

and this is how I run it:

    resp = mechanize.Request('http://mysite.com/manager.php', data)
    cj.add_cookie_header(resp)
    res = mechanize.urlopen(resp)
    print res.read()

And the error
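
The error text is cut off, so the following only addresses two issues visible in the excerpt and may not be the actual cause. First, the dictionary keys and values are already percent-encoded, and encoding them again turns `%5B` into `%255B`. Second, a dictionary cannot hold the repeated `checkbox[]` key. A minimal sketch, using Python 3's `urllib.parse.urlencode` (the question's `urllib.urlencode` is the Python 2 spelling) with a list of decoded pairs:

```python
from urllib.parse import urlencode

# Decoded field names and values; a list of pairs keeps repeated keys.
fields = [
    ("checkbox[]", "5"),
    ("checkbox[]", "4"),
    ("checkbox[]", "57"),
    ("self", "19"),
    ("submit", "Go!"),
]

# urlencode applies percent-encoding exactly once.
body = urlencode(fields)
print(body)

# Encoding an already-encoded key escapes its "%" signs a second time.
double_encoded = urlencode({"checkbox%5B%5D": "4"})
print(double_encoded)
```

The first result matches the query string in the question's URL; the second shows what the server would receive if the pre-encoded keys were encoded again.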

Proxy seems to be ignored by Mechanize?

Question: I am using an HTTP proxy and the mechanize module. I initialize the mechanize object and set the proxy like so:

    self.br = mechanize.Browser()
    self.br.set_proxies({"http": proxyAddress}) #proxy address is like 1.1.1.1:8080

Then I open the site like so:

    response = self.br.open("http://google.com")

My problem is that mechanize seems to be completely ignoring the proxy. If I debug and inspect the br object, I can see my proxy settings under the proxy handler. However, even if I give a bad proxy