mechanize

ControlNotFoundError (ASP, Mechanize, Javascript, Python)

只愿长相守 submitted on 2019-12-13 05:52:01
Question: I am receiving the following response from the server: ctrlDateTime%24txtSpecifyFromDate=05%2F02%2F2015& ctrlDateTime%24rgApplicable=rdoApplicableFor& ctrlDateTime%24txtSpecifyToDate=05%2F02%2F2015& I am trying with br["ctrlDateTime%24txtSpecifyFromDate"]="05%2F02%2F2015"; br["ctrlDateTime%24rgApplicable"]="rdoApplicableFor"; br["ctrlDateTime%24txtSpecifyToDate"]="05%2F02%2F2015"; How can I fix the ControlNotFoundError? Any idea how to solve it? Here is my code: import mechanize import re br =
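One plausible cause, offered as an untested guess because the excerpt is cut off: the names in the question are the percent-encoded form used in the request body, while form controls are normally looked up by the literal name from the HTML. The sketch below only decodes the names and values quoted above with the standard library; it does not touch mechanize, and `decode_fields` is a hypothetical helper name.

```python
from urllib.parse import unquote

# Field names and values exactly as quoted in the question (percent-encoded).
encoded_fields = {
    "ctrlDateTime%24txtSpecifyFromDate": "05%2F02%2F2015",
    "ctrlDateTime%24rgApplicable": "rdoApplicableFor",
    "ctrlDateTime%24txtSpecifyToDate": "05%2F02%2F2015",
}


def decode_fields(fields):
    """Return a copy of `fields` with percent-escapes decoded in names and values."""
    return {unquote(name): unquote(value) for name, value in fields.items()}


decoded_fields = decode_fields(encoded_fields)
for name, value in decoded_fields.items():
    # %24 decodes to "$" and %2F decodes to "/".
    print(name, "=", value)
```

If the guess is right, the decoded names (for example ctrlDateTime$txtSpecifyFromDate) are the ones to try when indexing the form; whether the radio group also needs a list value is not something the excerpt settles.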

mechanize: first form works, then “unknown GET form encoding type 'utf-8'”

六月ゝ 毕业季﹏ submitted on 2019-12-13 05:15:12
Question: I am trying to fill out 2 forms from the EUR-Lex website in order to record some data from the generated webpage. I am stuck at form #2. I get the feeling this should be easy, and I've researched a bit, but no luck. import mechanize froot = '...' f = open(froot + 'text.html', 'w') br = mechanize.Browser() br.open('http://eur-lex.europa.eu/RECH_legislation.do') br.select_form(name='form2') br['T1'] = ['V112'] br['T3'] = ['V2'] br['T2'] = ['V1'] first_page = br.submit() f.write(first_page.get

Ruby Mechanize screen scraping help

烈酒焚心 submitted on 2019-12-13 05:13:41
Question: I am trying to scrape a row in a table by its date. I want to scrape only the row that has today's date. This is my mechanize code. I am trying to select the row that has today's date, along with its columns: agent.page.search("//td").map(&:text).map(&:strip) Output: "11-02-2011", "1", "1", "1", "1", "0", "0,00 DKK", "0,00", "0,00 DKK", "12-02-2011", "5", "5", "1", "4", "0", "0,00 DKK", "0,00", "0,00 DKK", "14-02-2011", "1", "3", "1", "1", "0", "0,00 DKK", ",00", "0,00 DKK"
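The flat output above can be regrouped into rows once the cell texts are in hand. The following is a minimal sketch in Python rather than Ruby, under two assumptions that the excerpt suggests but does not state: every row has nine cells, and dates are written day-month-year. `rows_of` and `row_for` are hypothetical helper names.

```python
from datetime import date

COLUMNS_PER_ROW = 9  # assumption: nine cells per table row, as in the output above

# Cell texts copied from the output quoted in the question.
cells = [
    "11-02-2011", "1", "1", "1", "1", "0", "0,00 DKK", "0,00", "0,00 DKK",
    "12-02-2011", "5", "5", "1", "4", "0", "0,00 DKK", "0,00", "0,00 DKK",
    "14-02-2011", "1", "3", "1", "1", "0", "0,00 DKK", ",00", "0,00 DKK",
]


def rows_of(flat_cells, width=COLUMNS_PER_ROW):
    """Split a flat list of cell texts into consecutive rows of `width` cells."""
    return [flat_cells[i:i + width] for i in range(0, len(flat_cells), width)]


def row_for(flat_cells, day, width=COLUMNS_PER_ROW):
    """Return the row whose first cell equals `day` as DD-MM-YYYY, or None."""
    wanted = day.strftime("%d-%m-%Y")  # assumption: day-month-year format
    for row in rows_of(flat_cells, width):
        if row[0] == wanted:
            return row
    return None


print(row_for(cells, date(2011, 2, 12)))
```

Passing `date.today()` instead of a fixed date would select today's row; selecting the `tr` elements directly would avoid relying on a fixed column count.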

Can't use perl WWW::Mechanize to tick checkboxes

一曲冷凌霜 submitted on 2019-12-13 04:47:38
Question: I am making a web scraper using Perl WWW::Mechanize. My problem is that the site I am scraping relies heavily on JavaScript. I log in with credentials, then navigate to the custom search using $mech->follow_link(url) . The problem starts here. I land on a page where I have to select one checkbox and one radio button from a JavaScript-enabled dropdown list. I am stuck at this point. The relevant part of the HTML is below. When I use $mech->tick('cs-MajorIndustryGroup') , I am

how to parse a row only if one of its fields is bold? Nokogiri and Ruby

拟墨画扇 submitted on 2019-12-13 03:45:17
Question: I have this code that collects all the product info I need: # get main page page = agent.get "http://www.site.com.mx/tienda/index.php" search_form = page.forms.first search_result = agent.submit search_form doc = Nokogiri::HTML(search_result.body) rows = doc.css("table.articulos tr") i = 0 details = rows.collect do |row| detail = {} [ [:sku, 'td[3]/text()'], [:desc, 'td[4]/text()'], [:qty, 'td[5]/text()'], [:qty2, 'td[5]/p/b/text()'], [:price, 'td[6]/text()'] ].collect do |name, xpath| detail

Is it possible to find the <td> .. </td> text, when any of the <td>..</td> value is known?

给你一囗甜甜゛ submitted on 2019-12-13 03:39:24
Question: I have a web page with HTML similar to the following: <form name="test"> <td> .... </td> . . . <td> <A HREF="http://www.edu/st/file.html">alo</A> </td> <td> <A HREF="http://www.dom/st/file.html">foo</A> </td> <td> bla bla </td> </form> I know only the value bla bla . Based on that value, can we find the third-from-last <td>..</td> value (here, alo )? I could track it using the HREF values, but those are not fixed; they can change at any time. Answer 1:
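The original answer is cut off in this excerpt. Purely as an illustration, and not as that answer, here is a Python sketch of one way to locate a cell by position relative to a known cell, using only the standard library. `CellCollector` and `cell_before` are hypothetical names, and the sample markup is a shortened stand-in for the page described above.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":  # tag names arrive lowercased
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def cell_before(html, known_text, offset=2):
    """Return the text of the cell `offset` cells before the one equal to `known_text`."""
    collector = CellCollector()
    collector.feed(html)
    cells = [text.strip() for text in collector.cells]
    if known_text not in cells:
        return None
    index = cells.index(known_text)
    return cells[index - offset] if index >= offset else None


# Shortened stand-in for the markup in the question.
sample = (
    '<form name="test"><td> first </td>'
    '<td> <A HREF="http://www.edu/st/file.html">alo</A> </td>'
    '<td> <A HREF="http://www.dom/st/file.html">foo</A> </td>'
    '<td> bla bla </td></form>'
)
print(cell_before(sample, "bla bla"))
```

Because the lookup is positional, it does not depend on the HREF values, which the question says may change.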

Mechanize select from dropdown

旧街凉风 submitted on 2019-12-13 02:04:51
Question: I want mechanize to check whether the currently selected value of a dropdown equals the default value and, if so, choose another value from the list instead. The HTML of the dropdown is as follows: <td class="label">List</td> <td> <select name="list" id="list" onchange="list()"> <option>---</option> <option value='1'>1</option> <option value='2'>2</option> ---other options--- My code is: if br.form["list"] == "---": br.form["list"].value = "1" r = br.form["list"] print(r) However list value still

Why does this JSON file get filled with 1747 times the last Hash data?

 ̄綄美尐妖づ submitted on 2019-12-13 01:23:03
Question: I'm using the following code to generate a JSON file containing all category information for a particular website. require 'mechanize' @hashes = [] @categories_hash = {} @categories_hash['category'] ||= {} @categories_hash['category']['id'] ||= {} @categories_hash['category']['name'] ||= {} @categories_hash['category']['group'] ||= {} # Initialize Mechanize object a = Mechanize.new # Begin scraping a.get('http://www.marktplaats.nl/') do |page| groups = page.search('//*[(@id = "navigation
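The excerpt ends before the loop, so the actual cause cannot be confirmed from it. A common way to end up with the last entry repeated is appending the same mutable hash on every iteration, which is what creating the hash once before the loop would lead to. The sketch below shows that effect in Python rather than Ruby; the category names and the helpers `build_shared` and `build_fresh` are hypothetical.

```python
import json

names = ["Cars", "Books", "Bikes"]  # hypothetical category names


def build_shared(category_names):
    """Reuse one dict for every entry: all list items alias the same object."""
    shared = {}
    result = []
    for index, name in enumerate(category_names):
        shared["id"] = index
        shared["name"] = name
        result.append(shared)  # appends a reference, not a copy
    return result


def build_fresh(category_names):
    """Create a new dict per entry, so each list item keeps its own values."""
    return [{"id": index, "name": name} for index, name in enumerate(category_names)]


print(json.dumps(build_shared(names)))  # every entry shows the last category
print(json.dumps(build_fresh(names)))   # each entry shows its own category
```

In Ruby the equivalent fix would be to build a new hash inside the loop, or append a copy, instead of mutating one shared instance.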

Python submit post data using mechanize

纵饮孤独 submitted on 2019-12-12 20:21:11
Question: The URL that I have to submit to the server looks like this: www.mysite.com/manager.php?checkbox%5B%5D=5&checkbox%5B%5D=4&checkbox%5B%5D=57&self=19&submit=Go%21 I build the POST data like this: data = {'checkbox%5B%5D': '4', ....and so on... 'self': '19', 'submit': 'Go%21'} I encode it: data = urllib.urlencode(orbs) and this is how I run it: resp = mechanize.Request('http://mysite.com/manager.php', data) cj.add_cookie_header(resp) res = mechanize.urlopen(resp) print res.read() And the error
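Two things in the data above would change what gets sent, whatever the truncated error turns out to be: a dict cannot hold the same key three times, and keys or values that are already percent-encoded get encoded again. A minimal sketch with Python 3's standard library (the question uses the Python 2 spelling `urllib.urlencode`), assuming the decoded field name is checkbox[] and the button value is Go!:

```python
from urllib.parse import urlencode

# A list of pairs keeps repeated keys, which a dict would collapse into one.
fields = [
    ("checkbox[]", "5"),
    ("checkbox[]", "4"),
    ("checkbox[]", "57"),
    ("self", "19"),
    ("submit", "Go!"),
]

# urlencode applies the percent-encoding itself, so names and values go in raw.
body = urlencode(fields)
print(body)

# Passing an already-encoded key encodes the "%" signs a second time.
double_encoded = urlencode({"checkbox%5B%5D": "4"})
print(double_encoded)
```

The first string matches the query string quoted in the question; the second shows the %25 sequences that appear when pre-encoded names are encoded again.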

Proxy seems to be ignored by Mechanize?

微笑、不失礼 submitted on 2019-12-12 19:20:20
Question: I am using an HTTP proxy and the mechanize module. I initialize the mechanize object and set the proxy like so: self.br = mechanize.Browser() self.br.set_proxies({"http": proxyAddress}) #proxy address is like 1.1.1.1:8080 Then I open the site like so: response = self.br.open("http://google.com") My problem is that mechanize seems to be completely ignoring the proxy. If I debug and inspect the br object, under the proxy handler I can see my proxy settings. However, even if I give a bad proxy