mechanize

nokogiri + mechanize css selector by text

删除回忆录丶 submitted on 2019-12-12 18:44:19
Question: I am new to Nokogiri and so far am most familiar with CSS selectors. I am trying to parse information from a table; below is a sample of the table and the code I'm using. I'm stuck on the appropriate if statement, as it seems to return the whole contents of the table.

Table:

    <div class="holder">
      <div class ="row">
        <div class="c1"> <!-- Content I Don't need --> </div>
        <div class="c2">
          <span class="data"> <!-- Content I Don't Need --> <span class="data">
        </div>
      </div>
      ...
      <div class="row">
        <div

Mechanize how to add to a select list?

早过忘川 submitted on 2019-12-12 14:30:19
Question: I just started experimenting with submitting web forms through mechanize. On this webpage there is a list of items to select from, MASTER_MODS. These can be selected in either MODS using a button add_MODS or in IT_MODS using a button add_IT_MODS (see figure at the bottom). In the form it looks like this (code for form at bottom):

    <<SNIP>>
    <SelectControl(MODS=[*--- none selected ---])>
    <IgnoreControl(add_MODS=<None>)>
    <SelectControl(MASTER_MODS=[])>
    <SelectControl(IT_MODS=[*--- none selected -

How to make mechanize not fail with forms on this page?

旧街凉风 submitted on 2019-12-12 11:41:07
Question:

    import mechanize

    url = 'http://steamcommunity.com'
    br=mechanize.Browser(factory=mechanize.RobustFactory())
    br.open(url)
    print br.request
    print br.form
    for each in br.forms():
        print each
        print

The above code results in:

    Traceback (most recent call last):
      File "./mech_test.py", line 12, in <module>
        for each in br.forms():
      File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 426, in forms
      File "build/bdist.linux-i686/egg/mechanize/_html.py", line 559, in forms
      File "build/bdist.linux

Mechanize submit login form from http to https

假如想象 submitted on 2019-12-12 09:36:19
Question: I have a web page containing a login form which loads via HTTP but submits the data via HTTPS. I'm using python-mechanize to log into this site, but it seems that the data is submitted via HTTP. My code looks like this:

    import mechanize

    b = mechanize.Browser()
    b.open('http://site.com')
    form = b.forms().next()  # the login form is unnamed...
    print form.action        # prints "https://login.us.site.com"
    form['user'] = "guest"
    form['pass'] = "guest"
    b.form = form
    b.submit()

When the form is
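
One way to check which scheme a form will submit to is to resolve its action against the page URL, as a browser would. The following is a minimal sketch using only the Python 3 standard library, not mechanize itself; the helper names and the relative action "/login" are hypothetical, while the two URLs come from the question.

```python
from urllib.parse import urljoin, urlsplit


def resolve_action(page_url, action):
    """Resolve a form's action attribute against the URL of the page holding it."""
    return urljoin(page_url, action)


def submits_over_https(page_url, action):
    """Return True if the resolved form target uses the https scheme."""
    return urlsplit(resolve_action(page_url, action)).scheme == "https"


page_url = "http://site.com"

# Absolute https action, as printed by form.action in the question.
print(submits_over_https(page_url, "https://login.us.site.com"))

# Hypothetical relative action: it inherits the page's http scheme.
print(submits_over_https(page_url, "/login"))
```

An absolute https action keeps its scheme regardless of how the page was loaded, whereas a relative action inherits the scheme of the page.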

Python Mechanize keeps giving me 'response_seek_wrapper' when I try to use .open

此生再无相见时 submitted on 2019-12-12 09:21:28
Question: I'm not sure what's going on, as the script used to work (before I messed around with the Python installation on my system...). But when I try something along the lines of

    import mechanize

    browser = mechanize.Browser()
    browser.open("http://google.com")

I get something like

    <response_seek_wrapper at 0x10123fd88 whose wrapped object = <closeable_response at 0x101232170 whose fp = <socket._fileobject object at 0x1010bf5f0>>>

Does anyone know why this is and what the fix is? Thanks!

Answer 1: it's not an exception,
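
The distinction the answer starts to draw, between an object's printed representation and an error, can be illustrated without a network connection. The sketch below is written for Python 3 and uses a hypothetical stand-in class, FakeResponse, in place of the file-like object that mechanize's open() returns; it is not mechanize code.

```python
import io


class FakeResponse(io.BytesIO):
    """Hypothetical stand-in for the file-like object returned by Browser.open()."""

    def __repr__(self):
        return "<FakeResponse at 0x%x>" % id(self)


response = FakeResponse(b"<html><title>Example</title></html>")

# Echoing the object shows only its repr, similar to what the question observed.
print(repr(response))

# Reading from the object returns the body.
body = response.read()
print(body.decode("utf-8"))
```

With mechanize, the same pattern would presumably be to keep the return value of browser.open(...) and call read() on it.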

Using Ruby with Mechanize to log into a website

流过昼夜 submitted on 2019-12-12 08:09:42
Question: I need to scrape data from a site, but it requires me to log in first. I've been using Hpricot to successfully scrape other sites, but I'm new to mechanize, and I'm truly baffled by how to work it. I see this example commonly quoted:

    require 'rubygems'
    require 'mechanize'

    a = Mechanize.new
    a.get('http://rubyforge.org/') do |page|
      # Click the login link
      login_page = a.click(page.link_with(:text => /Log In/))

      # Submit the login form
      my_page = login_page.form_with(:action => '/account/login

How do I combine this Hash to a single JSON object?

自古美人都是妖i submitted on 2019-12-12 04:55:19
Question: I'm using the following code to generate a JSON file containing all category information for a particular website. The goal is to have a JSON file with the following format:

    [
      {
        "id":"36_17",
        "name":"Diversen Particulier",
        "group":"Diversen",
        "search_attributes":{
          "0":"Prijs van/tot",
          "1":"Groep en Rubriek",
          "2":"Conditie",
        }
      },
      {
        "id":"36_18",
        "name":"Diversen Zakelijk",
        "group":"Diversen",
        "search_attributes":{
          "0":"Prijs van/tot",
          "1":"Groep en Rubriek",
          "2":"Conditie",
        }
      },
      {
        "id":"36_19"
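
The target shape described above, one array holding one object per category, can be sketched as follows. This is an illustration in Python rather than the asker's Ruby, and the input records and the helper names to_record and to_json are hypothetical; only the field names and sample values come from the question. The serialized output omits the trailing commas seen in the sample, since strict JSON does not allow them.

```python
import json

# Hypothetical scraped records; values mirror the sample in the question.
categories = [
    {"id": "36_17", "name": "Diversen Particulier", "group": "Diversen",
     "attributes": ["Prijs van/tot", "Groep en Rubriek", "Conditie"]},
    {"id": "36_18", "name": "Diversen Zakelijk", "group": "Diversen",
     "attributes": ["Prijs van/tot", "Groep en Rubriek", "Conditie"]},
]


def to_record(category):
    """Build one output object, keying search attributes by their index."""
    return {
        "id": category["id"],
        "name": category["name"],
        "group": category["group"],
        "search_attributes": {
            str(i): label for i, label in enumerate(category["attributes"])
        },
    }


def to_json(records):
    """Collect all records in one list and serialize it once."""
    return json.dumps([to_record(r) for r in records], ensure_ascii=False, indent=2)


print(to_json(categories))
```

Collecting the records in a list and serializing once at the end, rather than writing each record separately, is what yields a single JSON document.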

How to get redirect log in Mechanize?

与世无争的帅哥 submitted on 2019-12-12 04:48:07
Question: In Ruby, if you use mechanize to follow 301/302 redirects like this:

    require 'mechanize'

    m = WWW::Mechanize.new
    m.get('http://google.com')

how do you get the list of pages mechanize was redirected through? (For example, http://google.com => http://www.google.com => http://google.com.ua)

OK, here is the code in mechanize responsible for redirection:

    elsif res_klass <= Net::HTTPRedirection
      return page unless follow_redirect?
      log.info("follow redirect to: #{ response['Location'] }") if log
      from_uri = page
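
The general idea of recording each hop while redirects are followed can be sketched with Python's standard library. This is not the Ruby mechanize API; it is a minimal illustration that subclasses urllib.request.HTTPRedirectHandler, and the names RecordingRedirectHandler and fetch_with_redirect_log are hypothetical.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers every hop it is asked to follow."""

    def __init__(self):
        self.hops = []  # list of (status_code, from_url, to_url)

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Record the hop, then let the default logic build the next request.
        self.hops.append((code, req.full_url, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch_with_redirect_log(url):
    """Open url, following redirects, and return (final_url, hops)."""
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(handler)
    with opener.open(url) as response:
        return response.geturl(), handler.hops
```

Calling fetch_with_redirect_log('http://google.com') would need network access and returns the final URL together with the recorded hops; the exact chain depends on the server.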

ParseError: nested FORMs

为君一笑 submitted on 2019-12-12 04:23:34
Question: Python mechanize gives a nested FORMs error for this code:

    import mechanize

    url = 'http://bis.zju.edu.cn/psi/'
    browse = mechanize.Browser()
    browse.set_handle_robots(False)
    browse.open(url)
    # print [n for n in browse.forms()]  # ParseError: nested FORMs
    browse.select_form(name="form1")  # or (nr=0)  # ParseError: nested FORMs
    seq = '>seq1' + '\n' + 'MNANSSAKLGDSA'
    browse['sequence'] = seq
    response = browse.submit()

This does not solve it either:

    browse = mechanize.Browser(factory=mechanize.RobustFactory())
    browse.set_handle

Does GAE accept twill at all?

不羁岁月 submitted on 2019-12-12 04:22:11
Question: I have created my GAE application in the directory "my_application". Inside this directory I created a .py file and named it "my_scrypt". The contents of "my_scrypt" were initially as follows:

    print 'Content-Type: text/plain'
    print ''
    print 'This is my first application'

Then I ran it locally on my machine (Windows XP) in the installed browser (Mozilla Firefox) with "GAE Launcher". Everything was fine: I could see that sentence ("This is my first application") on the