mechanize

Python unable to retrieve form with urllib or mechanize

让人想犯罪 __ submitted on 2019-11-29 11:00:02
I'm trying to fill out and submit a form using Python, but I'm not able to retrieve the resulting page. I've tried both mechanize and urllib/urllib2 methods to post the form, but both run into problems. The form I'm trying to retrieve is here: http://zrs.leidenuniv.nl/ul/start.php . The page is in Dutch, but this is irrelevant to my problem. It may be noteworthy that the form action redirects to http://zrs.leidenuniv.nl/ul/query.php . First of all, this is the urllib/urllib2 method I've tried:

    import urllib, urllib2
    import socket, cookielib
    url = 'http://zrs.leidenuniv.nl/ul/start.php'
    params
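Because the question's code is cut off, here is a minimal Python 3 sketch of the same urllib approach (in Python 3, urllib2 and cookielib became urllib.request and http.cookiejar). It assumes the POST should go to the form's action URL, query.php, rather than to start.php. The helper names and the field name in the usage comment are hypothetical, since the original params are not shown.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener() -> urllib.request.OpenerDirector:
    """Return an opener that keeps cookies between requests.

    build_opener() also installs the default redirect handler, so 30x
    responses from the form's action URL are followed automatically.
    """
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def make_form_request(action_url: str, fields: dict) -> urllib.request.Request:
    """Build a POST request whose body is the URL-encoded form fields."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        action_url,
        data=body,  # passing data makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Usage sketch (hypothetical field name; read the real ones from the form's
# <input name="..."> elements):
# opener = build_cookie_opener()
# opener.open("http://zrs.leidenuniv.nl/ul/start.php")  # pick up any session cookie
# req = make_form_request("http://zrs.leidenuniv.nl/ul/query.php", {"field": "value"})
# html = opener.open(req).read().decode("utf-8", errors="replace")
```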

How to set custom user-agent for Mechanize in Rails

旧城冷巷雨未停 submitted on 2019-11-29 10:50:59
Question: I know there is a set of predefined aliases you can use by setting, for instance, agent.user_agent_alias = 'Linux Mozilla', but what if I want to set my own user agent? I'm writing a web crawler and want to identify it for the sake of the sites I'm indexing, just like Googlebot. There seems to be a user_agent method, but I can't find any documentation about its function.

Answer 1: You can set the user agent from an alias:

    a = Mechanize.new
    a.user_agent_alias = 'Mac Safari'

Available aliases

Scrape the absolute URL instead of a relative path in python

寵の児 submitted on 2019-11-29 10:01:42
I'm trying to get all the hrefs from some HTML and store them in a list for later processing, like this:

Example URL: www.example-page-xl.com

    <body>
      <section>
        <a href="/helloworld/index.php"> Hello World </a>
      </section>
    </body>

I'm using the following code to list the hrefs:

    import bs4 as bs
    import urllib.request

    sauce = urllib.request.urlopen('https://www.example-page-xl.com').read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    section = soup.section
    for url in section.find_all('a'):
        print(url.get('href'))

However, I would like to store the URL as: www.example-page-xl.com/helloworld/index.php and
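The standard way to turn a relative href into an absolute URL is urllib.parse.urljoin, which resolves it against the page's URL and leaves already-absolute links untouched. A minimal sketch, using the stdlib html.parser instead of BeautifulSoup only so it runs without extra packages; with BeautifulSoup the equivalent is urljoin(base_url, a.get('href')) inside the existing loop. The class and function names are illustrative. Note that urljoin keeps the scheme, so the result is https://www.example-page-xl.com/helloworld/index.php rather than the scheme-less form shown in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin resolves relative paths and keeps absolute URLs as-is
                self.links.append(urljoin(self.base_url, href))


def absolute_links(html: str, base_url: str) -> list:
    """Return every link in html as an absolute URL."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<body><section><a href="/helloworld/index.php"> Hello World </a></section></body>'
print(absolute_links(page, "https://www.example-page-xl.com/"))
```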

How to get Cucumber/Capybara/Mechanize to work against external non-rails site

萝らか妹 submitted on 2019-11-29 09:15:20
Question: I'm trying to do BDD on a Google Apps Script. I understand that, in principle, I should be able to use some combination of Cucumber, Capybara and Mechanize to do BDD against a non-Rails external site. In this case I am trying to test a Google Apps Script I created. The complete code so far is in this project: https://github.com/tansaku/GoogleAppScriptBDD However, I am currently stuck on this error:

    rack-test requires a rack application, but none was given (ArgumentError)

I know that I don't want

mechanize select form using id

人走茶凉 submitted on 2019-11-29 09:06:55
I'm working with mechanize in Python.

    <form action="/monthly-reports" accept-charset="UTF-8" method="post" id="sblock">

The form here does not have a name. How can I parse the form using its id?

python412524: I found this as a solution for the same problem. br is the mechanize object:

    formcount = 0
    for frm in br.forms():
        # .get() avoids a KeyError on forms that have no id attribute
        if frm.attrs.get("id") == "sblock":
            break
        formcount = formcount + 1
    br.select_form(nr=formcount)

I'm sure the loop counter method above could be done more Pythonically, but this should select the form with attribute id="sblock".

Improving a bit on python412524's example, the
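The answer notes the counter loop could be more Pythonic. One way is enumerate plus dict.get, sketched below; form_index_by_id is a hypothetical helper, not part of mechanize, and it relies on the same assumption as the answer, namely that br.forms() yields forms in the order that select_form(nr=...) counts them. mechanize's Browser.select_form also accepts a predicate callable, so br.select_form(predicate=lambda f: f.attrs.get("id") == "sblock") should achieve the same without any counter.

```python
from typing import Iterable, Optional


def form_index_by_id(forms: Iterable, form_id: str) -> Optional[int]:
    """Return the position of the first form whose id attribute equals form_id.

    Works with any objects exposing an ``attrs`` dict, such as the form
    objects yielded by mechanize's Browser.forms(). Returns None if no
    form matches.
    """
    for index, form in enumerate(forms):
        # .get() avoids a KeyError for forms without an id attribute
        if form.attrs.get("id") == form_id:
            return index
    return None


# Usage sketch with mechanize (br is a mechanize.Browser):
# idx = form_index_by_id(br.forms(), "sblock")
# if idx is not None:
#     br.select_form(nr=idx)
```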

Mechanize for Python 3.x

喜你入骨 submitted on 2019-11-29 08:38:59
Question: Is there any way to use Mechanize with Python 3.x? Or is there a substitute that works in Python 3.x? I've been searching for hours, but I didn't find anything :( I'm looking for a way to log in to a site with Python, but the site uses JavaScript. Thanks in advance, Adam.

Answer 1: lxml.html provides form-handling facilities and supports Python 3.

Answer 2: I'm working on a similar project, but the mechanize FAQ explicitly says they don't intend to support 3.x any time soon. Is there a
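As a stdlib-only stand-in for mechanize-style form filling on Python 3, the sketch below collects the name and default value of each <input> in one form, so hidden fields such as tokens can be sent back together with the credentials. Like mechanize, it does not run JavaScript, so it will not help if the login depends on scripts. The form id and field names are hypothetical, and it ignores <select>, <textarea> and checkbox state; lxml.html, mentioned in Answer 1, offers fuller form handling.

```python
from html.parser import HTMLParser


class FormFields(HTMLParser):
    """Collect name -> value for <input> elements inside one <form>.

    If form_id is None, the first form on the page is used.
    """

    def __init__(self, form_id=None):
        super().__init__()
        self.form_id = form_id
        self.fields = {}
        self._inside = False  # currently inside the target form
        self._done = False    # target form already fully read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            if self.form_id is None or attrs.get("id") == self.form_id:
                self._inside = True
        elif tag == "input" and self._inside:
            name = attrs.get("name")
            if name:
                # inputs without a value attribute default to an empty string
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._inside:
            self._inside = False
            self._done = True


page = ('<form id="login"><input type="hidden" name="token" value="abc">'
        '<input name="user"><input type="password" name="password"></form>')
parser = FormFields("login")
parser.feed(page)
parser.close()
# Keep the hidden fields and fill in the credentials before POSTing.
payload = {**parser.fields, "user": "me", "password": "secret"}
print(payload)
```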

Python, mechanize, proper syntax for setting multiple headers?

喜你入骨 submitted on 2019-11-29 07:02:35
I can't seem to find how to do this anywhere. I am trying to set multiple headers with Python's mechanize module, such as:

    br.addheaders = [('user-agent', ' Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3')]
    br.addheaders = [('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')]

But it seems that only the last br.addheaders takes effect: only the 'accept' header is sent, not the 'user-agent' header. That leads me to believe that each assignment to br.addheaders overwrites any previous ones. I can't figure the
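The behaviour in the question follows from addheaders being a plain list of (name, value) tuples: each assignment replaces the whole list. The usual fix is to assign all headers in one list, or to append to the existing list. A minimal sketch, using urllib.request's OpenerDirector (which has an addheaders list of the same shape) only so it runs without mechanize installed; with mechanize, the same two statements would apply to br.addheaders.

```python
import urllib.request

opener = urllib.request.build_opener()

# Put every header in a single list; a later assignment would replace them all,
# which is why the second assignment in the question dropped the User-Agent.
opener.addheaders = [
    ("User-Agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.3) "
                   "Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3"),
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
]

# To add one more header later without discarding the others, append to the list.
opener.addheaders.append(("Accept-Language", "en-US,en;q=0.5"))

print([name for name, _ in opener.addheaders])
```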

I get an error in python3 when importing mechanize

[亡魂溺海] submitted on 2019-11-29 05:33:49
I get an error in Python 3 when importing mechanize. I've just installed mechanize into my virtualenv, where python3 is installed.

    $ which python3
    /Users/myname/.virtualenvs/python3/bin/python3
    $ pip freeze
    mechanize==0.2.5

But when I try to import mechanize in my Python code, I get this error:

    import mechanize
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-1-6b82e40e2c8e> in <module>()
    ----> 1 import mechanize

    /Users/myname/.virtualenvs/python3/lib/python3.3/site-packages/mechanize/__init__.py in <module>

Clicking a button with Ruby Mechanize

不羁的心 submitted on 2019-11-29 03:34:54
I have a particularly difficult form where I am trying to click the search button, and I can't seem to do it. Here is the code for the form from the page source:

    <input type="image" name="" src="http://images.example.com/WOKRS53B4/images/search.gif" align="absmiddle" border="0" onclick="return check_form_inputs('UA_GeneralSearch_input_form','search');" title="Search" alt="Search" class="">

I am trying to do the standard mechanize click action:

    login_page = agent.click(homepage.link_with(:text => "Search"))

Is this because the button uses JavaScript? If so, any suggestions? It is not a link, it is

Python Auto Fill with Mechanize

旧时模样 submitted on 2019-11-29 02:43:51
Could someone help me or share some code to auto-fill a login with mechanize (http://wwwsearch.sourceforge.net/mechanize/)? I want to make a Python script that logs me into my favorite sites when I run it. Thanks!

This will help you log in to a site and download a page, for example:

    import mechanize
    br = mechanize.Browser()
    br.open('http://www.yourfavoritesite.com')
    br.select_form(nr=0)  # check your site's forms to match the correct number
    br['Username'] = 'Username'  # use the name of the proper input type=text
    br['Password'] = 'Password'  # use the name of the proper input type=password
    br.submit()
    br.retrieve('https