mechanize | 易学教程

Python unable to retrieve form with urllib or mechanize

I'm trying to fill out and submit a form using Python, but I'm not able to retrieve the resulting page. I've tried posting the form with both mechanize and urllib/urllib2, but both run into problems. The form I'm trying to retrieve is here: http://zrs.leidenuniv.nl/ul/start.php . The page is in Dutch, but this is irrelevant to my problem. It may be worth noting that the form action redirects to http://zrs.leidenuniv.nl/ul/query.php . First of all, this is the urllib/urllib2 method I've tried: import urllib, urllib2 import socket, cookielib url = 'http://zrs.leidenuniv.nl/ul/start.php' params
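For readers on Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar, the approach described in the question can be sketched roughly as follows. This is only a sketch under assumptions: the field name hypothetical_field is invented for illustration (the real form fields are not shown in the excerpt), and posting directly to query.php is a guess based on the remark about the form action. Nothing is sent over the network here; the request is only built.

```python
import http.cookiejar
import urllib.parse
import urllib.request

START_URL = "http://zrs.leidenuniv.nl/ul/start.php"
QUERY_URL = "http://zrs.leidenuniv.nl/ul/query.php"


def build_form_request(action_url, fields, referer):
    """Build (but do not send) a POST request carrying URL-encoded form fields."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(action_url, data=data, headers={"Referer": referer})


# An opener that stores cookies, so a session cookie set by the start page
# would be sent back when the form is posted.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))

# "hypothetical_field" is a placeholder, not a real field of the form.
request = build_form_request(QUERY_URL, {"hypothetical_field": "value"}, START_URL)
print(request.get_method(), request.full_url)
```

To actually submit, one would call opener.open(START_URL) first to pick up any cookies and then opener.open(request); whether that is enough for this particular site is not something the excerpt confirms.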

How to set custom user-agent for Mechanize in Rails

Question: I know there is a set of predefined aliases you can use, for instance by setting agent.user_agent_alias = 'Linux Mozilla', but what if I want to set my own user agent? I'm writing a web crawler and want to identify it for the sake of the sites I'm indexing, just like Googlebot. There seems to be a user_agent method, but I can't find any documentation about its function. Answer 1: You can set the user agent from an alias: a = Mechanize.new a.user_agent_alias = 'Mac Safari' Available aliases

Scrape the absolute URL instead of a relative path in python

I'm trying to get all the hrefs from some HTML and store them in a list for future processing, such as this: Example URL: www.example-page-xl.com <body> <section> <a href="/helloworld/index.php"> Hello World </a> </section> </body> I'm using the following code to list the hrefs: import bs4 as bs import urllib.request sauce = urllib.request.urlopen('https://www.example-page-xl.com').read() soup = bs.BeautifulSoup(sauce,'lxml') section = soup.section for url in section.find_all('a'): print(url.get('href')) However I would like to store the URL as: www.example-page-xl.com/helloworld/index.php and
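One common way to turn relative hrefs like the one above into absolute URLs is urllib.parse.urljoin from the standard library. The sketch below is not taken from the original thread: it swaps BeautifulSoup for the standard-library html.parser so that it runs without extra packages, and the names LinkCollector and absolute_links are made up for illustration. The same urljoin call would apply to each url.get('href') value in the question's loop.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect every <a href> on a page, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin leaves absolute URLs alone and resolves relative ones.
            self.links.append(urljoin(self.base_url, href))


def absolute_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


SAMPLE_HTML = (
    '<body><section><a href="/helloworld/index.php"> Hello World </a>'
    "</section></body>"
)
print(absolute_links(SAMPLE_HTML, "https://www.example-page-xl.com"))
# → ['https://www.example-page-xl.com/helloworld/index.php']
```

The base URL needs a scheme (https://) for urljoin to resolve paths against the host; with a bare www.example-page-xl.com the host would be treated as a path segment.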

How to get Cucumber/Capybara/Mechanize to work against external non-rails site

Question: I'm trying to do BDD on a Google App Script. I understand that in principle I should be able to use some combination of Cucumber, Capybara and Mechanize to do BDD on an external non-Rails site. In this case I am trying to test a Google App Script I created. I've got the complete code so far in this project: https://github.com/tansaku/GoogleAppScriptBDD However, I am currently stuck on this error: rack-test requires a rack application, but none was given (ArgumentError) I know that I don't want

mechanize select form using id

I am working with mechanize in Python. <form action="/monthly-reports" accept-charset="UTF-8" method="post" id="sblock"> The form here does not have a name. How can I select the form using its id? python412524 I found this as a solution for the same problem. br is the mechanize object: formcount=0 for frm in br.forms(): if str(frm.attrs["id"])=="sblock": break formcount=formcount+1 br.select_form(nr=formcount) I'm sure the loop counter method above could be made more Pythonic, but this should select the form with attribute id="sblock" . Improving a bit on python412524's example, the
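The counter loop above can be expressed with enumerate. The helper below is a minimal sketch, not the continuation of the truncated answer: find_form_index is a hypothetical name, and it only assumes what the snippet above already relies on, namely that each form object exposes an attrs dictionary. SimpleNamespace objects stand in for mechanize forms so the example runs without mechanize installed.

```python
from types import SimpleNamespace


def find_form_index(forms, form_id):
    """Return the position of the first form whose id attribute equals form_id."""
    for index, form in enumerate(forms):
        # .get avoids a KeyError on forms that have no id attribute at all.
        if form.attrs.get("id") == form_id:
            return index
    raise LookupError(f"no form with id={form_id!r}")


# Stand-ins for the objects yielded by br.forms(); only .attrs is used.
fake_forms = [
    SimpleNamespace(attrs={"name": "search"}),
    SimpleNamespace(attrs={"id": "sblock", "method": "post"}),
]
print(find_form_index(fake_forms, "sblock"))
# → 1
```

With a real browser object this would be used as br.select_form(nr=find_form_index(br.forms(), "sblock")), mirroring the nr= call in the snippet above.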

Mechanize for Python 3.x

Question: Is there any way to use Mechanize with Python 3.x? Or is there a substitute that works in Python 3.x? I've been searching for hours, but I didn't find anything :( I'm looking for a way to log in to a site with Python, but the site uses JavaScript. Thanks in advance, Adam. Answer 1: lxml.html provides form handling facilities and supports Python 3. Answer 2: I'm working on a similar project, but the FAQ for mechanize explicitly says they don't intend to support 3.x any time soon. Is there a

Python, mechanize, proper syntax for setting multiple headers?

I can't seem to find how to do this anywhere. I am trying to set multiple headers with Python's mechanize module, such as: br.addheaders = [('user-agent', ' Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3')] br.addheaders = [('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')] But it seems that only the last br.addheaders takes effect, so it only shows the 'accept' header, not the 'user-agent' header, which leads me to believe that each assignment to 'br.addheaders' overwrites any previous one. I can't figure the
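The behaviour described above follows from addheaders being a plain list attribute: a second assignment replaces the first list rather than extending it. The sketch below illustrates this with urllib.request's opener from the standard library, which uses the same addheaders list-of-tuples convention; treating mechanize's Browser as behaving identically is an assumption here, though it matches what the question reports.

```python
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.3) "
    "Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3"
)
ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

opener = urllib.request.build_opener()

# Two separate assignments: the second list replaces the first.
opener.addheaders = [("User-Agent", USER_AGENT)]
opener.addheaders = [("Accept", ACCEPT)]
overwritten = list(opener.addheaders)

# One assignment with every header in a single list keeps them all.
opener.addheaders = [("User-Agent", USER_AGENT), ("Accept", ACCEPT)]

print(len(overwritten), len(opener.addheaders))
# → 1 2
```

Appending to the existing list, for example opener.addheaders.append(("Accept", ACCEPT)), is another way to add a header without discarding earlier ones.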

I get an error in python3 when importing mechanize

I get an error in python3 when importing mechanize. I've just installed mechanize into my virtualenv where python3 is installed. $ which python3 /Users/myname/.virtualenvs/python3/bin/python3 $ pip freeze mechanize==0.2.5 But, when I try to import mechanize in my python code, I get this error. import mechanize --------------------------------------------------------------------------- ImportError Traceback (most recent call last) <ipython-input-1-6b82e40e2c8e> in <module>() ----> 1 import mechanize /Users/myname/.virtualenvs/python3/lib/python3.3/site-packages/mechanize/__init__.py in <module>

Clicking a button with Ruby Mechanize

I have a particularly difficult form: I am trying to click its search button and can't seem to do it. Here is the code for the button from the page source: <input type="image" name="" src="http://images.example.com/WOKRS53B4/images/search.gif" align="absmiddle" border="0" onclick="return check_form_inputs('UA_GeneralSearch_input_form','search');" title="Search" alt="Search" class=""> I am trying to do the standard mechanize click action: login_page = agent.click(homepage.link_with(:text => "Search")) Is this because the button uses JavaScript? If so, any suggestions? It is not a link, it is

Python Auto Fill with Mechanize

Could someone help me or share some code to auto-fill a login form with mechanize ( http://wwwsearch.sourceforge.net/mechanize/ )? I want to make a Python script that logs me into my favorite sites when I run it. Thanks! This will help you log in to one site and download a page, for example: import mechanize br=mechanize.Browser() br.open('http://www.yourfavoritesite.com') br.select_form(nr=0) #check yoursite forms to match the correct number br['Username']='Username' #use the proper input type=text name br['Password']='Password' #use the proper input type=password name br.submit() br.retrieve('https