mechanize

How do I configure a Ruby Mechanize agent to work through the Charles web proxy?

久未见 提交于 2019-12-07 06:13:54
问题 I'm writing an "automatically fill in the forms" app using Ruby / Mechanize. It almost works. I can use the wonderful Charles web proxy to see the exchange between the server and my Firefox browser. Now I want to use Charles to see the exchange between the server and my app. Charles proxies on port 8888. Assume that the server is at https://my.host.com. One thing that does NOT work is: @agent ||= Mechanize.new do |agent| agent.set_proxy("my.host.com", 8888) end This results in a Net::HTTP:

How can I get all links of a website using the Mechanize gem?

℡╲_俬逩灬. 提交于 2019-12-07 05:49:39
问题 How can i get all links of a website using ruby Mechanize gem? Does Mechanize can do like Anemone gem: Anemone.crawl("https://www.google.com.vn/") do |anemone| anemone.on_every_page do |page| puts page.url end end I'm newbie in web crawler. Thanks in advance! 回答1: It's quite simple with Mechanize, and I suggest you to read the documentation. You can start with Ruby BastardBook. To get all links from a page with Mechanize try this: require 'mechanize' agent = Mechanize.new page = agent.get(

How can I keep WWW::Mechanize from following redirects?

人盡茶涼 提交于 2019-12-06 19:43:28
问题 I have a Perl script that uses WWW::Mechanize to read from a file and perform some automated tasks on a website. However, the website uses a 302 redirect after every time I request a certain page. I don't want to be redirected (the page that it redirects to takes too long to respond); I just want to loop through the file and call the first link over and over. I can't figure out how to make WWW::Mechanize NOT follow redirects. Any suggestions? 回答1: WWW::Mechanize is a subclass of LWP:

How to scrape aspx pages with python

末鹿安然 提交于 2019-12-06 18:56:27
I am trying to scrape a site, https://www.searchiqs.com/nybro/ (you have to click the "Log In as Guest" to get to the search form. If I search for a Party 1 term like say "Andrew" the results have pagination and also, the request type is POST so the URL does not change and also the sessions time out very quickly. So quickly that if i wait ten minutes and refresh the search url page it gives me a timeout error. I got started with scraping recently, so I have mostly been doing GET posts where I can decipher the URL. So so far I have realized that I will have to look at the DOM. Using Chrome

how do i set a timeout value for python's mechanize?

房东的猫 提交于 2019-12-06 17:39:04
问题 How do i set a timeout value for python's mechanize? 回答1: Alex is correct: mechanize.urlopen takes a timeout argument. Therefore, just insert a number of seconds in floating point: mechanize.urlopen('http://url/', timeout=30.0) . The background, from the source of mechanize.urlopen : def urlopen(url, data=None, timeout=_sockettimeout._GLOBAL_DEFAULT_TIMEOUT): ... return _opener.open(url, data, timeout) What is mechanize._sockettimeout._GLOBAL_DEFAULT_TIMEOUT you ask? It's just the socket

How to deploy this “Python+twill+mechanize” combination to “Google App Engine”?

妖精的绣舞 提交于 2019-12-06 15:45:27
问题 I've been trying to pass my login and password from Python script to the eBay sign-in page. Later I want this script to be run from " Google App Engine " I was suggested to use " mechanize ". Unfortunately, it didn't work for me: IDLE 1.2.4 >>> import re >>> import mechanize >>> br = mechanize.Browser() >>> br.open("https://signin.ebay.com") Traceback (most recent call last): File "<pyshell#3>", line 1, in <module> br.open("https://signin.ebay.com") File "build\bdist.win32\egg\mechanize\

Get Mechanize to handle cookies from an arbitrary POST (to log into a website programmatically)

荒凉一梦 提交于 2019-12-06 14:54:09
I want to log into https://www.t-mobile.com/ programmatically. My first idea was to use Mechanize to submit the login form: alt text http://dl.dropbox.com/u/2792776/screenshots/2010-04-08_1440.png However, it turns out that this isn't even a real form. Instead, when you click "Log in" some javascript grabs the values of the fields, creates a new form dynamically, and submits it. "Log in" button HTML: <button onclick="handleLogin(); return false;" class="btnBlue" id="myTMobile-login"><span>Log in</span></button> The handleLogin() function: function handleLogin() { if (ValidateMsisdnPassword())

python http 组件简介

[亡魂溺海] 提交于 2019-12-06 14:35:46
1. mechanize https://pypi.python.org/pypi/mechanize/ 中文简介: 基于urllib2,完全兼容urllib2,提供浏览历史,表单状态,cookies等功能。 mechanize 0.2.5 Downloads ↓ Stateful programmatic web browsing. Stateful programmatic web browsing, after Andy Lester's Perl module WWW::Mechanize. mechanize.Browser implements the urllib2.OpenerDirector interface. Browser objects have state, including navigation history, HTML form state, cookies, etc. The set of features and URL schemes handled by Browser objects is configurable. The library also provides an API that is mostly compatible with urllib2: your urllib2 program will likely still

Interact with Flash using Python Mechanize

限于喜欢 提交于 2019-12-06 13:18:05
I am trying to create an automated program in Python that deals with Flash. Right now I am using Python Mechanize, which is great for filling forms, but when it comes to flash I don't know what to do. Does anyone know how I can interact with flash forms (set and get variables, click buttons, etc.) via Python mechanize or some other python library? Yuda Prawira Nice question but seems unfortunately mechanize can't be used for flash objects jdi What you probably want to search for is how to control javascript in a page via python, which can control flash that was specifically designed to accept

python mechanize forms() err

亡梦爱人 提交于 2019-12-06 13:16:49
I'm using Python 2.7.6 and mechanize 0.2.5 and I want to log in to 'dining.ut.ac.ir' (I have the username and password)- but when I try to run the below script to get the forms list : import mechanize br = mechanize.Browser() br.set_handle_robots(False) br.addheaders = [('User-agent', 'Firefox')] br.open("http://dining.ut.ac.ir/") br.forms() I get this error: Traceback (most recent call last): File "script.py", line 8, in <module> br.forms() File "/home/arman/workspace/python/mechanize/venv/lib/python2.7/site-packages/mechanize/_mechanize.py", line 420, in forms return self._factory.forms()