mechanize

How to get generated captcha image using mechanize

北战南征 submitted on 2019-12-03 10:39:15
Question: I'm trying to use Python and mechanize to send SMS from my mobile provider's website. The problem is that the form has a captcha image. Using mechanize I can get the link to the image, but it's different every time I access that link. Is there any way to get the exact picture from mechanize? Answer 1: This is a rough example of how to get the image. Note that mechanize uses cookies, so any cookies received will be sent to the server with the request for the image (this is probably what you want). br =
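The key idea is that the captcha image must be fetched with the same session (cookies) as the page that displayed it; fetching it through a fresh connection starts a new session and yields a new image. A minimal sketch using only Python's standard library — the URLs and the extraction step are placeholders, not the original poster's code:

```python
import http.cookiejar
import urllib.request

def make_session():
    """Build one opener around a shared CookieJar so the request for the
    captcha image carries the same session cookies as the page request.
    Fetching the image with a *different* opener (or a separate browser)
    starts a new session, which is why the captcha changes every time."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

# Usage sketch (URLs are placeholders; no network access happens here):
# opener, jar = make_session()
# page = opener.open("https://provider.example/sms").read()
# img_url = ...  # extract the captcha <img src> from `page`
# img_bytes = opener.open(img_url).read()  # same cookies -> matching captcha
```

mechanize's Browser keeps a cookie jar internally, which is why the answer notes that received cookies are automatically re-sent with the image request.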

Ruby mechanize post with header

笑着哭i submitted on 2019-12-03 09:40:50
Question: I have a page with JS that posts data via XMLHttpRequest, and a server-side script checks for this header. How do I send this header? agent = WWW::Mechanize.new { |a| a.user_agent_alias = 'Mac Safari'; a.log = Logger.new('./site.log') } agent.post('http://site.com/board.php', { 'act' => '_get_page', 'gid' => 1, 'order' => 0, 'page' => 2 }) do |page| p page end Answer 1: I found this post with a web search (two months later, I know) and just wanted to share another solution. You can add custom headers
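The header such server-side checks usually look for is X-Requested-With: XMLHttpRequest. Since this digest's Python entries cover the same ground, here is the idea sketched offline with Python's standard library — the URL and fields mirror the excerpt, but nothing is sent over the network:

```python
import urllib.parse
import urllib.request

def ajax_post_request(url, fields):
    """Build (but do not send) a POST request that mimics XMLHttpRequest
    by adding the X-Requested-With header server-side scripts check for."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    req = urllib.request.Request(url, data=body)
    req.add_header("X-Requested-With", "XMLHttpRequest")
    return req

req = ajax_post_request("http://site.com/board.php",
                        {"act": "_get_page", "gid": 1, "order": 0, "page": 2})
```

Note that HTTP header names are case-insensitive, and urllib normalizes stored header keys via str.capitalize(), so this header is kept under the key "X-requested-with".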

How to Get the Page Source with Mechanize/Nokogiri

爱⌒轻易说出口 submitted on 2019-12-03 09:39:48
I'm logged into a webpage/servlet using Mechanize, and I have a page object: jobShortListPg = agent.get(addressOfPage). When I use puts jobShortListPg I get the "mechanized" version of the page, which I don't want, e.g. #<Mechanize::Page::Link "Home" "blahICScriptProgramName=WEBLIB_MENU.ISCRIPT3.FieldFormula.IScript_DrillDown&target=main0&Level=0&RL=&navc=3171">. How do I get the HTML source of the page instead? Use .body: puts jobShortListPg.body. Or use the content method of the page object: jobShortListPg.content. Source: https://stackoverflow.com/questions/6487101/how-to-get-the-page-source

Javascript (and HTML rendering) engine without a GUI for automation?

南楼画角 submitted on 2019-12-03 08:53:16
Are there any libraries or frameworks that provide the functionality of a browser but do not need to render physically onto the screen? I want to automate navigation on web pages (Mechanize does this, for example), but I want the full browser experience, including JavaScript. Thus, I'd like a virtual browser of some sort that I can use to "click on links" programmatically, have DOM elements and JS scripts render within it, and manipulate those elements. A solution in Python is preferred, but I can manage others. PhantomJS and PyPhantomJS are what I use for tasks like these. What

How do you view the request headers that mechanize is using?

你离开我真会死。 submitted on 2019-12-03 08:43:49
I am attempting to submit some data to a form programmatically. I'm having a small issue whereby the server is "not liking" what I'm sending it. Frustratingly, there are no error messages or anything else that could help diagnose the issue; all it does is spit me back to the same page I started on when I hit br.submit(). When I click the submit button manually in the browser, the resulting page shows a small "success!" message. No such message appears when submitting via the script, and no changes are actually being posted to the server. It's quite strange, and the first time I've
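With Python's mechanize you can see what goes over the wire by calling br.set_debug_http(True) (and set_debug_redirects / set_debug_responses) before submitting, then diffing against what the browser sends as shown in its developer tools. As an offline illustration of inspecting a request's headers before it is sent, using only the standard library (URL and header values are placeholders):

```python
import urllib.parse
import urllib.request

def describe_request(req):
    """List the method, URL, and header lines this Request would send,
    handy for diffing against what a real browser sends for the same form."""
    lines = ["%s %s" % (req.get_method(), req.full_url)]
    for name, value in sorted(req.header_items()):
        lines.append("%s: %s" % (name, value))
    return lines

req = urllib.request.Request(
    "http://example.com/form",                           # placeholder URL
    data=urllib.parse.urlencode({"field": "value"}).encode("utf-8"),
    headers={"User-Agent": "Mozilla/5.0"},               # placeholder UA
)
for line in describe_request(req):
    print(line)
```

A mismatch in headers (User-Agent, Referer, cookies) or in hidden form fields is the usual cause of a server silently rejecting a scripted submit.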

How do I get Python's Mechanize to POST an ajax request?

痞子三分冷 submitted on 2019-12-03 07:53:55
Question: The site I'm trying to spider uses the JavaScript request.open("POST", url, true); to pull in, over Ajax, extra information that I need to spider. I've tried various permutations of r = mechanize.urlopen("https://site.tld/dir/" + url, urllib.urlencode({'none' : 'none'})) to get mechanize to fetch the page, but it always results in me getting the login HTML again, indicating that something is wrong. Firefox doesn't seem to add any HTTP data to the POST according to Firebug, and I'm adding an
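In both urllib and mechanize's urlopen, supplying a data argument is what turns the request into a POST; getting the login page back usually means the Ajax endpoint was fetched without the session cookies from the login, so the same cookie-carrying browser/opener should be used for both requests. An offline sketch of the data-implies-POST behavior with the standard library (the endpoint URL is a placeholder):

```python
import urllib.parse
import urllib.request

url = "https://site.tld/dir/endpoint"                 # placeholder endpoint
body = urllib.parse.urlencode({"none": "none"}).encode("utf-8")

post_req = urllib.request.Request(url, data=body)     # data present -> POST
get_req = urllib.request.Request(url)                 # no data -> GET
```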

Ruby Mechanize: user agents?

*爱你&永不变心* submitted on 2019-12-03 07:45:47
Question: How many user agents are there in Mechanize? Is there a handy list of all the user-agent options anywhere? Answer 1: Yes. Look at https://github.com/sparklemotion/mechanize/blob/master/lib/mechanize.rb#L115: AGENT_ALIASES = { 'Windows IE 6' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)', 'Windows IE 7' => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)', 'Windows Mozilla' => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516
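The excerpt shows the start of Ruby Mechanize's AGENT_ALIASES hash; the complete list lives in lib/mechanize.rb at the link. To illustrate how the alias lookup behaves, here is a tiny Python sketch using two entries copied from the excerpt (not the full list):

```python
# Two alias -> User-Agent entries copied from the excerpt above; the full
# AGENT_ALIASES hash is in lib/mechanize.rb in the Ruby source.
AGENT_ALIASES = {
    "Windows IE 6": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Windows IE 7": ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; "
                     ".NET CLR 1.1.4322; .NET CLR 2.0.50727)"),
}

def user_agent_for(alias):
    """Resolve an alias to its full User-Agent string, raising on typos
    (a misspelled alias is a common source of silent failures)."""
    try:
        return AGENT_ALIASES[alias]
    except KeyError:
        raise ValueError("unknown agent alias: %r" % alias)
```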

I can't remove whitespace from a string parsed by Nokogiri

若如初见. submitted on 2019-12-03 07:22:39
I can't remove whitespace from a string parsed by Nokogiri. My HTML is: <p class='your-price'> Cena pro Vás: <strong>139 <small>Kč</small></strong> </p> My code is: #encoding: utf-8 require 'rubygems' require 'mechanize' agent = Mechanize.new site = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky") price = site.search("//p[@class='your-price']/strong/text()") val = price.first.text => "139 " val.strip => "139 " val.gsub(" ", "") => "139 " gsub, strip, etc. don't work. Why, and how do I fix this? val.class => String val.dump => "\"139\\u{a0}\"" val.encoding => #<Encoding:UTF-8> __ENCODING__ =>
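The val.dump output is the clue: the trailing character is U+00A0 (no-break space), which Ruby's String#strip does not treat as whitespace, so an explicit substitution such as val.gsub(/\u00A0/, '') is needed there. The same cleanup sketched in Python (the scraped value reproduced as a literal); note that Python's str.strip() does treat U+00A0 as whitespace, but an explicit replace also handles interior occurrences:

```python
raw = "139\u00a0"                  # what val.dump revealed: "139\u{a0}"

# Replace the no-break space explicitly. Unlike Ruby's String#strip,
# Python's str.strip() would remove a trailing U+00A0 by itself, but
# replace() also handles interior ones (e.g. a thousands separator).
cleaned = raw.replace("\u00a0", " ").strip()
```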

Are there any alternatives to Mechanize in Python? [closed]

隐身守侯 submitted on 2019-12-03 06:51:28
Question: [Closed as off-topic; not accepting answers. Closed 5 years ago.] I'm using Python 3.6 and I have to fill in a form. Unfortunately, mechanize doesn't work on Python 3. What do you suggest as an alternative to mechanize? Answer 1: SeleniumRC with selenium.py is an alternative (and one of the few workable options if the pages you need to scrape have an important, "structural" role

Web Crawler - Ignore Robots.txt file?

梦想的初衷 submitted on 2019-12-03 06:42:32
Some servers have a robots.txt file to stop web crawlers from crawling their websites. Is there a way to make a web crawler ignore the robots.txt file? I am using Mechanize for Python. The documentation for mechanize has this sample code: br = mechanize.Browser() .... # Ignore robots.txt. Do not do this without thought and consideration. br.set_handle_robots(False) That does exactly what you want. This also looks like what you need: from mechanize import Browser br = Browser() # Ignore robots.txt br.set_handle_robots( False ) but make sure you know what you're doing… Source: https:/
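For context on what set_handle_robots(False) switches off, here is the same kind of robots.txt check done offline with Python's standard library; the rules string is an illustrative example, not any real site's file:

```python
import urllib.robotparser

# An illustrative robots.txt; RobotFileParser can parse rules directly
# from a list of lines, so no network access is needed.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# This is the kind of check mechanize performs before each fetch when
# robots handling is on (the default); set_handle_robots(False) skips it.
```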