htmlunit

HtmlUnit forbid external requests

一笑奈何, submitted on 2019-11-28 05:34:59
Question: I use HtmlUnit for automated tests of my site. The site uses the gmaps API, and sending requests to the external site takes a lot of time (I have a few hundred tests and a few thousand page loads). I need a way to tell HtmlUnit to load only local pages (served by IIS Express) and to forbid loading external resources, so that my tests run faster.
Answer 1: You can prevent HtmlUnit from accessing certain URLs using a WebConnectionWrapper: browser.setWebConnection(new
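The answer is cut off before it shows the wrapper body. As a rough sketch of the decision such a wrapper would need, the following uses only the Java standard library to check whether a request URL targets a local host. The class name LocalOnlyFilter, the method isAllowed, and the allowlist contents are hypothetical and not from the original answer. Presumably a WebConnectionWrapper subclass would call a check like this from its overridden getResponse(WebRequest) and return an empty stub response for anything that is not allowed; that wiring is an assumption and is not shown here.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: decides whether a request URL targets a local host. */
public class LocalOnlyFilter {

    // Assumed allowlist; adjust to the hosts your local server actually uses.
    private static final Set<String> ALLOWED_HOSTS = Set.of("localhost", "127.0.0.1");

    /** Returns true only if the URL parses and its host is on the allowlist. */
    public static boolean isAllowed(String url) {
        try {
            String host = URI.create(url).getHost();
            return host != null && ALLOWED_HOSTS.contains(host.toLowerCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Unparseable URLs are treated as external and therefore blocked.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("http://localhost:8080/index.html"));        // true
        System.out.println(isAllowed("https://maps.googleapis.com/maps/api/js")); // false
    }
}
```

Matching on the host rather than on a URL prefix keeps the check independent of the port IIS Express happens to pick.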

Logging in with a CAPTCHA in Java using HtmlUnit and J4L

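谁都会走, submitted on 2019-11-28 05:34:24
1. HtmlUnit
1.1 Introduction
HtmlUnit is a GUI-less browser written in Java. It models HTML documents and offers an API for loading pages, filling in forms, clicking links, and so on, just as a normal browser would. It is typically used for testing and for scraping information from web pages.
Translation of the official description: HtmlUnit is a GUI-less browser for Java programs. It models HTML documents and provides an API for invoking pages, filling out forms, clicking links, and so on, just as you would in a browser. HtmlUnit has fairly good JavaScript support (constantly improving) and can work even with quite complex AJAX libraries, simulating Chrome, Firefox, or Internet Explorer depending on the configuration. HtmlUnit is typically used for testing or for retrieving information from web sites.
1.2 Use cases
Limitations of HttpClient
For a web crawler written in Java, we can generally use Apache's HttpClient component to fetch HTML pages. The response to an HttpClient request is generally the plain-text document, that is, the original HTML page. For a static HTML page, HttpClient is enough to scrape the information we need. For the growing number of dynamic pages, however, much of the data is fetched and rendered by asynchronous JavaScript, and the initial HTML page does not contain it. In the page shown in the figure above, once the initial document has finished loading, the data list in the red box is not yet visible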

HtmlUnitDriver (HtmlUnit) vs GhostDriver (PhantomJS)?

与世无争的帅哥, submitted on 2019-11-28 05:18:01
We are in the middle of choosing a headless browser driver, which will be some implementation of Selenium WebDriver. On one side there is GhostDriver, which uses PhantomJS as its backend; on the other there is HtmlUnitDriver, which is based on HtmlUnit. PhantomJS uses WebKit, the rendering engine of Safari, to render pages, while HtmlUnitDriver uses the Rhino engine, which no other browser uses (it is just "simulating" browser behaviour). We consider the latter a drawback, because the rendering behaviour can differ significantly from that of the popular browsers. In our opinion,

HtmlUnit button click

杀马特。学长 韩版系。学妹, submitted on 2019-11-28 04:49:08
Question: I'm trying to send a message on www.meetme.com but can't figure out how to do it. I can type the message into the comment area, but clicking the Send button doesn't do anything. What am I doing wrong? When I log in and press the Login button, the page does change and everything is fine. Does anyone have any ideas or clues?
    HtmlPage htmlPage = null;
    HtmlElement htmlElement;
    WebClient webClient = null;
    HtmlButton htmlButton;
    HtmlForm htmlForm;
    try {
        // Create and initialize WebClient object
        webClient =

Extremely simple code not working in HtmlUnit

試著忘記壹切, submitted on 2019-11-28 00:31:00
I'm working with HtmlUnit 2.9 (the stable version that was released this month). Do you have any idea why the following code is not working?
    public class Main {
        public static void main(String[] args) {
            WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
            webClient.setCssEnabled(true);
            webClient.setCssErrorHandler(new SilentCssErrorHandler());
            webClient.setThrowExceptionOnFailingStatusCode(false);
            webClient.setThrowExceptionOnScriptError(false);
            webClient.setRedirectEnabled(false);
            webClient.setAppletEnabled(false);
            webClient.setJavaScriptEnabled(false);
            webClient

Java HtmlUnit - can't login to wordpress

て烟熏妆下的殇ゞ, submitted on 2019-11-27 18:34:58
Question: I'm trying to use HtmlUnit to log in to my local WordPress website, but it seems to have a cookie issue. This is the beginning of the code:
    WebClient webClient = new WebClient();
    HtmlPage loginPage = webClient.getPage("http://localhost/flowersWp/wp-admin");
    HtmlForm form = loginPage.getFormByName("loginform");
This is what I get in the log. Does anyone have an idea? Thanks.
    Nov 27, 2010 12:43:35 PM org.apache.http.client.protocol.ResponseProcessCookies processCookies
    WARNING: Cookie rejected: "

Is it possible to ignore JavaScript exceptions when working with WebDriver (HtmlUnit, Ruby bindings)

与世无争的帅哥, submitted on 2019-11-27 17:51:58
Question: HtmlUnit throws an exception and crashes my test when I load the page:
    caps = Selenium::WebDriver::Remote::Capabilities.htmlunit(:javascript_enabled => true)
    driver = Selenium::WebDriver.for(:remote, :desired_capabilities => caps)
    driver.navigate.to url
    ReferenceError: "x" is not defined. (net.sourceforge.htmlunit.corejs.javascript.EcmaError)
No exception is thrown if I use a Firefox driver:
    caps = Selenium::WebDriver::Remote::Capabilities.firefox
or if I disable JavaScript for the HtmlUnit driver:
    caps

How to setup HtmlUnit in an Eclipse project?

百般思念, submitted on 2019-11-27 15:07:55
Question: My project includes the HtmlUnit jars and downloads the content of some pages. However, the executable jar built from it (which bundles the libraries, via Eclipse's export function) works only on the machine on which I created it; on other machines it doesn't execute. EDIT: I can tell it doesn't execute because it doesn't show the "Starting Headless Browser" message box on startup. I used Eclipse Indigo: File > Export > Runnable jar > package required libraries into generated jar. Help, gods:
    import java.io.*;
    import com.gargoylesoftware.htmlunit

How do I use the HTMLUnit driver with Selenium from Python?

纵然是瞬间, submitted on 2019-11-27 12:27:31
Question: How do I tell Selenium to use HTMLUnit? I'm running selenium-server-standalone-2.0b1.jar as a Selenium server in the background, with the latest Python bindings installed via "pip install -U selenium". Everything works fine with Firefox, but I'd like to use HTMLUnit, as it is lighter weight and doesn't need X. This is my attempt:
    >>> import selenium
    >>> s = selenium.selenium("localhost", 4444, "*htmlunit", "http://localhost/")
    >>> s.start()
    Traceback (most recent call last):
      File "

How to combine scrapy and htmlunit to crawl urls with javascript

[亡魂溺海], submitted on 2019-11-27 11:37:02
I'm using Scrapy to crawl pages, but I can't handle pages that rely on JavaScript. People have suggested that I use HtmlUnit, so I installed it, but I don't know how to use it at all. Can anyone give me an example (Scrapy + HtmlUnit)? Thanks very much.
reclosedev: To handle pages with JavaScript you can use WebKit or Selenium. Here are some snippets from snippets.scrapy.org:
    Rendered/interactive javascript with gtk/webkit/jswebkit
    Rendered Javascript Crawler With Scrapy and Selenium RC
Here is a working example using Selenium and the PhantomJS headless WebDriver in a download handler