htmlunit

How to detect when Selenium loads a browser's error page

三世轮回 submitted on 2019-11-29 07:14:21
Is there a universal way to detect when a Selenium browser opens an error page? For example, disable your internet connection and call driver.get("http://google.com"). In Firefox, Selenium will load the 'Try Again' error page containing text like "Firefox can't establish a connection to the server at www.google.com." Selenium will NOT throw any errors. Is there a browser-independent way to detect these cases? For Firefox (Python), I can do: if "errorPageContainer" in [ elem.get_attribute("id") for elem in driver.find_elements_by_css_selector("body > div") ] But (1) this seems like computational
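One possible direction, as a minimal sketch rather than a confirmed browser-independent answer: check the driver's current URL against error-page URL schemes, then fall back to the Firefox-specific `errorPageContainer` marker from the question. The helper name `looks_like_error_page` is hypothetical, and the prefixes `about:neterror` and `chrome-error://` are assumptions about what Firefox and Chrome report; they should be verified for the browser versions in use.

```python
# Assumed URL prefixes of built-in browser error pages; verify per browser and version.
ERROR_URL_PREFIXES = ("about:neterror", "chrome-error://")


def looks_like_error_page(driver):
    """Return True if the driver appears to be showing a browser error page.

    `driver` is any object exposing `current_url` and `page_source`,
    as Selenium WebDriver instances do.
    """
    url = driver.current_url or ""
    if url.startswith(ERROR_URL_PREFIXES):
        return True
    # Fallback: the Firefox-specific marker id mentioned in the question.
    return 'id="errorPageContainer"' in (driver.page_source or "")
```

Calling `looks_like_error_page(driver)` right after `driver.get(...)` avoids iterating over every `body > div` element, at the cost of relying on browser-specific conventions.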

How to create HtmlUnit HTMLPage object from String?

[亡魂溺海] submitted on 2019-11-29 06:10:55
Question: This question was asked once already, but I guess the API has changed and the answers are no longer valid. URL url = new URL("http://www.example.com"); StringWebResponse response = new StringWebResponse("<html><head><title>Test</title></head><body></body></html>", url); HtmlPage page = HTMLParser.parseHtml(response, new TopLevelWindow("top", new WebClient())); System.out.println(page.getTitleText()); Can't be done because TopLevelWindow is protected and stuff like extending/implementing the

Problem in HtmlUnit API for Java (Headless Browser)?

泪湿孤枕 submitted on 2019-11-29 05:18:27
I am using the HtmlUnit headless browser to browse this webpage (you can look at the webpage to better understand the problem). I have set the select's value to "1" with the following commands: final WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_7); try { // Configuring the webClient webClient.setJavaScriptEnabled(true); webClient.setThrowExceptionOnScriptError(false); webClient.setCssEnabled(true); webClient.setUseInsecureSSL(true); webClient.setRedirectEnabled(true); webClient.setActiveXNative(true); webClient.setAppletEnabled(true); webClient

Java HtmlUnit - can't login to wordpress

谁都会走 submitted on 2019-11-29 04:38:42
I'm trying to use HtmlUnit to log in to my local WordPress website, but it seems to have a cookie issue. This is the beginning of the code: WebClient webClient = new WebClient(); HtmlPage loginPage = webClient.getPage("http://localhost/flowersWp/wp-admin"); HtmlForm form = loginPage.getFormByName("loginform"); This is what I get in the log. Does anyone have an idea? Thanks. Nov 27, 2010 12:43:35 PM org.apache.http.client.protocol.ResponseProcessCookies processCookies WARNING: Cookie rejected: "[version: 0][name: wordpress_2418eeb845ebfb96f6f1a71ab8c5625a][value: +][domain: localhost][path: /flowersWp

Is it possible to ignore JavaScript exceptions when working with WebDriver (HtmlUnit, Ruby bindings)

落爺英雄遲暮 submitted on 2019-11-29 03:51:30
HtmlUnit throws an exception and crashes my test when I load the page: caps = Selenium::WebDriver::Remote::Capabilities.htmlunit(:javascript_enabled => true) driver = Selenium::WebDriver.for(:remote, :desired_capabilities => caps) driver.navigate.to url ReferenceError: "x" is not defined. (net.sourceforge.htmlunit.corejs.javascript.EcmaError) No exception is thrown if I use a Firefox driver: caps = Selenium::WebDriver::Remote::Capabilities.firefox Or if I disable JavaScript for the HtmlUnit driver: caps = Selenium::WebDriver::Remote::Capabilities.htmlunit(:javascript_enabled => false) I am unable to

crawl dynamic web page using htmlunit

∥☆過路亽.° submitted on 2019-11-29 02:14:22
I am crawling data with HtmlUnit from a dynamic webpage that uses infinite scrolling to fetch data dynamically, just like Facebook's news feed. I used the following statements to simulate the scroll-down event: webclient.setJavaScriptEnabled(true); webclient.setAjaxController(new NicelyResynchronizingAjaxController()); ScriptResult sr=myHtmlPage.executeJavaScript("window.scrollBy(0,600)"); webclient.waitForBackgroundJavaScript(10000); myHtmlPage=(HtmlPage)sr.getNewPage(); But it seems myHtmlPage stays the same as the previous one, i.e., new data is not appended to myHtmlPage, as a result
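For comparison, the general "scroll until nothing new loads" loop can be sketched in Python. This is not HtmlUnit's Java API and is not taken from the question; it assumes a callable that runs JavaScript in the page and returns its result, such as a Selenium driver's `execute_script` method. The name `scroll_until_stable` and its parameters are hypothetical, and the fixed pause is a crude stand-in for properly waiting on background requests.

```python
import time


def scroll_until_stable(execute_script, max_rounds=20, pause_seconds=1.0):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `execute_script` is a callable that runs JavaScript in the page and
    returns its result. Returns the number of scrolls that loaded new content.
    """
    height_js = "return document.body.scrollHeight"
    last_height = execute_script(height_js)
    productive = 0
    for _ in range(max_rounds):
        execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause_seconds)  # crude wait for background requests to finish
        new_height = execute_script(height_js)
        if new_height == last_height:
            break  # nothing new was appended, so stop scrolling
        last_height = new_height
        productive += 1
    return productive
```

Comparing the document height before and after each scroll gives a simple signal for whether new data was appended, which is the condition the question is trying to observe.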

How to setup HtmlUnit in an Eclipse project?

筅森魡賤 submitted on 2019-11-28 23:55:27
My project includes the HtmlUnit jars and downloads the content of some pages. However, the executable jar (which bundles the libraries, built with Eclipse's export function) works only on the machine on which I created it; on a different machine it doesn't execute. EDIT: By "doesn't execute" I mean it doesn't show the "Starting Headless Browser" MessageBox upon startup. I used Eclipse Indigo: File > Export > Runnable jar > Package required libraries into generated jar. Help, gods: import java.io.*; import com.gargoylesoftware.htmlunit.BrowserVersion; import com.gargoylesoftware.htmlunit.Page; import com.gargoylesoftware.htmlunit

Screen scraping with Python

雨燕双飞 submitted on 2019-11-28 23:37:28
Question: Does Python have screen scraping libraries that offer JavaScript support? I've been using pycurl for simple HTML requests, and Java's HtmlUnit for more complicated requests requiring JavaScript support. Ideally I would like to be able to do everything from Python, but I haven't come across any libraries that would allow me to do it. Do they exist? Answer 1: There are many options when dealing with static HTML, which the other responses cover. However, if you need JavaScript support and want to stay

How do I use the HTMLUnit driver with Selenium from Python?

≯℡__Kan透↙ submitted on 2019-11-28 19:40:40
How do I tell Selenium to use HTMLUnit? I'm running selenium-server-standalone-2.0b1.jar as a Selenium server in the background, and the latest Python bindings installed with "pip install -U selenium". Everything works fine with Firefox. But I'd like to use HTMLUnit, as it is lighter weight and doesn't need X. This is my attempt to do so: >>> import selenium >>> s = selenium.selenium("localhost", 4444, "*htmlunit", "http://localhost/") >>> s.start() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/local/lib/python2.6/dist-packages/selenium/selenium/selenium.py"
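A sketch of one way this could look with the WebDriver API instead of the older `selenium.selenium` RC client. It assumes Selenium 2.x/3.x-era Python bindings, where `webdriver.Remote` accepted a `desired_capabilities` argument, and a Selenium server exposing the `/wd/hub` endpoint; newer bindings dropped that argument. The helper names `hub_url` and `open_htmlunit_session` are hypothetical.

```python
def hub_url(host="localhost", port=4444):
    """Build the WebDriver endpoint URL of a Selenium server."""
    return "http://%s:%d/wd/hub" % (host, port)


def open_htmlunit_session(host="localhost", port=4444):
    """Start a remote HtmlUnit session (needs the selenium package and a running server)."""
    # Imported lazily so this module loads even where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Assumption: Selenium 2.x/3.x-era bindings, where Remote accepts desired_capabilities.
    return webdriver.Remote(
        command_executor=hub_url(host, port),
        desired_capabilities=DesiredCapabilities.HTMLUNIT,
    )
```

With the server from the question running, `driver = open_htmlunit_session()` followed by `driver.get("http://localhost/")` would be the intended usage.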

.net equivalent of htmlunit?

故事扮演 submitted on 2019-11-28 17:53:24
Does anybody know if there is a .NET equivalent of HtmlUnit or a similar library? I've heard that people have used IKVM to convert the HtmlUnit library, but I have also heard that the converted code is slow. Requirements: headless browser; JavaScript support; cookie handling; .NET. You can try out the just-released NHtmlUnit (available on NuGet), which is a .NET wrapper for HtmlUnit. It's not .NET as in "written in a .NET language and compiled to MSIL", but it's converted to .NET with IKVM and we've written a layer of "purified" C# code on top of it so everything looks and behaves like .NET. You