htmlunit

Groovy htmlunit getFirstByXPath returning null + OCR Question

Posted by 不想你离开。 on 2019-12-08 06:54:49
Question: I have had a few issues with HtmlUnit returning nulls lately and am looking for guidance. Each of my attempts to grab the first row of a website has returned null. I am wondering if someone can A) explain why they might be returning null and B) explain better ways (if there are any) to go about getting the information. Here is my current code (the URL is in the source): client = new WebClient(BrowserVersion.FIREFOX_3) client.javaScriptEnabled = false def url = "http://www.hidemyass.com/proxy

What Exception is thrown on timeout?

Posted by 妖精的绣舞 on 2019-12-08 05:18:29
What exception is thrown on a connection timeout in HTMLUnit? HtmlUnit uses the Apache HttpClient. The timeout mechanism throws an InterruptedIOException. See the HttpClient documentation. This exception is a subclass of IOException, which can be thrown during any HttpClient execute call (basically whenever you get a page with an HtmlUnit WebClient). Me too. I think there is a bug: it really should throw an exception, but it does not throw one if you set a timeout greater than a certain value. You can see this in "Call getPage from htmlunit WebClient with JavaScript disabled and setTimeout set to 10000 waits forever".
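The exception hierarchy described in the answer above can be sketched with the standard library alone. This is a minimal illustration, not HtmlUnit code: the class TimeoutDemo, the Fetch interface, and the describe helper are hypothetical names, and the lambda stands in for whatever call actually loads the page (for example a WebClient getPage call). It only shows that a catch block for InterruptedIOException has to come before the one for IOException, and that java.net.SocketTimeoutException is handled by it because it is a subclass.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;

// Hypothetical demo class; none of these names come from HtmlUnit.
class TimeoutDemo {

    // Stand-in for any page-loading call that may throw IOException.
    interface Fetch {
        String run() throws IOException;
    }

    // Runs the fetch and reports whether it succeeded, timed out,
    // or failed with some other I/O error.
    static String describe(Fetch fetch) {
        try {
            return "ok: " + fetch.run();
        } catch (InterruptedIOException e) {
            // Must precede the IOException handler: InterruptedIOException
            // (and its subclass SocketTimeoutException) extends IOException.
            return "timeout: " + e.getMessage();
        } catch (IOException e) {
            return "io-error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Simulated read timeout.
        System.out.println(describe(() -> {
            throw new SocketTimeoutException("Read timed out");
        }));
        // Simulated non-timeout failure.
        System.out.println(describe(() -> {
            throw new IOException("Connection reset");
        }));
    }
}
```

Running main prints "timeout: Read timed out" followed by "io-error: Connection reset". In real code the lambda body would be replaced by the actual page request; whether a given HtmlUnit version surfaces the timeout this way is exactly what the second reply above calls into question.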

“Run” HTMLUnit with PHP

Posted by 只愿长相守 on 2019-12-08 05:06:21
Question: So I have installed Java on my CentOS server. I now want to be able to use PHP to run HTMLUnit to get a fully rendered webpage and then return the results to the user. I see the "simple" example on HTMLUnit, but I know next to nothing about Java and don't know where it needs to go or be run to even get the test case working (i.e. getting Google's homepage). public void getURL() throws Exception { final WebClient webClient = new WebClient(); final HtmlPage page = webClient.getPage("http:/

How to enable Flash to HTMLUnit?

Posted by 耗尽温柔 on 2019-12-08 04:11:21
Question: I'm trying to grab HTML content with HTMLUnit. Everything went well, but I couldn't get the Flash content, which is visible as <img> where it is actually in <object>. I have webClient.getOptions().setJavaScriptEnabled(true); webClient.getOptions().setActiveXNative(true); webClient.getOptions().setAppletEnabled(true); webClient.getOptions().setCssEnabled(true); In some places on SO I found people saying HTMLUnit won't support Flash, but those answers seem old, so I am raising this question. Someone

Accessing webpage with Cloudflare protection

Posted by 有些话、适合烂在心里 on 2019-12-08 02:45:30
Question: First off, I want to apologize in case my question does not provide enough context or anything of that matter; I'm typing this up on my phone right now. So I'm working on a project that requires me to automate tasks within a webpage, and in order to do that, step one is to access the page in the first place, but I've reached an obstacle that I've tried searching for and figuring out, to no avail. The webpage I'm trying to reach has DDoS protection by CloudFlare, meaning before entering

404 Not Found when using HtmlUnit

Posted by ≡放荡痞女 on 2019-12-08 01:14:15
Question: I have the following code: WebClient webClient = new WebClient(); HtmlPage page = webClient.getPage("http://www.myland.co.il/%D7%9E%D7%97%D7%A9%D7%91-%D7%94%D7%A9%D7%A7%D7%99%D7%94"); The code fails with com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 404 Not Found for http://www.myland.co.il/Scripts/swfobject_modified.js I do see in the console output the HTML page I am interested in. Is there a way to suppress the exception and get an HTML page after all? The page does load

HtmlUnit click() on radio button input not working as expected

Posted by 我与影子孤独终老i on 2019-12-08 00:55:57
Question: I'm trying to fetch data from this webpage: http://www.atm-mi.it/en/Giromilano/Pages/default.aspx. Basically I'm using HtmlUnit in Java to interact with the "Route and timetable finder" in the middle of the left column, looping through each option in the select, clicking on "Find" and gathering the data I need from the resulting pages. I've had no problem extracting data for urban routes, but can't seem to handle the radio buttons above: clicking on "Underground" in a browser, for example,

HtmlUnit 2.9 jar execute JavaScript

Posted by 房东的猫 on 2019-12-07 22:56:34
Question: I am trying this code: import com.gargoylesoftware.htmlunit.BrowserVersion; import com.gargoylesoftware.htmlunit.JavaScriptPage; import com.gargoylesoftware.htmlunit.ScriptResult; import com.gargoylesoftware.htmlunit.WebClient; import com.gargoylesoftware.htmlunit.WebRequest; import com.gargoylesoftware.htmlunit.WebResponse; import com.gargoylesoftware.htmlunit.WebWindow; import com.gargoylesoftware.htmlunit.html.HtmlPage; import com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine;

htmlunit : An invalid or illegal selector was specified

Posted by 无人久伴 on 2019-12-07 18:16:40
Question: I am trying to simulate a login with htmlunit. Although I wrote my code according to the examples, I have encountered an annoying problem. Below are some messages I have picked up from the console. runtimeError: message=[An invalid or illegal selector was specified (selector: '*,:x' error: Invalid selector: *:x).] sourceName=[http://user.mofangge.com/Scripts/inc/jquery-1.10.2.js] line=[1640] lineSource=[null] lineOffset=[0] WARNING: Obsolete content type encountered: 'application/x-javascript'.

Servlet Filter is Returning “Proxy Error” on AWS

Posted by 你离开我真会死。 on 2019-12-07 17:58:37
Question: I have set up a Filter to add crawler support for my GWT web application. The idea is to catch all requests that contain "_escaped_fragment_=" and supply a snapshot for the crawler. I have set up the Filter using Guice as follows: filter("/*").through(CrawlerFilter.class); The following is the code for the CrawlerFilter class (many thanks to Patrick): @Singleton public class CrawlerFilter implements Filter { private static final Logger logger = Logger.getLogger(CrawlerFilter.class.getName()