htmlunit

Servlet Filter is Returning “Proxy Error” on AWS

折月煮酒 posted on 2019-12-06 03:25:24
I have set up a Filter to add crawler support for my GWT web application. The idea is to catch all requests that contain "_escaped_fragment_=" and supply a snapshot for the crawler. I have set up the Filter using Guice as follows: filter("/*").through(CrawlerFilter.class); The following is the code for the CrawlerFilter class (many thanks to Patrick): @Singleton public class CrawlerFilter implements Filter { private static final Logger logger = Logger.getLogger(CrawlerFilter.class.getName()); /** * Special URL token that gets passed from the crawler to the servlet * filter. This token is
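The detection step described above (catch requests containing "_escaped_fragment_=" and map them back to the page the crawler wants a snapshot of) can be sketched with the JDK alone. This is a minimal illustration, not the CrawlerFilter code from the question: the class name EscapedFragment and both method names are hypothetical, and it assumes the crawler appends the token as the last query parameter, as the AJAX crawling scheme describes.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original CrawlerFilter.
final class EscapedFragment {
    // Token the crawler puts in the query string in place of "#!".
    static final String TOKEN = "_escaped_fragment_=";

    // True when the query string carries the crawler token.
    static boolean isCrawlerRequest(String queryString) {
        return queryString != null && queryString.contains(TOKEN);
    }

    // Rebuilds the "#!" URL whose rendered snapshot should be served.
    // Assumes the token is the last parameter in the query string.
    static String toHashBangUrl(String baseUrl, String queryString) {
        int idx = queryString.indexOf(TOKEN);
        String before = queryString.substring(0, idx);
        if (before.endsWith("&")) {
            // Drop the separator that preceded the token.
            before = before.substring(0, before.length() - 1);
        }
        // The crawler percent-encodes the fragment; decode it (Java 10+ overload).
        String fragment = URLDecoder.decode(
                queryString.substring(idx + TOKEN.length()), StandardCharsets.UTF_8);
        StringBuilder url = new StringBuilder(baseUrl);
        if (!before.isEmpty()) {
            url.append('?').append(before);
        }
        return url.append("#!").append(fragment).toString();
    }
}
```

Inside a servlet filter, the query string would typically come from HttpServletRequest.getQueryString(), and the rebuilt URL would be handed to whatever renders the snapshot.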

htmlunit : An invalid or illegal selector was specified

妖精的绣舞 posted on 2019-12-05 22:13:08
I am trying to simulate a login with HtmlUnit. Although I wrote my code according to the examples, I have run into an annoying problem. Below are some messages I picked up from the console. runtimeError: message=[An invalid or illegal selector was specified (selector: '*,:x' error: Invalid selector: *:x).] sourceName=[http://user.mofangge.com/Scripts/inc/jquery-1.10.2.js] line=[1640] lineSource=[null] lineOffset=[0] WARNING: Obsolete content type encountered: 'application/x-javascript'. CSS error: 'http://user.mofangge.com/Content/Css/Style1/Main.css' [1:1] Error in style sheet. (Invalid

HTMLunit - Facebook Login

蓝咒 posted on 2019-12-05 19:41:15
final WebClient webClient = new WebClient(); webClient.setJavaScriptEngine(new JavaScriptEngine(webClient)); HtmlPage page1 = null; try { page1 = webClient.getPage("http://www.facebook.com"); } catch (IOException e) { e.printStackTrace(); } final HtmlForm form = (HtmlForm) page1.getElementById("login_form"); final HtmlSubmitInput button = (HtmlSubmitInput) form.getInputsByValue("Log In").get(0); final HtmlTextInput textField = (HtmlTextInput) page1.getElementById("email"); textField.setValueAttribute("test@test.com"); final HtmlPasswordInput textField2 = (HtmlPasswordInput) page1

Ajax Crawling on Google App Engine - Does HtmlUnit work?

风格不统一 posted on 2019-12-05 15:27:36
http://code.google.com/web/ajaxcrawling/docs/html-snapshot.html Does HtmlUnit work on AppEngine? If not, are there any other ways to make my GWT app crawlable by search engines? A patch for HtmlUnit to work on GAE is in progress. HtmlUnit's bug tracker issue 2962074 discusses making HtmlUnit work on GAE, and provides a preliminary patch for accomplishing this. It doesn't work on the latest GAE version (even after applying the patch); see the post http://groups.google.com/group/google-appengine/browse_thread/thread/28a9f9737b1b26b5 Source: https://stackoverflow.com/questions/3285181/ajax-crawling-on

HtmlUnit not creating HtmlPage object

给你一囗甜甜゛ posted on 2019-12-05 13:35:04
I'm very new to HtmlUnit and I'm trying to scrape a website that uses JavaScript to edit the code. I heard HtmlUnit was the best way to go, as it returns the final code using a headless browser. However, as you will see, I cannot even get past creating an HtmlPage object without getting a huge and impossible-to-understand exception thrown (at least given my virtually nonexistent experience with HtmlUnit). Here is my code: import com.gargoylesoftware.htmlunit.*; import com.gargoylesoftware.htmlunit.html.HtmlPage; public class Main { public static void main(String[] args) { Main scraper = new Main();

HtmlUnit Exception

老子叫甜甜 posted on 2019-12-05 12:34:13
I am having trouble understanding the meaning of this HtmlUnit exception. It happens when I call click() on a link on a webpage. Exception class=[net.sourceforge.htmlunit.corejs.javascript.WrappedException] com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "offsetWidth" from null (http://webapps6.doc.state.nc.us/opi/scripts/DHTMLmessages.js#95) (javascript url#297) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:534) at net.sourceforge.htmlunit.corejs

How can I test context menu functionality in a web app?

早过忘川 posted on 2019-12-05 12:31:54
I'm playing with a Grails app that has a context menu (on right-click). The context menu is built using Chris Domigan's jQuery contextmenu plugin. While the context menus do actually work, I want to have automated tests, and I can't work out how to do it. I've tried Selenium 2.05a (i.e. WebDriver), but there's no rightClick method. I notice that HtmlUnit has a rightclick method, but I don't seem to be able to detect any difference in the DOM between before the click and after it. Currently there's no right click method in WebDriver; there's an enhancement request open for it - http://code

Is it possible to use HTTPS proxy in HTMLunit?

心不动则不痛 posted on 2019-12-05 10:16:56
I am new to HtmlUnit and am trying to set an HTTPS proxy for it. I tried putting https:// just before the host IP, but I got an exception. Can anyone help me solve this issue? Update: My code is: WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6,"https://199.127.100.13", 11888); Update 2: I asked the developer team. They said that it is a bug in the framework and that they will fix it. You should not put http:// or https:// in front of the IP address of the proxy server. If your HTTP proxy server supports HTTPS, then HtmlUnit will use it automatically. Here is an example of how to use
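The point of the answer above, that a proxy is identified by a bare host and a port rather than by a URL with a scheme, can be illustrated with the JDK's own java.net.Proxy instead of HtmlUnit's API. This is only a sketch under that substitution: the class ProxyAddress and its methods are hypothetical names, and the address is the one quoted in the question.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper; uses java.net.Proxy, not HtmlUnit's proxy configuration.
final class ProxyAddress {
    // Removes a leading "http://" or "https://" (any "scheme://") from a host string.
    static String stripScheme(String host) {
        int i = host.indexOf("://");
        return i >= 0 ? host.substring(i + 3) : host;
    }

    // Builds an HTTP proxy descriptor from a bare host and a port.
    // createUnresolved avoids a DNS lookup while constructing the address.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(stripScheme(host), port));
    }
}
```

With the value from the question, ProxyAddress.stripScheme("https://199.127.100.13") yields "199.127.100.13", which is the form the WebClient constructor's host argument expects according to the answer.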

Can I configure HTMLUnit to only run specific javascript processes and not the whole thing?

我与影子孤独终老i posted on 2019-12-05 09:44:11
I'm looking to gather information from a set of web pages that are all very similarly formatted. I need some information that is loaded onto the page by JavaScript after opening. It seems that HtmlUnit is a pretty common tool for this, so that's what I'm using. It's unfortunately very slow, which is a complaint I've seen across a lot of forums. The webClient.getPage() call is what takes forever. When I turn off JavaScript, it runs quickly, but I need to execute some JavaScript commands. I was wondering, is there a way to selectively execute a few JavaScript commands instead of all of

Java – How can I Log into a Website with HtmlUnit?

六月ゝ 毕业季﹏ posted on 2019-12-05 03:14:02
Question: I am writing a Java program to log into the website my school uses to post grades. This is the URL of the login form: https://ma-andover.myfollett.com/aspen/logon.do This is the HTML of the login form: <form name="logonForm" method="post" action="/aspen/logon.do" autocomplete="off"><div><input type="hidden" name="org.apache.struts.taglib.html.TOKEN" value="30883f4c7e25a014d0446b5251aebd9a"></div> <input type="hidden" id="userEvent" name="userEvent" value="930"> <input type="hidden" id=