htmlunit

Fetch Page source using HtmlUnit : URL got stuck

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-19 04:14:52
Question: I am trying to get the page source of the following URL using HtmlUnit's get method: http://denydesigns.com/collections/barbara-sherman-fleece-throw-blanket/products/barbara-sherman-antique-fleece-throw-blanket It gets stuck somewhere. I have tried to find the reason but cannot. I also checked whether the thread created by HtmlUnit is BLOCKED or WAITING, but that is not the case either. The following is the log generated by HtmlUnit: 18 Jan 2013 04:14:47,832 - main - ERROR - com

struggling to click on link within htmlunit

百般思念 submitted on 2019-12-19 03:56:09
Question: I am having a problem clicking on a link in HtmlUnit. I went through the API documentation on the site (which I didn't really understand well) and looked at all the sample code I could find, and I am still having a problem clicking on links. Here's the top of the error message (it's pretty large; I can post all of it if you want): "page2 = link2.click() Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException] com.gargoylesoftware.htmlunit.ScriptException: Sys

How to save HtmlUnit cookies to a file?

梦想与她 submitted on 2019-12-18 12:24:48
Question: I'd like to save HtmlUnit cookies to a file and load them from that file on the next run. How can I do that? Thanks.
Answer 1:
    public static void main(String[] args) throws Exception {
        LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
        File file = new File("cookie.file");
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(file));
        Set<Cookie> cookies = (Set<Cookie>) in.readObject();
        in.close();
        WebClient wc = new WebClient();
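
The excerpt above only shows the loading half of the round trip. As a rough sketch of both halves using Java serialization (Java 16+ for the record syntax): `StoredCookie` and `CookieFile` are hypothetical stand-ins so the example runs without HtmlUnit on the classpath. With HtmlUnit, the set to save would presumably come from `webClient.getCookieManager().getCookies()`, and each loaded cookie would be handed back through `CookieManager.addCookie`.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for HtmlUnit's Cookie class, which is also Serializable.
record StoredCookie(String domain, String name, String value) implements Serializable {}

final class CookieFile {
    // Write the whole cookie set to the file, replacing any previous content.
    static void save(Set<StoredCookie> cookies, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            // Copy into a HashSet so the written object is a plain serializable collection.
            out.writeObject(new HashSet<>(cookies));
        }
    }

    // Read the cookie set back; a missing file (first run) yields an empty set.
    @SuppressWarnings("unchecked")
    static Set<StoredCookie> load(Path file) throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) {
            return new HashSet<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Set<StoredCookie>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("cookies", ".bin");
        save(Set.of(new StoredCookie("example.com", "session", "abc123")), file);
        System.out.println(load(file));
        Files.delete(file);
    }
}
```

Treating a missing file as an empty set lets the first run and later runs share one code path. Java serialization is only safe for files the program wrote itself; a cookie file from an untrusted source should not be deserialized this way.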

Making AJAX Applications Crawlable? How to build a simple web service on Google App Engine to produce HTML Snapshots?

 ̄綄美尐妖づ submitted on 2019-12-18 12:18:24
Question: Real World Problem: My app is hosted on Heroku, which (to my knowledge) cannot offer a way to run a headless (GUI-less) browser, such as HtmlUnit, to generate HTML snapshots of my AJAX content for Googlebot to index. My Proposed Solution: If you haven't already, I suggest reading Google's Full Specification for Making AJAX Applications Crawlable. Imagine I have: a Sinatra app hosted on Heroku on the domain http://example.com; the app has tabs along the top of the page: TabA

How to make 2 HtmlUnit's WebClients use same cookies?

試著忘記壹切 submitted on 2019-12-18 09:05:26
Question: If I create 2 WebClients in different threads, how do I make them use the same cookies?
Answer 1: You can share one CookieManager between both clients:
    CookieManager cookieManager = new CookieManager();
    webClient1.setCookieManager(cookieManager);
    webClient2.setCookieManager(cookieManager);
Source: https://stackoverflow.com/questions/3043745/how-to-make-2-htmlunits-webclients-use-same-cookies

crawl dynamic web page using htmlunit

丶灬走出姿态 submitted on 2019-12-18 03:36:06
Question: I am using HtmlUnit to crawl data from a dynamic web page that uses infinite scrolling to fetch data dynamically, just like Facebook's news feed. I used the following statements to simulate the scroll-down event:
    webclient.setJavaScriptEnabled(true);
    webclient.setAjaxController(new NicelyResynchronizingAjaxController());
    ScriptResult sr=myHtmlPage.executeJavaScript("window.scrollBy(0,600)");
    webclient.waitForBackgroundJavaScript(10000);
    myHtmlPage=(HtmlPage)sr.getNewPage();
But it seems

.net equivalent of htmlunit?

懵懂的女人 submitted on 2019-12-17 22:34:11
Question: Does anybody know of a .NET equivalent of HtmlUnit, or a similar library? I've heard that people have used IKVM to convert the HtmlUnit library, but I have also heard that the converted code is slow. Requirements: headless browser; supports JavaScript; handles cookies; .NET.
Answer 1: You can try out the just-released NHtmlUnit (available on NuGet), which is a .NET wrapper for HtmlUnit. It's not .NET as in "written in a .NET language and compiled to MSIL", but it's converted to .NET with IKVM and

JavaScript not being properly executed in HtmlUnit

情到浓时终转凉″ submitted on 2019-12-17 21:10:06
Question: I'm currently developing some tests with HtmlUnit. It's loading a page that contains braintree.js (their form encryption library). I have a bunch running, but I'm stuck where it calls crypto. The JS in question is:
    (function() {
      try {
        var ab = new Uint32Array(32);
        crypto.getRandomValues(ab);
        sjcl.random.addEntropy(ab, 1024, "crypto.getRandomValues");
      } catch (e) {}
    })();
HtmlUnit is throwing: EcmaError, ReferenceError, "'crypto' is not defined." I suppose HtmlUnit doesn't include crypto.

Call getPage from htmlunit WebClient with JavaScript disabled and setTimeout set to 10000 waits forever

萝らか妹 submitted on 2019-12-17 20:26:12
Question: I'm having problems with HtmlUnit. I disabled JavaScript and set the timeout to 10000 before calling getPage. I expected an exception after the timeout, but HtmlUnit waits forever. After some searching I found that someone had reported the same problem in 2009 ("Connection timeout not working"), complaining that some timeout values had no effect, but as of 2011 there is still no answer. Someone here was asking about what exception is thrown but I think it doesn't
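
The excerpt does not say why the timeout is ignored, so the following does not fix HtmlUnit itself. It is only a generic sketch of one possible workaround: enforcing a hard deadline from outside by running the blocking call on a worker thread and waiting on its `Future` with a timeout. `DeadlineFetch` and `callWithDeadline` are hypothetical names, and the sleeping task stands in for a call such as `webClient.getPage(url)` that never returns.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class DeadlineFetch {
    // Run the task on a worker thread and give up with TimeoutException after timeoutMillis.
    static <T> T callWithDeadline(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "fetch-worker");
            worker.setDaemon(true); // a stuck fetch must not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; blocking socket reads may ignore it
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a page fetch that hangs far longer than the deadline.
        Callable<String> stuck = () -> {
            Thread.sleep(60_000);
            return "never";
        };
        try {
            callWithDeadline(stuck, 200);
        } catch (TimeoutException e) {
            System.out.println("gave up after the deadline");
        }
    }
}
```

The caller regains control at the deadline even if the worker cannot be interrupted, which is why the worker is a daemon thread. The underlying connection may still be open in that case, so the client that was stuck should probably be discarded rather than reused.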

HtmlUnit, how to post form without clicking submit button?

空扰寡人 submitted on 2019-12-17 10:55:09
Question: I know that in HtmlUnit I can fire a submit event on a form (fireEvent) and it will be posted. But what if I have disabled JavaScript and would like to post a form using some built-in function? I've checked the Javadoc and haven't found any way to do this. It is strange that there is no such function in HtmlForm... I read the Javadoc and the tutorial on the HtmlUnit page, and I know that I can use getInputByName() and click it. But sometimes there are forms that don't have submit type button or even there is such button