htmlunit

JavaScript not being properly executed in HtmlUnit

落爺英雄遲暮 posted on 2019-11-28 14:01:20
I'm currently developing some tests with HtmlUnit. It's loading a page that contains braintree.js (their form encryption library). I have a bunch running, but I'm stuck where it calls crypto. The JS in question is: (function() { try { var ab = new Uint32Array(32); crypto.getRandomValues(ab); sjcl.random.addEntropy(ab, 1024, "crypto.getRandomValues"); } catch (e) {} })(); HtmlUnit is throwing: EcmaError, ReferenceError, "'crypto' is not defined." I suppose HtmlUnit doesn't include crypto. Would it be possible to include a crypto library myself? Based on your comment, I have to tell you that

How can I tell HtmlUnit's WebClient to download images and css?

烂漫一生 posted on 2019-11-28 13:22:52
How can I make WebClient download external css stylesheets and image bodies just like a usual web browser does? What I'm doing right now is: public static final HashMap<String, String> acceptTypes = new HashMap<String, String>(){{ put("html", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"); put("img", "image/png,image/*;q=0.8,*/*;q=0.5"); put("script", "*/*"); put("style", "text/css,*/*;q=0.1"); }}; protected void downloadCssAndImages(HtmlPage page) { String xPathExpression = "//*[name() = 'img' or name() = 'link' and @type = 'text/css']"; List<?> resultList = page
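The XPath expression in the question mixes `or` and `and` without parentheses. In XPath 1.0 `and` binds more tightly than `or`, so the expression matches every `img` element plus those `link` elements whose `type` is `text/css`. The sketch below checks this with the JDK's own `javax.xml.xpath` rather than HtmlUnit; the sample markup, the class name and the `resourceUrls` helper are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates the question's XPath with the JDK XPath engine.
final class ResourceXPathSketch {

    // Same expression as in the question; parsed as: img OR (link AND type = text/css).
    static final String EXPRESSION =
            "//*[name() = 'img' or name() = 'link' and @type = 'text/css']";

    // Made-up, well-formed sample page with one stylesheet, one icon link and one image.
    static final String SAMPLE = "<html><head>"
            + "<link type='text/css' href='main.css'/>"
            + "<link type='image/x-icon' href='favicon.ico'/>"
            + "</head><body><img src='logo.png'/><p>text</p></body></html>";

    // Returns the src of each matched img and the href of each matched link.
    static List<String> resourceUrls(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    EXPRESSION, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                String attribute = "img".equals(element.getTagName()) ? "src" : "href";
                urls.add(element.getAttribute(attribute));
            }
            return urls;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // The favicon link is skipped because its type is not text/css.
        System.out.println(resourceUrls(SAMPLE));
    }
}
```

If the intent were different, for example to require `type = 'text/css'` for both element kinds, explicit parentheses around the `or` part would be needed.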

Call getPage from htmlunit WebClient with JavaScript disabled and setTimeout set to 10000 waits forever

天涯浪子 posted on 2019-11-28 12:42:59
I'm having problems with HtmlUnit. I disabled JavaScript and set the timeout to 10000 before calling getPage. I expected an exception after the timeout, but HtmlUnit waits forever. After some searching I found that someone had the same problem in 2009 ( Connection timeout not working ): he complained that the connection timeout was not working and that some timeout values had no effect, but as of 2011 he has not received an answer. Someone here was asking which exception is thrown, but I think it is not always thrown. I can't get an answer from Apache HttpClient setTimeout , either. You can see
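One generic workaround for a call that ignores its own timeout is to run it on a worker thread and bound the wait with `Future.get`. The sketch below uses only `java.util.concurrent`; it is not an HtmlUnit feature, and `HardDeadlineSketch`, `callWithDeadline` and `DeadlineExceededException` are hypothetical names. The `Callable` stands in for a blocking call such as `getPage`. Note that cancelling only interrupts the worker thread, and blocking socket reads may not react to interruption, so the worker could keep running in the background even after the caller gets its exception.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: enforces a hard deadline around any blocking call.
final class HardDeadlineSketch {

    // Unchecked exception thrown when the deadline passes before the task finishes.
    static final class DeadlineExceededException extends RuntimeException {
        DeadlineExceededException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static <T> T callWithDeadline(Callable<T> task, long timeoutMillis) {
        // Daemon thread, so a stuck task cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "deadline-worker");
            thread.setDaemon(true);
            return thread;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            throw new DeadlineExceededException("no result after " + timeoutMillis + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow page load: sleeps well past the 100 ms deadline.
        try {
            callWithDeadline(() -> {
                Thread.sleep(5_000);
                return "late";
            }, 100);
        } catch (DeadlineExceededException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```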

How to get the pure raw HTML of a page in HTMLUnit while ignoring JavaScript and CSS?

 ̄綄美尐妖づ posted on 2019-11-28 12:24:34
I just want the text content of a page, and I want the fetching to be as lightweight as possible. Can I turn off all the parsing and additional loading of JavaScript, CSS and other external content that HtmlUnit does out of the box? Mosty Mostacho: I think the closest thing to what you're looking for is: WebClient webClient = new WebClient(); webClient.setCssEnabled(false); webClient.setAppletEnabled(false); webClient.setJavaScriptEnabled(false); For HtmlUnit 2.13 and above, use webClient.getOptions() . Also this question and answer might be useful too. It really made things faster for me, but I

HtmlUnit ignore JavaScript errors?

杀马特。学长 韩版系。学妹 posted on 2019-11-28 11:57:46
I'm trying to traverse a website, but on one of its pages I get this error: EcmaError: lineNumber=[671] column=[0] lineSource=[null] name=[TypeError] sourceName=[https://reservations.besodelsolresort.com/asp/CalendarPopup.js] message=[TypeError: Cannot read property "parentNode" from undefined (https://reservations.besodelsolresort.com/asp/CalendarPopup.js#671)] com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "parentNode" from undefined (https://reservations.besodelsolresort.com/asp/CalendarPopup.js#671) Is there any way I can just ignore this error? I

HtmlUnit getByXpath returns null

杀马特。学长 韩版系。学妹 posted on 2019-11-28 11:48:44
I am coding with Groovy; however, I don't believe it's a language-specific set of questions. I actually have two questions. First question: I've run into an issue while using HtmlUnit. It is telling me that what I am trying to grab is null. The page I'm testing it on is: http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4 My code: client = new WebClient(BrowserVersion.FIREFOX_3) client.javaScriptEnabled = false page = client.getPage(url) //coming up as null title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")

Using HtmlUnit in Java to simulate JavaScript execution

情到浓时终转凉″ posted on 2019-11-28 11:07:56
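HtmlUnit is an open-source Java tool for analyzing web pages: once a page has been loaded, HtmlUnit can be used to inspect its content. The project simulates a running browser and has been described as an open-source implementation of a browser in Java. Because this browser has no user interface, it runs quickly. It uses the Rhino JavaScript engine to simulate JavaScript execution. Put simply, it is a headless browser written in Java, and since it has no interface its execution speed is reasonably good. HtmlUnit provides a set of APIs that cover a lot of functionality, such as filling in forms, submitting forms and simulating clicks on links, and because the Rhino engine is built in, it can execute JavaScript. Pages are fetched and parsed quickly and performance is good, so it is recommended for scenarios that need to process page scripts. Before using the tool, import the JAR files that HtmlUnit requires. Code: public static String url="https://www.youxs.org"; /* address to scrape data from */ public static void main(String[] args) throws IOException, SAXException { WebClient wc = new WebClient(BrowserVersion.FIREFOX_52); wc.getOptions().setJavaScriptEnabled(true); /* enable the JS interpreter; defaults to true */ wc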

Htmlunit on Android application

落花浮王杯 posted on 2019-11-28 07:00:48
Question: Has anybody gotten HtmlUnit (or HtmlUnitDriver) to work in Android apps? This is the problem: I am getting the following error message: 11-26 16:27:26.617: E/AndroidRuntime(1265): java.lang.NoClassDefFoundError: org/w3c/dom/css/CSSRule This is what I did: I tried adding references to the jars listed in the following link (under both Project Dependencies and Project Transitive Dependencies - compile only, excluding test jars): http://htmlunit.sourceforge.net/dependencies.html However
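A `NoClassDefFoundError` like the one quoted above means the named class is missing from the runtime classpath. As a diagnostic, a small probe built on `Class.forName` can report which classes are actually loadable before any library code runs. This is only a sketch: `ClasspathProbeSketch` and `isClassAvailable` are hypothetical names, and the only class name taken from the question is `org.w3c.dom.css.CSSRule`.

```java
// Hypothetical diagnostic: checks whether a class can be found on the runtime classpath.
final class ClasspathProbeSketch {

    static boolean isClassAvailable(String className) {
        try {
            // initialize = false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathProbeSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name from the error message in the question.
        String name = "org.w3c.dom.css.CSSRule";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

Running the same probe inside the app and on a desktop JVM shows whether the class is absent only from the packaged application.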

How do I perform Web Scraping in Android? [closed]

馋奶兔 posted on 2019-11-28 06:27:11
Question: I want to scrape my website and then use the data from the website to populate elements in my app. My website has login pages, and certain pages only open after login. I started working with HtmlUnit as it is a headless browser and completed the custom API in a

How do I click a javascript button with htmlunit?

浪子不回头ぞ posted on 2019-11-28 05:46:23
Question: I'm working on an application that will automatically click a button on a webpage using HtmlUnit in Java. The only problem is that the button is a JavaScript button, so the standard getInputByName() won't work. Any suggestions for dealing with this? The code for the button is included below. <a class="vote_1" id="1537385" href="/javascript%3Avoid%280%29/index"><img src="/images/parts/btn-vote.gif" alt="Btn-vote" /></a> In addition, here's the other code for voting. <div id="content"><script