htmlunit

HtmlUnit, how to post form without clicking submit button?

Anonymous (unverified), submitted on 2019-12-03 01:36:02
Question: I know that in HtmlUnit I can fire a submit event on a form with fireEvent and it will be posted. But what if I have disabled JavaScript and would like to post a form using some built-in function? I've checked the Javadoc and haven't found any way to do this. It is strange that there is no such function in HtmlForm... I read the Javadoc and the tutorial on the HtmlUnit page, and I know that I can use getInputByName() and click the result. But sometimes a form has no submit-type button, or it has one without a name attribute. I am asking for help in

HtmlUnit on Android

Anonymous (unverified), submitted on 2019-12-03 01:28:01
Question: I have been coding a web scraping application on Android using HtmlUnit. But when I build the app, the build fails with the error "Conversion to Dalvik format failed with error 1". So how do I build an Android app that uses HtmlUnit? Answer 1: I just ran into this, and saw a lot of other errors triggered by XML-related jars, including this: [2011-05-20 12:57:50 - Android Hello] Dx trouble processing "javax/xml/XMLConstants.class": Ill-advised or mistaken usage of a core class (java.* or javax.*) when not building a core

HtmlAnchor click() function in Htmlunit is not working

Anonymous (unverified), submitted on 2019-12-03 01:00:01
Question: I am trying to use HtmlUnit to browse a site automatically. I need to press some buttons in the process. First I build an HtmlAnchor object for a button with this markup: <a href="dog.php"> <img src="http://images.hand.co.uk/Pic/site_images/hand/Myper/MyOrder/images/DogRed.gif" width="75" height="31" border="0" alt="1 adds"/> </a> which works fine when I click it using the click() method. I am then moved to another page, which has a link I need to click for the desired contents to appear. After the click I am not moved to another

HtmlUnit download attachments [closed]

拈花ヽ惹草, submitted on 2019-12-03 00:07:12
I need to save files from websites using HtmlUnit. I am currently navigating to pages that have several anchors that use JavaScript onClick()="DownloadAttachment('attachmentId')" to get the files. The files can be of pretty much any type (xls, doc, txt, pdf, jpg, etc.). So far, though, I've been unable to find resources or examples that show how to save files using HtmlUnit. I've mainly been trying to get AttachmentHandler to work for this, as it seems the most likely to work, but have been unsuccessful. I was wondering if anyone else has managed to download files using HtmlUnit and could
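
Whichever mechanism ends up delivering the attachment, the last step is writing a response stream to disk, and that part needs only the standard library. The sketch below is not taken from the thread: the class and method names are hypothetical, and it assumes the caller has already obtained an InputStream for the attachment (in HtmlUnit this would typically come from the WebResponse of the page returned by the click, which is an assumption about the asker's setup).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; the name is illustrative and not part of HtmlUnit. */
final class AttachmentSaver {
    private AttachmentSaver() {}

    /**
     * Copies the whole stream to the target path, overwriting any existing
     * file, and returns the number of bytes written. The stream is closed
     * when the copy finishes or fails.
     */
    static long save(InputStream content, Path target) throws IOException {
        try (InputStream in = content) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Because the copy is byte-for-byte, the same helper works for any of the file types mentioned (xls, doc, txt, pdf, jpg).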

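Java web scraping techniques: HtmlUnit

Anonymous (unverified), submitted on 2019-12-02 21:53:52
HttpClient: HttpClient originated as a subproject of Apache Jakarta Commons (it now belongs to Apache HttpComponents). It provides an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol and supports the latest HTTP versions and recommendations. Its main features are listed below; see the HttpClient home page for more detail. (1) Implements all HTTP methods (GET, POST, PUT, HEAD, etc.); (2) follows redirects automatically; (3) supports HTTPS; (4) supports proxy servers; (5) manages cookies automatically, and more.
Jsoup: jsoup is a Java HTML parser that can parse a URL or HTML text directly. It offers a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods. Fetching and parsing pages is very fast, and it is recommended. Main features: parse HTML from a URL, file, or string; find and extract data using DOM traversal or CSS selectors; manipulate HTML elements, attributes, and text.
HtmlUnit: HtmlUnit is an open-source Java page-analysis tool; after loading a page, you can use it to analyze the page content effectively. The project can simulate a running browser and has been called an open-source Java implementation of a browser. This GUI-less browser also runs very quickly. It uses the Rhino JavaScript engine to simulate JavaScript execution.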

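Java web scraping techniques: HtmlUnit

Anonymous (unverified), submitted on 2019-12-02 20:37:20
Java has many open-source components that support web scraping in various ways, and scraping pages with Java alone is fairly easy. The main web scraping techniques:
HttpClient: HttpClient originated as a subproject of Apache Jakarta Commons (it now belongs to Apache HttpComponents). It provides an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol and supports the latest HTTP versions and recommendations. Its main features are listed below; see the HttpClient home page for more detail. (1) Implements all HTTP methods (GET, POST, PUT, HEAD, etc.); (2) follows redirects automatically; (3) supports HTTPS; (4) supports proxy servers; (5) manages cookies automatically, and more.
Jsoup: jsoup is a Java HTML parser that can parse a URL or HTML text directly. It offers a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods. Fetching and parsing pages is very fast, and it is recommended. Main features: parse HTML from a URL, file, or string; find and extract data using DOM traversal or CSS selectors; manipulate HTML elements, attributes, and text.
HtmlUnit: HtmlUnit is an open-source Java page-analysis tool; after loading a page, you can use it to analyze the page content effectively. The project can simulate a running browser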

Alternative to HtmlUnit

此生再无相见时, submitted on 2019-12-02 18:14:49
I have been researching the headless browsers available to date and found that HtmlUnit is used pretty extensively. Is there any alternative to HtmlUnit with possible advantages over it? Thanks, Nayn. As far as I know, HtmlUnit is the most powerful headless browser. What are your issues with it? Sajid Hussain. There are many other libraries that you can use for this. If you need to scrape XML-based data, use JTidy. If you need to scrape specific data from HTML, you can use Jsoup. Well, I use jsoup; it's pretty much faster than any other API. WebDriver with a virtual

HtmlUnit and JavaScript in links

余生长醉, submitted on 2019-12-02 14:03:42
Question: Copied from here: I need to save files from websites using HtmlUnit. I am currently navigating to pages that have several anchors that use JavaScript: onClick()="DownloadAttachment('attachmentId')". So far, though, I've been unable to find resources or examples that show how to save files using HtmlUnit. I've mainly been trying to get AttachmentHandler to work for this, as it seems the most likely to work, but have been unsuccessful. How do I use AttachmentHandler to get at the data stream which

How to convert an <img… in html to byte [] in Java

随声附和, submitted on 2019-12-02 13:51:50
Question: I have opened a webpage in the HtmlUnit headless browser. The webpage contains an image tag as follows: <img src="..." /> I want only that image. The problem is that the same src URL serves a different image each time: if we refresh the img src URL, it shows a different image each time. So how do I get the image that is actually displayed on the HTML page? Answer 1: When you get the HtmlPage, you have to get the image through one of its methods. You can then get an HtmlImage, which can
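
Once the page's image has been decoded into a BufferedImage, turning it into a byte[] needs only the standard library. The sketch below covers just that final step and is not the answer's own code: the class and method names are hypothetical, and it assumes the BufferedImage was read from the HtmlImage element of the already-loaded page (in HtmlUnit 2.x that is usually done through the element's ImageReader) rather than by requesting the src URL a second time, since a second request would return a different picture.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper; the name is illustrative and not part of HtmlUnit. */
final class ImageBytes {
    private ImageBytes() {}

    /**
     * Encodes an already-decoded image into a byte array.
     * formatName is an ImageIO format name such as "png" or "jpg".
     */
    static byte[] toBytes(BufferedImage image, String formatName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no registered writer supports the format.
        if (!ImageIO.write(image, formatName, out)) {
            throw new IOException("No ImageIO writer for format: " + formatName);
        }
        return out.toByteArray();
    }
}
```

Note that this re-encodes the pixels, so the bytes are not guaranteed to be identical to what the server sent; if the exact original bytes matter, the response body of the image request would have to be captured instead.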

HtmlUnitDriver does not load javascript when navigating a page from an url

折月煮酒, submitted on 2019-12-02 08:54:26
Question: Here is my problem: I'm trying to load a page of my website to test it, but when I look at the HTML I get from the HtmlUnitDriver, the elements rendered by JavaScript are not present. I am using selenium-java 3.141.59 and htmlunit-driver 2.33.3. Here is my code: HtmlUnitDriver driver = new HtmlUnitDriver(); driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS); driver.setJavascriptEnabled(true); driver.get("https://stackoverflow.com/questions/7926246/why-doesnt-htmlunitdriver-execute