htmlunit

Getting HtmlUnit to run under Android

南笙酒味 submitted on 2019-12-04 12:11:16
I was wondering if anyone has been able to make HtmlUnit run under Android? I have a site which I am scraping using Jsoup (this works well). However, one of the sections contains more than 2 pages. The site uses ASP.NET and uses a JavaScript postback for the link that leads to the next page. As a result I need to somehow execute that JavaScript to get the next page's content. This is where my attempts at HtmlUnit come in. The following code worked perfectly in Java: WebClient webClient = new WebClient(); webClient.setJavaScriptEnabled(true); HtmlPage page = null; webClient

Is it possible to load an HtmlPage from a string?

家住魔仙堡 submitted on 2019-12-04 10:00:41
I have stored a webpage's HTML in the database. I want to take advantage of HtmlUnit's ability to find/reference DOM elements. Is it possible to load the HtmlPage object from a string (via a database column)? StringWebResponse may help. Edit: example: URL url = new URL("http://www.example.com"); StringWebResponse response = new StringWebResponse("<html><head><title>Test</title></head><body></body></html>", url); HtmlPage page = HTMLParser.parseHtml(response, new TopLevelWindow("top", new WebClient())); System.out.println(page.getTitleText()); I assume you're using HtmlParser.parseHtml to

Getting error “Provider com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl not found” in unit test but not in main program

。_饼干妹妹 submitted on 2019-12-04 03:47:15
I am building an application in C# which uses com.gargoylesoftware.htmlunit.WebClient to access and retrieve information from webpages. My application runs fine from the main project but when I try to build unit tests to test the project classes I get the following error: FactoryConfigurationError Message "Provider com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl not found" Source "IKVM.OpenJDK.XML.API" string StackTrace " at javax.xml.parsers.DocumentBuilderFactory.newInstance() at com.gargoylesoftware.htmlunit.javascript.configuration.JavaScriptConfiguration

HtmlUnit accessing an element without id or Name

谁都会走 submitted on 2019-12-03 17:21:34
How can I access this element: <input type="submit" value="Save as XML" onclick="some code goes here"> More info: I have to programmatically access a web page and simulate clicking a button on it, which will then generate an XML file that I hope to save on the local machine. I am trying to do so using the HtmlUnit libraries, but all the examples I could find use the getElementById() or getElementByName() methods. Unfortunately, this particular element has neither a name nor an id, so I failed miserably. I then supposed that what I have to do is use the getByXPath() method but I got
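The attribute-based lookup the question is heading towards can be sketched with the JDK's own XPath engine. This is only an illustration of the expression, not HtmlUnit code: the class name, the helper countMatches, and the sample markup are made up for the example, and the markup is written as well-formed XML so the standard parser accepts it. Presumably the same expression string is what one would pass to getByXPath() in HtmlUnit.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathButtonDemo {
    // Selects the button by its attributes, since it has no id or name.
    static final String SAVE_BUTTON_XPATH =
            "//input[@type='submit' and @value='Save as XML']";

    // Hypothetical helper: counts the nodes in `xml` that match `xpathExpr`.
    static int countMatches(String xml, String xpathExpr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes =
                (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Sample page with two inputs; only the second is a submit button.
        String xml = "<html><body><form>"
                + "<input type=\"text\" value=\"Save as XML\"/>"
                + "<input type=\"submit\" value=\"Save as XML\" onclick=\"save()\"/>"
                + "</form></body></html>";
        System.out.println(countMatches(xml, SAVE_BUTTON_XPATH));
    }
}
```

Matching on both type and value keeps the expression from picking up other inputs that happen to share the same value text.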

Java – How can I Log into a Website with HtmlUnit?

孤者浪人 submitted on 2019-12-03 17:02:07
I am writing a Java program to log into the website my school uses to post grades. This is the url of the login form: https://ma-andover.myfollett.com/aspen/logon.do This is the HTML of the login form: <form name="logonForm" method="post" action="/aspen/logon.do" autocomplete="off"><div><input type="hidden" name="org.apache.struts.taglib.html.TOKEN" value="30883f4c7e25a014d0446b5251aebd9a"></div> <input type="hidden" id="userEvent" name="userEvent" value="930"> <input type="hidden" id="userParam" name="userParam" value=""> <input type="hidden" id="operationId" name="operationId" value="">

Restricting Selenium/Webdriver/HtmlUnit to a certain domain

一曲冷凌霜 submitted on 2019-12-03 14:25:22
While using Selenium/WebDriver for web scraping, I realized the target site has a Google Analytics script running. Is there a way to restrict Selenium/WebDriver/HtmlUnit so that it avoids certain URLs/domains? Thanks. I think it is impossible because Selenium is actually an adapter for several implementations, so it can't prevent Firefox or Chrome from loading particular scripts. Perhaps you can check the driver API (Firefox profile, HtmlUnit configuration file) to accomplish this. Source: https://stackoverflow.com/questions/6468624/restricting-selenium-webdriver-htmlunit-to-a-certain-domain

Are Futures executed on a single thread? (Scala)

≡放荡痞女 submitted on 2019-12-03 12:42:21
Question: Using the default implicit execution context in Scala, will each new future be computed on a single, dedicated thread or will the computation be divided up and distributed to multiple threads in the thread pool? I don't know if this helps, but the background to this question is that I want to perform multiple concurrent operations using the HtmlUnit API. To do this, I would wrap each new WebClient instance in a Future. The only problem is that the WebClient class is not thread safe, so I'm
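The one-client-per-task idea described in the question can be sketched in Java with CompletableFuture, as a rough analogue of a Scala Future on a thread pool. Everything here is illustrative: FakeClient is a hypothetical stand-in for a non-thread-safe client such as WebClient, and fetchAll and the pool size are assumptions of the example. As far as I know, a single future body runs as one task on one pool thread rather than being split across threads, although chained callbacks may run elsewhere, so a client created and used entirely inside the task body is never shared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OneClientPerTask {
    // Hypothetical stand-in for a non-thread-safe client such as WebClient.
    static final class FakeClient {
        private int requests = 0; // unsynchronized state, unsafe to share

        String fetch(String url) {
            requests++;
            return "content of " + url;
        }
    }

    // Runs one task per URL; each task builds and uses its own client.
    static List<String> fetchAll(List<String> urls, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(CompletableFuture.supplyAsync(() -> {
                    // Created inside the task, so no other task can see it.
                    FakeClient client = new FakeClient();
                    return client.fetch(url);
                }, pool));
            }
            // Join in submission order so results line up with the inputs.
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList("page1", "page2", "page3"), 2));
    }
}
```

The design choice is confinement rather than locking: because no client instance escapes its task, thread safety of the client class does not matter.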

“Run” HTMLUnit with PHP

Anonymous (unverified) submitted on 2019-12-03 10:03:01
Question: So I have installed Java on my CentOS server. I now want to be able to use PHP to run HTMLUnit to get a fully rendered webpage and then return the results to the user. I see the "simple" example on HTMLUnit but I know next to nothing about Java and don't know where that needs to go or be run to even get the test case working (i.e. getting Google's homepage). public void getURL() throws Exception { final WebClient webClient = new WebClient(); final HtmlPage page = webClient.getPage("http://google.com"); // Pass in URL // RETURN "page" } Once

HtmlUnit download attachments [closed]

大城市里の小女人 submitted on 2019-12-03 09:36:59
Question: I need to save files from websites using HtmlUnit. I am currently navigating to pages that have several anchors that use JavaScript onClick()="DownloadAttachment('attachmentId')" to get the files. The files can be of pretty much any type (xls, doc, txt, pdf, jpg, etc.). So far