htmlunit | 易学教程

Downloading javascript image with HtmlUnit

Question: How do I download the image generated by the Leaflet easyPrint button using HtmlUnit? I am trying it like this:

    public static void main(String[] args) {
        try {
            WebClient webClient = new WebClient();
            HtmlPage test = webClient.getPage("http://rowanwins.github.io/leaflet-easyPrint/");
            webClient.waitForBackgroundJavaScript(5000);
            final DomElement button = test.getFirstByXPath("/html/body/button");
            final InputStream image = button.click().getWebResponse().getContentAsStream();
            System.out

How To Fix: HtmlUnit GetElementById Returns Null

Question: I am writing a web scraper and am trying to type a search word into a search box. However, I get null when I try to access the search box by ID. I am just learning HtmlUnit, so I could be missing something obvious, but I have not been able to identify it myself yet. Here is the website's code:

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" class="no-touch">
    <head>-</head>
    <body lang="en" class="garageBrand" emailcookiename="grgemailca" loyaltycookiename=

No static field INSTANCE of type Lorg/apache/http/conn/ssl/AllowAllHostnameVerifier when using htmlunit in Android Studio Project

Question: I am using htmlunit 2.36.0 in my Android Studio project. The APK compiles successfully, but I get runtime errors when I try to fetch a web page. At first I was getting the following error:

    java.lang.BootstrapMethodError: Exception from call site

I was able to fix that by adding this to the Gradle file:

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

However, now I am facing another error:

    java.lang.NoSuchFieldError: No

Get content of list of span elements with HTMLUnit and XPath

Question: I want to get a list of values from an HTML document. I am using HTMLUnit. There are many span elements with the class topic. I want to extract the content within the span tags:

    <span class="topic">
      <a href="http://website.com/page/2342" class="id-24223 topic-link J_onClick topic-info-hover">Lean Startup</a>
    </span>

My code looks like this:

    List<?> topics = (List)page.getByXPath("//span[@class='topic']/text()");

However, whenever I try to iterate over the list I get a NoSuchElementException.
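One thing worth checking in the expression above is what `/text()` selects. The sketch below is an illustration only: it uses the JDK's built-in XPath engine instead of HtmlUnit so that it runs without extra dependencies, and the class name `TopicXPathSketch`, the wrapping `<div>`, and the `select` helper are hypothetical. It suggests that `//span[@class='topic']/text()` matches only the whitespace text nodes that are direct children of the span, while the visible label lives inside the nested `<a>` element. Whether this explains the NoSuchElementException in the question cannot be told from the excerpt.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TopicXPathSketch {

    // The markup from the question, wrapped in a <div> root (an assumption)
    // so that it parses as a well-formed XML document.
    static final String MARKUP =
        "<div>"
        + "<span class=\"topic\"> <a href=\"http://website.com/page/2342\" "
        + "class=\"id-24223 topic-link J_onClick topic-info-hover\">Lean Startup</a> </span>"
        + "</div>";

    // Evaluates an XPath expression against MARKUP and returns the text
    // content of every node it selects.
    public static List<String> select(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(MARKUP)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Direct text children of the span: only the whitespace around <a>.
        System.out.println(select("//span[@class='topic']/text()"));
        // The nested anchor elements: their text content is the topic name.
        System.out.println(select("//span[@class='topic']/a"));
    }
}
```

In HtmlUnit, the second expression could presumably be passed to `getByXPath` in the same way, calling `getTextContent()` on each returned node to read the label.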

How to print external script inside iframe using htmlunit?

Question:

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
    import com.gargoylesoftware.htmlunit.Page;
    import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
    import com.gargoylesoftware.htmlunit.ThreadedRefreshHandler;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.WebRequest;
    import com.gargoylesoftware.htmlunit

HtmlUnit How to get a page after executing Javascript

Question: I'm trying to use HtmlUnit to run JavaScript on a web page in order to change the page. I'm importing:

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlDivision;
    import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
    import com.gargoylesoftware.htmlunit.ScriptResult;

and the code is something like:

    String javaScriptCode =

Retrieve openid bearer token using headless browser setup

Question: I have been scraping a website with OkHttp3 for quite some time. However, some components of the website have been upgraded and now use an additional OpenID bearer authentication. I am 99.9% positive my requests are failing because of this bearer token: when I check with Chrome dev tools, I see the bearer token only for these parts. Moreover, a couple of requests are going to links that end with ".well-known/openid-configuration". In addition, when I

return all the HtmlPage's HTML

Question: I want the entire HTML for a given HtmlPage object. What property should I use?

Answer 1: In HtmlUnit, HtmlPage implements the Page interface. That means you can call Page#getWebResponse() to get the web response from which the HtmlPage was built, and then WebResponse#getContentAsString() to read its full content. Here's a method that does what you want...

    public String getRawPageText(WebClient client, String url) throws FailingHttpStatusCodeException, MalformedURLException, IOException {

HtmlUnit unable to get frame content

Question: I am trying to set the value of a search box, click a search button, and parse the results. The problem is that the results are displayed in another frame, and I am not able to obtain that frame. The code:

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlElement;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

Can I configure HTMLUnit to only run specific javascript processes and not the whole thing?

Question: I'm looking to gather information from a set of web pages that are all formatted very similarly. I need some information that is loaded onto the page by JavaScript after it opens. HTMLUnit seems to be a common tool for this, so that's what I'm using. Unfortunately it is very slow, a complaint I've seen across many forums. The webClient.getPage() call is what takes forever. When I turn off JavaScript it runs quickly, but I need to execute some Javascript