HtmlUnit: super slow execution?

Posted by 北城余情 on 2019-11-29 08:04:53

Question


I have been using HtmlUnit, and it suits my requirements well, but it seems to be extremely slow. For example, I automated the following scenario with HtmlUnit:

Go to the Google page
Enter some text
Click on the search button
Get the title of the results page
Click on the first result

Code:

long t1=System.currentTimeMillis();
Logger logger=Logger.getLogger("");
logger.setLevel(Level.OFF);
WebClient webClient=createWebClient();
WebRequest webReq=new WebRequest(new URL("http://google.lk"));

HtmlPage googleMainPage=webClient.getPage(webReq);
HtmlTextInput searchTextField=(HtmlTextInput) googleMainPage.getByXPath("//input[@name='q']").get(0);
HtmlButton searchButton=(HtmlButton) googleMainPage.getByXPath("//button[@name='btnK']").get(0);

searchTextField.type("Sri Lanka");
System.out.println("Text typed!");
HtmlPage googleResultsPage= searchButton.click();
System.out.println("Search button clicked!");

System.out.println("Title : " + googleResultsPage.getTitleText());

HtmlAnchor firstResultLink=(HtmlAnchor) googleResultsPage.getByXPath("//a[@class='l']").get(0);
HtmlPage firstResultPage=firstResultLink.click();
System.out.println("First result clicked!");

System.out.println("Title : " + firstResultPage.getTitleText());
//System.out.println(firstResultPage.asText());
long t2=System.currentTimeMillis();
long diff=t2-t1;
System.out.println("Time elapsed : "  + milliSecondsToHrsMinutesAndSeconds(diff));

webClient.closeAllWindows();
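
The helper milliSecondsToHrsMinutesAndSeconds called near the end of the code above is not shown in the post. A minimal sketch of what such a helper might look like, assuming it only formats a millisecond count as hours, minutes and seconds (the class name and the output format are hypothetical):

```java
import java.time.Duration;

// Hypothetical stand-in for the helper used in the question; not the asker's code.
class ElapsedTimeFormat {

    // Splits a millisecond count into whole hours, minutes and seconds.
    static String milliSecondsToHrsMinutesAndSeconds(long millis) {
        Duration d = Duration.ofMillis(millis);
        long hours = d.toHours();
        long minutes = d.toMinutes() % 60;
        long seconds = d.getSeconds() % 60;
        return hours + " hours, " + minutes + " minutes, " + seconds + " seconds";
    }

    public static void main(String[] args) {
        // 221,000 ms corresponds to the 3 minutes 41 seconds reported in the question.
        System.out.println(milliSecondsToHrsMinutesAndSeconds(221_000L));
    }
}
```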

It works correctly, but it takes 3 minutes and 41 seconds.

I guess the reason for the slow execution is validating each and every element on the page.

My question is: how can I reduce HtmlUnit's execution time? Is there any way to disable validation of web pages?

Thanks in advance!


Answer 1:


For the current HtmlUnit 2.13, setting options is slightly different from what maxmax has provided:

final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(false); // if you don't need CSS
webClient.getOptions().setJavaScriptEnabled(false); // if you don't need JavaScript
HtmlPage page = webClient.getPage("http://XXX.xxx.xx");
...

In my own test, this was 8 times faster than with the default options. (Note that this could be webpage-dependent.)
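
Because the speed-up depends on the page, it may be worth measuring each configuration yourself. Below is a minimal sketch of a timing helper using only the JDK. StepTimer and timeMillis are hypothetical names, not part of HtmlUnit, and the sleep in main is just a stand-in workload:

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of HtmlUnit) for timing a single step.
class StepTimer {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; replace the body with a real page load to measure it.
        long elapsed = timeMillis(() -> {
            Thread.sleep(50);
            return null;
        });
        System.out.println("Elapsed: " + elapsed + " ms");
    }
}
```

To compare configurations, one could pass a lambda that calls webClient.getPage(...) once with the default options and once with CSS and JavaScript disabled. A single run is noisy (network latency, JVM warm-up), so repeating each measurement a few times gives a fairer comparison.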




Answer 2:


  • Be sure to use the latest HtmlUnit version (2.9). I got a performance boost over the previous version.

I get your example done within 20 s or 40 s, depending on the options I set. Since I can't see how your WebClient is initialised, I guess that may be the problem.

Here's my initialisation for the 20 s run:

WebClient client = new WebClient(BrowserVersion.FIREFOX_3_6);
client.setTimeout(60000);
client.setRedirectEnabled(true);
client.setJavaScriptEnabled(true);
client.setThrowExceptionOnFailingStatusCode(false);
client.setThrowExceptionOnScriptError(false);
client.setCssEnabled(false);
client.setUseInsecureSSL(true);



Answer 3:


I also recommend setting a time limit for JavaScript execution:

   client.setJavaScriptTimeout(30000); //e.g. 30s
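
The JavaScript timeout above only bounds script execution. If a whole step needs a hard upper bound regardless of what is slow, one generic option is to run it on a worker thread and wait with a timeout. This is plain java.util.concurrent, not an HtmlUnit feature, and the names TimeLimit and callWithTimeout are hypothetical:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper built on java.util.concurrent; not an HtmlUnit API.
class TimeLimit {

    // Runs the task on a worker thread and returns its result, or the
    // fallback value if the task does not finish within timeoutMillis.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; the task only stops if it responds to interruption.
            future.cancel(true);
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a slow step: sleeps 2 s but is only given 100 ms.
        String result = callWithTimeout(() -> {
            Thread.sleep(2000);
            return "done";
        }, 100, "timed out");
        System.out.println(result);
    }
}
```

Exceptions thrown by the task itself surface as an ExecutionException from future.get, so a caller wrapping a page load this way would still need to handle those.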


Source: https://stackoverflow.com/questions/10442803/htmlunit-super-slow-execution
