Can I configure HtmlUnit to only run specific JavaScript processes and not the whole thing?

点点圈 submitted on 2020-01-23 07:05:26

Question


I'm looking to gather information from a set of web pages that are all formatted very similarly. Some of the information I need is loaded onto the page by JavaScript after the page opens. HtmlUnit seems to be a common tool for this, so that's what I'm using. Unfortunately it's very slow, which is a complaint I've seen on a lot of forums. The webClient.getPage() call is what takes forever. When I turn off JavaScript it runs quickly, but I need some of the JavaScript to execute. Is there a way to selectively execute a few JavaScript commands instead of all of them?

Alternatively, is there a tool that processes JavaScript much faster than HtmlUnit?


Answer 1:


Sort of. You can programmatically decide which external JavaScript URLs to load.

If JavaScript is enabled, HtmlUnit will run all JS embedded in the page. However, if certain external scripts are not required, you can choose not to load them.

Here's some code to get you started:

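    // Needs: java.io.IOException, com.gargoylesoftware.htmlunit.WebRequest,
    // com.gargoylesoftware.htmlunit.WebResponse,
    // com.gargoylesoftware.htmlunit.util.FalsifyingWebConnection
    webClient.setWebConnection(new FalsifyingWebConnection(webClient) {
        @Override
        public WebResponse getResponse(WebRequest request) throws IOException {
            // Placeholder: replace with the path of a script you want to skip.
            if (request.getUrl().getPath().equalsIgnoreCase("some url i don't need")) {
                // Return an empty script instead of fetching the real one.
                return createWebResponse(request, "", "application/javascript");
            }
            // Everything else is fetched normally.
            return super.getResponse(request);
        }
    });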
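
If more than one script needs to be skipped, the URL check above could be pulled out into a small helper. The following is only a sketch under assumptions: `ScriptBlocklist` and `shouldStub` are hypothetical names, not part of HtmlUnit, and matching by host or by path suffix is just one possible rule.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of HtmlUnit): decides which script
// requests should be answered with an empty stub instead of being fetched.
final class ScriptBlocklist {
    private final Set<String> blockedHosts = new HashSet<>();
    private final Set<String> blockedPathSuffixes = new HashSet<>();

    ScriptBlocklist(Collection<String> hosts, Collection<String> pathSuffixes) {
        // Store everything lower-cased so lookups are case-insensitive.
        for (String host : hosts) {
            blockedHosts.add(host.toLowerCase(Locale.ROOT));
        }
        for (String suffix : pathSuffixes) {
            blockedPathSuffixes.add(suffix.toLowerCase(Locale.ROOT));
        }
    }

    // True if the host is blocked outright, or the path ends with
    // one of the blocked suffixes.
    boolean shouldStub(String host, String path) {
        String h = host == null ? "" : host.toLowerCase(Locale.ROOT);
        String p = path == null ? "" : path.toLowerCase(Locale.ROOT);
        if (blockedHosts.contains(h)) {
            return true;
        }
        for (String suffix : blockedPathSuffixes) {
            if (p.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside `getResponse`, the condition would then become something like `blocklist.shouldStub(request.getUrl().getHost(), request.getUrl().getPath())`, with the list of hosts and suffixes chosen by inspecting which scripts the target pages actually need.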

The settings below might speed things up too:

    // Silence HtmlUnit's logging, CSS error reports, and incorrectness warnings.
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);

    webClient.setCssErrorHandler(new SilentCssErrorHandler());

    webClient.setIncorrectnessListener(new IncorrectnessListener() {
        @Override
        public void notify(String message, Object origin) { }
    });

    // Skip cookie handling and CSS processing, and don't throw or print on page errors.
    webClient.getCookieManager().setCookiesEnabled(false);
    webClient.getOptions().setCssEnabled(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setPrintContentOnFailingStatusCode(false);


Source: https://stackoverflow.com/questions/23482544/can-i-configure-htmlunit-to-only-run-specific-javascript-processes-and-not-the-w
