Skip particular JavaScript execution in HtmlUnit

Submitted by 痴心易碎 on 2019-12-12 08:04:11

Question


I have a URL. I want to fetch the page source of that URL after its JavaScript has executed.

Related question: Fetch Page source using HtmlUnit : URL got stuck

Initially I suspected that the URL was getting stuck because of limited system resources and high CPU usage.

Then I tried it with HtmlUnit 2.9 and 2.11. It got stuck while parsing on both versions. See the question linked above for the HtmlUnit scraping code that gets stuck.

Now I suspect that the JavaScript execution is going into an infinite loop.

I want to find out which JS files are causing the problem and exclude them from execution.

If they are scripts for third-party services such as Google Analytics or Twitter, I may not need them at all.

So I want a way to tell HtmlUnit to ignore certain JS files and execute the rest.

Does anybody know how to do that?


Answer 1:


Try this. It worked for me:

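import java.io.IOException;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.FalsifyingWebConnection;

class InterceptWebConnection extends FalsifyingWebConnection {
    public InterceptWebConnection(WebClient webClient) throws IllegalArgumentException {
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // Check the URL before fetching, so the unwanted script is never downloaded
        // and every other request is sent only once.
        if (request.getUrl().toString().endsWith("dom-drag.js")) {
            // Answer with an empty script instead of the real file.
            return createWebResponse(request, "", "application/javascript", 200, "OK");
        }
        return super.getResponse(request);
    }
}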

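Then add the following line while setting up your WebClient. The constructor installs the wrapper as the client's web connection, so no further call is needed: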

new InterceptWebConnection(webClient);
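
The question mentions skipping third-party scripts such as Google Analytics or Twitter widgets, which are easier to match by host than by file name. Below is a minimal sketch of such a check using only the standard library. The class name ScriptBlocklist, the method shouldSkip, and the listed hosts are assumptions made for illustration; they are not part of HtmlUnit or of the original answer.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: decides whether a script URL belongs to a blocked host. */
final class ScriptBlocklist {
    // Example hosts only (an assumption); replace with the hosts that hang your page.
    private static final Set<String> BLOCKED_HOSTS = new HashSet<>(Arrays.asList(
            "google-analytics.com",
            "platform.twitter.com"));

    private ScriptBlocklist() {}

    /** Returns true if the URL's host is a blocked host or a subdomain of one. */
    static boolean shouldSkip(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // unparsable URL: do not block it
        }
        if (host == null) {
            return false; // no host component (for example a relative path)
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String blocked : BLOCKED_HOSTS) {
            // Match the host itself or any subdomain, but not look-alike names.
            if (host.equals(blocked) || host.endsWith("." + blocked)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside getResponse, the endsWith("dom-drag.js") condition could then be replaced by ScriptBlocklist.shouldSkip(request.getUrl().toString()), so that every script served from a listed host receives the empty response.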


Source: https://stackoverflow.com/questions/14439991/skip-particular-javascript-execution-in-html-unit
