Error while using HtmlUnit

好久不见. Submitted on 2019-12-11 02:42:51

Question


When I run this simple code to get the contents of a website as text, it throws errors that I can't understand.

import java.io.IOException;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.ScriptException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class sd {
    public static void main(String[] args) {
        sd vip=new sd();
        try {
            vip.homePage();
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.print("sssss");
    }

    public void homePage() throws Exception {
        final WebClient webClient = new WebClient();
        final HtmlPage page = (HtmlPage) webClient.getPage("http://timesofindia.indiatimes.com/");
        String pageAsText = page.asText();
        String pageAsXML = page.asXml();

        // System.out.println(pageAsXML);
        System.out.println("////////////////////output//////////////////////////"); 
        System.out.println(pageAsText);
        // System.out.println(pageAsXML);
        System.out.println("////////////////////output ends//////////////////////////"); 
    }

}

Error that I get:

   ======= EXCEPTION START ========
Exception class=[com.gargoylesoftware.htmlunit.ScriptException]
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxFunction_write
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)
Caused by: java.lang.RuntimeException: Exception invoking jsxFunction_write
Caused by: com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxFunction_write
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)

Answer 1:


Set your WebClient to not throw exceptions on JavaScript errors:

webClient.setThrowExceptionOnScriptError(false);

If that is not enough, have the WebClient emulate Firefox when you create it:

webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
// or, depending on your HtmlUnit version:
webClient = new WebClient(BrowserVersion.FIREFOX_10);




Answer 2:


The WebClient::setThrowExceptionOnScriptError method has been deprecated since HtmlUnit 2.11. In newer versions, use the following instead:

webClient.getOptions().setThrowExceptionOnScriptError(false);



Answer 3:


I had this error too. Setting the WebClient to suppress script errors works for basic websites, but it fails as the website becomes more complex.

After multiple attempts, I finally switched to PhantomJS, which is written in C++. I wrote a script and executed it with PhantomJS; the script loads the URL and writes the page data to a file.

Once that file is ready, a Java program loads the file and does the processing on its contents. For loading and scraping the data, I used Jsoup.

HtmlUnit, Jaunt, and Jsoup handle HTML and CSS well. What they lack is complete JavaScript support. That is the main cause of errors such as thrown exceptions, pages that do not load completely, and so on.
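
The PhantomJS-then-Java pipeline described in this answer could look roughly like the sketch below. It uses only the Java standard library and stops at the point where the saved HTML would be handed to Jsoup. The script name render.js, the output file page.html, and the class and method names are hypothetical; the answer does not show its actual script, so this assumes a PhantomJS script that takes a URL and an output path as arguments and writes the rendered page to that path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class PhantomFetch {

    // Builds the command line: phantomjs <script> <url> <outputFile>.
    // PhantomJS hands the trailing arguments to the script.
    public static List<String> buildCommand(String script, String url, String outputFile) {
        return Arrays.asList("phantomjs", script, url, outputFile);
    }

    // Runs PhantomJS, waits for it to exit, and returns the HTML it saved.
    // Assumes the (hypothetical) script writes the rendered page to outputFile.
    public static String fetchRenderedHtml(String script, String url, Path outputFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(script, url, outputFile.toString()))
                .inheritIO() // forward PhantomJS console output and errors
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("phantomjs exited with code " + exitCode);
        }
        return new String(Files.readAllBytes(outputFile), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            String html = fetchRenderedHtml("render.js",
                    "http://timesofindia.indiatimes.com/", Paths.get("page.html"));
            System.out.println(html.length() + " characters of rendered HTML");
            // From here the answer parses the HTML with Jsoup, e.g. Jsoup.parse(html).
        } catch (IOException e) {
            // Thrown when phantomjs is not on the PATH, exits with an error,
            // or does not produce the output file.
            System.err.println("Could not fetch page via PhantomJS: " + e.getMessage());
        }
    }
}
```

Keeping the command construction in its own method makes it easy to check the arguments without having PhantomJS installed; the actual rendering still depends on the script doing what is assumed here.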



Source: https://stackoverflow.com/questions/11249317/error-while-using-htmlunit
