How to print external script inside iframe using htmlunit?

Submitted by 对着背影说爱祢 on 2020-01-26 02:23:12

Question


import java.io.IOException;
import java.net.URL;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.ThreadedRefreshHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ReadHtml {
    public static void main(String[] args) throws Exception {
        // Silence HtmlUnit's logging.
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setActiveXNative(true);
        webClient.getOptions().setAppletEnabled(false);
        webClient.getOptions().setCssEnabled(true);
        webClient.getOptions().setDoNotTrackEnabled(true);
        webClient.getOptions().setGeolocationEnabled(false);
        webClient.getOptions().setPopupBlockerEnabled(false);
        webClient.getOptions().setPrintContentOnFailingStatusCode(true);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(true);
        webClient.getOptions().setThrowExceptionOnScriptError(true);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.setCssErrorHandler(new SilentCssErrorHandler());
        webClient.setRefreshHandler(new ThreadedRefreshHandler());
        webClient.getCookieManager().setCookiesEnabled(true);

        WebRequest request = new WebRequest(new URL("some url containing javascript to load html elements"));
        try {
            Page page = webClient.getPage(request);
            // Raw server response, before any JavaScript has run:
            // System.out.println(page.getWebResponse().getContentAsString());
            // Current DOM of the top-level page, serialized as XML:
            System.out.println(((HtmlPage) page).asXml());
        } catch (FailingHttpStatusCodeException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

I want to print all HTML elements (not only the source code), including HTML produced by JavaScript, iframes, and nested iframes. I tried the code above, but the HTML loaded by JavaScript is not printed to the console. (I also tried selecting elements by id and name, but I would rather not print anything specific; I want to print the entire HTML content.) Can someone point out what needs to be changed? Thanks in advance.


Answer 1:


Try using page.asXml().

HtmlPage itself is a DOM node, so you can iterate through its children recursively. The frames can be accessed (recursively) via the DOM or via page.getFrames().

If you need to print all the responses from the server, you can use WebConnectionWrapper as an interceptor. This gives you access to all the responses (including the script ones).
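The recursive walk over page.getFrames() could look roughly like the sketch below. The class name FrameDumper and the method dump are made up for illustration, and the code assumes the HtmlUnit 2.x package names used in the question. It appends each page's serialized DOM to a StringBuilder and then descends into every frame whose content is itself an HTML page.

```java
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.html.FrameWindow;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Hypothetical helper, not part of HtmlUnit.
public class FrameDumper {

    /** Appends the DOM of the given page, then of each nested frame, to out. */
    public static void dump(HtmlPage page, int depth, StringBuilder out) {
        out.append("----- depth ").append(depth).append(": ")
           .append(page.getWebResponse().getWebRequest().getUrl())
           .append('\n');
        out.append(page.asXml()).append('\n');

        // getFrames() lists the frames and iframes declared directly in this page.
        for (FrameWindow frame : page.getFrames()) {
            Page inner = frame.getEnclosedPage();
            // A frame may hold non-HTML content (plain text, images, ...); skip those.
            if (inner instanceof HtmlPage) {
                dump((HtmlPage) inner, depth + 1, out);
            }
        }
    }
}
```

With the question's code, this would be called as FrameDumper.dump((HtmlPage) page, 0, sb) followed by System.out.println(sb). It only sees frames that exist at the moment it runs, so content that scripts add later would still be missing.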


July 9

Frames are part of the DOM. But if some of the content is loaded asynchronously (Ajax), HtmlUnit might not have waited for it to load. Try adding an AjaxController to your WebClient. Here is an example.

For WebConnectionWrapper, use the example below. But again, if there is some asynchronous processing, HtmlUnit may exit before all of it is done, so an AjaxController might be your best bet.

// browser is the WebClient instance.
browser.setWebConnection(new WebConnectionWrapper(browser) {
    @Override
    public WebResponse getResponse(final WebRequest request) throws IOException {
        WebResponse response = super.getResponse(request);
        // Inspect or log the response here (processResponse).
        return response;
    }
});
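As a slightly fuller sketch of the same interceptor idea, the class below records the URL and content type of every response that passes through the client, which makes it easy to see whether the script and iframe requests were issued at all. ResponseLogger and install are hypothetical names, and the HtmlUnit 2.x packages from the question are assumed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.WebConnectionWrapper;

// Hypothetical helper, not part of HtmlUnit.
public class ResponseLogger {

    /**
     * Wraps the client's current connection and returns a live list that
     * receives one "url [content-type]" entry per response.
     */
    public static List<String> install(WebClient client) {
        // Synchronized because background JavaScript threads may also fetch resources.
        final List<String> seen = Collections.synchronizedList(new ArrayList<String>());

        client.setWebConnection(new WebConnectionWrapper(client) {
            @Override
            public WebResponse getResponse(WebRequest request) throws IOException {
                WebResponse response = super.getResponse(request);
                seen.add(request.getUrl() + " [" + response.getContentType() + "]");
                return response;
            }
        });
        return seen;
    }
}
```

Calling ResponseLogger.install(webClient) before webClient.getPage(...) and printing the returned list afterwards shows which resources were actually requested. The response bodies are also available via response.getContentAsString() if the full script text is needed.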

July 10

NicelyResynchronizingAjaxController works for user-initiated Ajax. For "self-loading" requests, try something like this.

public class AlwaysSynchronizingAjaxController extends NicelyResynchronizingAjaxController {
    // Treat every Ajax request as synchronous, whoever triggered it.
    @Override
    public boolean processSynchron(HtmlPage page, WebRequest settings, boolean async) {
        return true;
    }
}

If you are using Fiddler (or Wireshark, or any other sniffing or intercepting tool), check whether the requests for the dynamically loaded content show up there.
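A possible way to wire the controller above into a client is sketched here. AsyncAwareLoader, load, and maxWaitMillis are invented names, and the call to waitForBackgroundJavaScript is an addition that the answer does not mention: it is assumed here as a way to give timers and other background jobs a chance to finish before the DOM is serialized.

```java
import java.io.IOException;

import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Hypothetical helper, not part of HtmlUnit.
public class AsyncAwareLoader {

    /** Same idea as the controller in the answer: force every Ajax call to be synchronous. */
    static class AlwaysSynchronizingAjaxController extends NicelyResynchronizingAjaxController {
        @Override
        public boolean processSynchron(HtmlPage page, WebRequest settings, boolean async) {
            return true;
        }
    }

    /** Loads url, waits up to maxWaitMillis for background scripts, returns the DOM as XML. */
    public static String load(WebClient client, String url, long maxWaitMillis) throws IOException {
        client.setAjaxController(new AlwaysSynchronizingAjaxController());
        HtmlPage page = client.getPage(url);
        // Assumption: blocking here lets setTimeout/setInterval jobs and pending
        // requests complete; the return value (jobs still running) is ignored.
        client.waitForBackgroundJavaScript(maxWaitMillis);
        return page.asXml();
    }
}
```

The wait is capped by maxWaitMillis, so a page with endlessly repeating timers still returns; a value of a few seconds is a guess that would need tuning for the real site.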




Answer 2:


I found a partial solution for my task (not exactly what I want).

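// Every top-level window and every (nested) frame is a WebWindow.
List<WebWindow> windows = webClient.getWebWindows();
for (WebWindow w : windows) {
    // Skip windows whose content is not HTML, to avoid a ClassCastException.
    if (w.getEnclosedPage() instanceof HtmlPage) {
        HtmlPage hpage2 = (HtmlPage) w.getEnclosedPage();
        System.out.println("-------------------------------------");
        System.out.println(hpage2.asXml());
    }
}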

This way I was able to get the contents of all iframes and nested iframes, though as separate pages rather than as one continuous page.

When I know the iframe's name, I can extract its contents with:

HtmlPage hpage = (HtmlPage)webClient.getWebWindowByName("google_esf").getEnclosedPage();

For now this resolves my problem. It would still be better if someone could suggest how to get everything as one continuous page.



Source: https://stackoverflow.com/questions/24631872/how-to-print-external-script-inside-iframe-using-htmlunit