Make HttpURLConnection load web pages with images

送分小仙女□ 提交于 2019-12-02 03:10:57

You need to manipulate the HTML that way so that all resource URLs on the intranet domain are proxied as well. E.g. all of the following resource references in HTML

<base href="http://intranet.com/" />
<script src="http://intranet.com/script.js"></script>
<link href="http://intranet.com/style.css" />
<img src="http://intranet.com/image.png" />
<a href="http://intranet.com/page.html">link</a>

should be changed in the HTML that way so that they go through your proxy servlet instead, e.g.

<base href="http://example.com/proxy/" />
<script src="http://example.com/proxy/script.js"></script>
<link href="http://example.com/proxy/style.css" />
<img src="http://example.com/proxy/image.png" />
<a href="http://example.com/proxy/page.html">link</a>

A HTML parser like Jsoup is extremely helpful in this. You can do as follows in your proxy servlet which is, I assume, mapped on an URL pattern of /proxy/*.

String intranetURL = "http://intranet.com";
String internetURL = "http://example.com/proxy";

if (request.getRequestURI().endsWith(".html")) { // A HTML page is requested.
    Document document = Jsoup.connect(intranetURL + request.getPathInfo()).get();

    for (Element element : document.select("[href]")) {
        element.attr("href", element.absUrl("href").replaceFirst(intranetURL, internetURL));
    }

    for (Element element : document.select("[src]")) {
        element.attr("src", element.absUrl("src").replaceFirst(intranetURL, internetURL));
    }

    response.setContentType("text/html;charset=UTF-8");
    response.setCharacterEncoding("UTF-8");
    resposne.getWriter().write(document.html());
}
else { // Other resources like images, etc.
    URLConnection connection = new URL(intranetURL + request.getPathInfo()).openConnection();

    for (Map.Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
        for (String value : header.getValue()) {
            response.addHeader(header.getKey(), value);
        }
    }

    InputStream input = connection.getInputStream();
    OutputStream output = response.getOutputStream();
    // Now just copy input to output.
}

You have to make a separate request for each image. That's what browsers do as well.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!