Use IP and host with Jsoup

Submitted by 元气小坏坏 on 2019-12-24 13:35:04

Question


I would like to access a webpage using its IP address and host name, in order to save DNS lookup time by reusing stored addresses for domains. With plain sockets, this is done by opening a socket to the IP address and sending a GET request, as follows:

Socket s = new Socket([string_ip_address], 80);

Then transmitting: GET [file_name] HTTP/1.1\r\nHost: [some_name]\r\n\r\n

But I would like to use Jsoup. The equivalent command to retrieve a page, say www.google.com, in Jsoup is:

Jsoup.connect("http://www.google.com").get();

But the provided site name must be the actual name, not the IP (because, if my limited understanding is correct, many domains can reside at the same IP address). So I figured I might try to alter the request made by Jsoup to include both the site name and the IP. Jsoup uses HttpURLConnection in its underlying code (here's a code snippet from the Jsoup library itself, as found here: https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/helper/HttpConnection.java):

HttpURLConnection conn = (HttpURLConnection) req.url().openConnection();

conn.setRequestMethod(req.method().name());
conn.setInstanceFollowRedirects(false);
conn.setConnectTimeout(req.timeout());
conn.setReadTimeout(req.timeout());

if (conn instanceof HttpsURLConnection) {
    if (!req.validateTLSCertificates()) {
         initUnSecureTSL();
         ((HttpsURLConnection)conn).setSSLSocketFactory(sslSocketFactory);
         ((HttpsURLConnection)conn).setHostnameVerifier(getInsecureVerifier());
    }
}

if (req.method().hasBody())
    conn.setDoOutput(true);
if (req.cookies().size() > 0)
    conn.addRequestProperty("Cookie", getRequestCookieString(req));
for (Map.Entry<String, String> header : req.headers().entrySet()) {
    conn.addRequestProperty(header.getKey(), header.getValue());
}

I thought about writing something like this:

Jsoup.connect(ip).header("Host", host);

But this doesn't seem to work. So, is there a known way to use an IP address plus a Host header in Jsoup requests (to spare DNS lookups), or is there some other way to skip the DNS lookup when using Jsoup?

Thanks!

EDIT -

Just to be clear: using sockets with the IP and host name works. For example, fetching the main page of buzzfeed via IP in the following way:

Socket s = new Socket("23.34.229.118", 80);
BufferedReader reader = new BufferedReader(new InputStreamReader(s.getInputStream()));
PrintStream writer = new PrintStream(s.getOutputStream());
writer.println("GET / HTTP/1.0\r\nHost: www.buzzfeed.com\r\n");

String line;
while((line = reader.readLine()) != null)
{
    System.out.println(line);
}

s.close();

Works perfectly fine. But I am unable to access the page via

Jsoup.connect("http://23.34.229.118");

And I am quite sure that's because I need to specify the host somehow, if that's even possible. My attempt with

Jsoup.connect("http://23.34.229.118").header("Host", "buzzfeed.com"); 

failed and I got a 400 error.


Answer 1:


I believe I have found the solution.

The following line needs to be added to the code, before the first request is made:

System.setProperty("sun.net.http.allowRestrictedHeaders", "true");

This is closely related to this question, since the implementation of Jsoup uses HttpURLConnection: Can I override the Host header where using java's HttpUrlConnection class?

Apparently, Java by default blocks changes to certain restricted headers, one of which is the Host header.
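
For illustration, here is a minimal sketch of the fix using HttpURLConnection directly, since that is what Jsoup relies on according to the snippet in the question. It does not use Jsoup or a real site: the class name HostHeaderOverride, the helper methods, the throwaway local echo server, and the host www.example.com are all assumptions made for the example. The property is set in a static initializer because, as far as I know, the JDK reads it only once, when its HTTP client class is first loaded.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class HostHeaderOverride {

    static {
        // Set before any HttpURLConnection is opened; the JDK reads this flag once.
        System.setProperty("sun.net.http.allowRestrictedHeaders", "true");
    }

    // Hypothetical helper: GET the URL (addressed by IP) with an explicit Host header.
    static String fetchWithHost(String url, String host) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Host", host);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Starts a local server that echoes back the Host header it received,
    // then requests it by IP while claiming to be www.example.com.
    static String demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = exchange.getRequestHeaders().getFirst("Host")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return fetchWithHost("http://127.0.0.1:" + port + "/", "www.example.com");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Prints the Host header as seen by the server.
        System.out.println(demo());
    }
}
```

With Jsoup itself, the same property would be set once at startup, and the Jsoup.connect(...).header("Host", ...) call from the question should then send the overridden header instead of having it silently replaced.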



Source: https://stackoverflow.com/questions/34379129/use-ip-and-host-with-jsoup
