How to get the complete URL address most efficiently?

Posted by 穿精又带淫゛_ on 2019-12-07 20:23:04

Question


I'm using a Java program to get expanded URLs from short URLs. Given a Java URLConnection, which of the following two approaches is better for getting the expanded URL?

Connection.getHeaderField("Location");

vs

Connection.getURL();

I guess both of them give the same output. The first approach did not give me good results: only 1 out of 7 URLs was resolved. Would the second approach do better?

Can we use any other better approach?


Answer 1:


I'd use the following:

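// Assumes JUnit 4 on the classpath.
import static org.junit.Assert.assertEquals;

import java.net.HttpURLConnection;
import java.net.URL;

import org.junit.Test;

@Test
public void testLocation() throws Exception {
    final String link = "http://bit.ly/4Agih5";

    final URL url = new URL(link);
    final HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
    // Do not follow the redirect; only the short URL's own response is read.
    urlConnection.setInstanceFollowRedirects(false);

    // Reading a header field sends the request implicitly.
    final String location = urlConnection.getHeaderField("location");
    assertEquals("http://stackoverflow.com/", location);
    // getURL() still returns the short URL because no redirect was followed.
    assertEquals(link, urlConnection.getURL().toString());
}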

With setInstanceFollowRedirects(false), the HttpURLConnection does not follow redirects, so the destination page (stackoverflow.com in the above example) is not downloaded; only the redirect page from bit.ly is.

One drawback is that when a resolved bit.ly URL points to another short URL, for example on tinyurl.com, you get the tinyurl.com link rather than the URL that tinyurl.com redirects to.
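The chained-shortener drawback can be worked around by repeating the single-hop lookup until no Location header comes back. The following is only a sketch of that idea, not the answerer's method: the names RedirectChain, resolve, httpLocation and the maxHops limit are hypothetical, and it assumes the shortener answers HEAD requests the same way it answers GET. The loop logic is kept separate from the network call so it can be exercised without a connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of the original answer.
final class RedirectChain {

    /**
     * Follows redirects one hop at a time. Stops when nextLocation returns
     * null, when a URL repeats (redirect loop), or after maxHops hops.
     * Returns the last URL reached, which may still be a short URL if the
     * hop limit was hit.
     */
    static String resolve(String start, int maxHops, UnaryOperator<String> nextLocation) {
        Set<String> seen = new HashSet<>();
        String current = start;
        for (int hop = 0; hop < maxHops && seen.add(current); hop++) {
            String location = nextLocation.apply(current);
            if (location == null) {
                break; // no Location header: treat current as the final URL
            }
            // A Location value may be relative, so resolve it against the current URL.
            current = URI.create(current).resolve(location).toString();
        }
        return current;
    }

    /** One real HTTP hop: returns the Location header, or null if there is none. */
    static String httpLocation(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // same setting as in the answer
            conn.setRequestMethod("HEAD");          // assumption: HEAD is supported
            try {
                return conn.getHeaderField("Location");
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as RedirectChain.resolve("http://bit.ly/4Agih5", 5, RedirectChain::httpLocation) would then walk up to five hops. The hop limit and the set of visited URLs guard against shorteners that redirect to each other in a loop.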

Edit:

To see the response from bit.ly, use curl:

$ curl --dump-header /tmp/headers http://bit.ly/4Agih5
<html>
<head>
<title>bit.ly</title>
</head>
<body>
<a href="http://stackoverflow.com/">moved here</a>
</body>
</html>

As you can see, bit.ly sends only a short redirect page. Then check the HTTP headers:

$ cat /tmp/headers
HTTP/1.0 301 Moved Permanently
Server: nginx
Date: Wed, 06 Nov 2013 08:48:59 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: private; max-age=90
Location: http://stackoverflow.com/
Mime-Version: 1.0
Content-Length: 117
X-Cache: MISS from cam
X-Cache-Lookup: MISS from cam:3128
Via: 1.1 cam:3128 (squid/2.7.STABLE7)
Connection: close

It sends a 301 Moved Permanently response with a Location header (which points to http://stackoverflow.com/). Modern browsers don't show you the HTML page above. Instead they automatically redirect you to the URL in the Location header.
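Since the Location header is only meaningful as a redirect target on a 3xx response, it can help to check the status code before using it. The sketch below shows one way to do that; RedirectStatus, isRedirect and redirectTarget are hypothetical names introduced here, and the list of status codes is an assumption about which redirects a shortener might send.

```java
import java.net.HttpURLConnection;

// Hypothetical helper, not part of the original answer.
final class RedirectStatus {

    /** True for the status codes that normally carry a Location redirect target. */
    static boolean isRedirect(int status) {
        switch (status) {
            case HttpURLConnection.HTTP_MOVED_PERM: // 301, as in the bit.ly response
            case HttpURLConnection.HTTP_MOVED_TEMP: // 302
            case HttpURLConnection.HTTP_SEE_OTHER:  // 303
            case 307: // Temporary Redirect (no constant in HttpURLConnection)
            case 308: // Permanent Redirect (no constant in HttpURLConnection)
                return true;
            default:
                return false;
        }
    }

    /** Returns the Location value only when the response is a redirect, else null. */
    static String redirectTarget(int status, String locationHeader) {
        return isRedirect(status) ? locationHeader : null;
    }
}
```

With a connection configured as in the test above, this could be called as RedirectStatus.redirectTarget(urlConnection.getResponseCode(), urlConnection.getHeaderField("Location")).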




Answer 2:


The following link contains a more complete method along the same lines as the previous answer: https://github.com/cpdomina/WebUtils/blob/master/src/net/cpdomina/webutils/URLUnshortener.java



Source: https://stackoverflow.com/questions/7793827/how-to-get-the-complete-url-address-most-efficiently
