java.net.URI chokes on special characters in host part

Submitted by 五迷三道 on 2019-12-01 19:11:53
axtavt

Java 6 has the java.net.IDN class for working with internationalized domain names. For example, the following produces a URI with an encoded hostname:

URI u = new URI("http://" + IDN.toASCII("www.christlicheparteiösterreichs.at") + "/steiermark/");

The standard way to encode non-ASCII characters in hostnames is known as "Punycode". IDN.toASCII converts each non-ASCII label of the host to Punycode and prefixes it with "xn--".
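As a minimal sketch of what IDN.toASCII does (assuming Java 6 or later; the class PunycodeDemo and the helpers toAce and fromAce are hypothetical names), the following converts the host from the question to its ASCII-compatible form and back again with IDN.toUnicode. Only the label containing the "ö" is encoded; "www" and "at" pass through unchanged.

```java
import java.net.IDN;

public class PunycodeDemo {
    // Unicode host -> ASCII-compatible form; non-ASCII labels become "xn--" + Punycode
    public static String toAce(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    // ASCII-compatible host -> Unicode, e.g. for showing the host to a user
    public static String fromAce(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }

    public static void main(String[] args) {
        // The o-umlaut is written as an escape so the source-file encoding does not matter
        String host = "www.christlichepartei\u00f6sterreichs.at";
        String ace = toAce(host);
        System.out.println(ace);                       // www.xn--christlicheparteisterreichs-5yc.at
        System.out.println(fromAce(ace).equals(host)); // true
    }
}
```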

URI throws a URISyntaxException when you use the multi-argument constructor that takes the host as a separate component:

URI someUri = new URI("http", "www.christlicheparteiösterreichs.at", "/steiermark", null);

java.net.URISyntaxException: Illegal character in hostname at index 28: http://www.christlicheparteiösterreichs.at/steiermark
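If you want to inspect this failure programmatically, URISyntaxException exposes the reason, the offending index and the input string through getReason(), getIndex() and getInput(). A small sketch, assuming the same four-argument constructor as above (the class HostnameErrorDemo and the helper failingIndex are hypothetical names):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostnameErrorDemo {
    // Returns the index reported by URISyntaxException, or -1 if the URI is accepted
    public static int failingIndex(String host) {
        try {
            // Four-argument constructor: scheme, host, path, fragment
            new URI("http", host, "/steiermark", null);
            return -1;
        } catch (URISyntaxException e) {
            System.out.println(e.getReason() + " at index " + e.getIndex() + ": " + e.getInput());
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println(failingIndex("www.christlichepartei\u00f6sterreichs.at"));
        System.out.println(failingIndex("www.example.at"));
    }
}
```

The reported index is a position in the URI string that the constructor assembles ("http://..."), not a position in the host argument you passed in.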

You can use IDN to fix this:

URI someUri = new URI("http", IDN.toASCII("www.christlicheparteiösterreichs.at"), "/steiermark", null);
System.out.println(someUri);
System.out.println("host: " + someUri.getHost());

Output:

http://www.xn--christlicheparteisterreichs-5yc.at/steiermark

host: www.xn--christlicheparteisterreichs-5yc.at

UPDATE regarding the chicken-and-egg problem (you need the host to call IDN.toASCII, but URI cannot parse the string to extract it for you):

You can let URL do the job, since it splits the string into components without rejecting the non-ASCII host:

// URL has already split the string; only the host is converted to Punycode
public static URI createSafeURI(final URL someURL) throws URISyntaxException
{
    return new URI(someURL.getProtocol(), someURL.getUserInfo(), IDN.toASCII(someURL.getHost()),
            someURL.getPort(), someURL.getPath(), someURL.getQuery(), someURL.getRef());
}


URI raoul=createSafeURI(new URL("http://www.christlicheparteiösterreichs.at/steiermark/readme.html#important"));
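The createSafeURI method and the call above can be wrapped into a self-contained program to see the result. This is a sketch that reuses the answer's method unchanged; SafeUriDemo and the convert helper are hypothetical names added only for the demo.

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class SafeUriDemo {
    // createSafeURI as in the answer: URL does the splitting, IDN.toASCII converts
    // the host, and the seven-argument URI constructor reassembles the parts
    public static URI createSafeURI(final URL someURL) throws URISyntaxException {
        return new URI(someURL.getProtocol(), someURL.getUserInfo(),
                IDN.toASCII(someURL.getHost()),
                someURL.getPort(), // URL reports -1 when no port is given; URI treats -1 as undefined
                someURL.getPath(), someURL.getQuery(), someURL.getRef());
    }

    // Demo helper: parse a string with URL and return the converted URI as text
    public static String convert(String spec) {
        try {
            return createSafeURI(new URL(spec)).toString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("http://www.christlichepartei\u00f6sterreichs.at/steiermark/readme.html#important"));
        // prints http://www.xn--christlicheparteisterreichs-5yc.at/steiermark/readme.html#important
    }
}
```

User info, port, query and fragment are carried over as well, e.g. "http://user@...:8080/a?x=1#f" keeps all of those parts with only the host rewritten.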

This is just a quick shot; it does not address every issue involved in converting a URL to a URI. Use it as a starting point.
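One such issue: the multi-argument URI constructors always quote the percent character, so a URL whose path already contains an escape such as "%20" comes out double-encoded ("%2520") from createSafeURI. The sketch below shows the effect and one possible workaround: convert only the host, reassemble the raw string, and use the single-argument constructor, which preserves existing escapes. It assumes a hierarchical URL with a DNS host name whose raw path, query and fragment are already legal URI text (an unescaped space, for example, would still make new URI(...) fail). RawPathDemo, viaComponents, viaRawString and compare are hypothetical names.

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class RawPathDemo {
    // Same shape as createSafeURI: '%' in the path is quoted again ("%20" -> "%2520")
    static URI viaComponents(URL url) throws URISyntaxException {
        return new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()),
                url.getPort(), url.getPath(), url.getQuery(), url.getRef());
    }

    // Workaround sketch: only the host is converted, the raw parts are copied as-is,
    // and the single-argument constructor keeps existing %XX escapes
    static URI viaRawString(URL url) throws URISyntaxException {
        StringBuilder sb = new StringBuilder(url.getProtocol()).append("://");
        if (url.getUserInfo() != null) sb.append(url.getUserInfo()).append('@');
        sb.append(IDN.toASCII(url.getHost()));
        if (url.getPort() != -1) sb.append(':').append(url.getPort());
        sb.append(url.getPath());
        if (url.getQuery() != null) sb.append('?').append(url.getQuery());
        if (url.getRef() != null) sb.append('#').append(url.getRef());
        return new URI(sb.toString());
    }

    // Runs both conversions on the same input and returns the two results as strings
    public static String[] compare(String spec) {
        try {
            URL url = new URL(spec);
            return new String[] { viaComponents(url).toString(), viaRawString(url).toString() };
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = compare("http://www.christlichepartei\u00f6sterreichs.at/steiermark/a%20b.html");
        System.out.println(r[0]); // path ends in a%2520b.html (double-encoded)
        System.out.println(r[1]); // path ends in a%20b.html
    }
}
```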
