How do you unescape URLs in Java?

Submitted by 人盡茶涼 on 2019-11-27 06:33:51

Question


When I read the XML through a URL's InputStream and then strip out everything except the URL, I get "http://cliveg.bu.edu/people/sganguly/player/%20Rang%20De%20Basanti%20-%20Tu%20Bin%20Bataye.mp3".

As you can see, there are a lot of "%20"s.

I want the URL to be unescaped.

Is there any way to do this in Java, without using a third-party library?


Answer 1:


This is not escaped XML; it is URL-encoded text. It looks to me like you want to use the following on the URL strings.

URLDecoder.decode(url);

This will give you the correct text. The result of decoding the link you provided is this.

http://cliveg.bu.edu/people/sganguly/player/ Rang De Basanti - Tu Bin Bataye.mp3

The %20 is an escaped space character. To get the above I used the static decode method of the java.net.URLDecoder class.




Answer 2:


URLDecoder.decode(String s) has been deprecated since Java 5.

You should use URLDecoder.decode(String s, String enc).

For example:

URLDecoder.decode(url, "UTF-8")

Regarding the encoding to use:

Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities.
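
As a minimal sketch of the two-argument call applied to the URL from the question: the class name UrlUnescape and the helper unescape are made up for illustration, and wrapping the checked exception in an IllegalStateException is just one possible choice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; the name is not from the original answers.
class UrlUnescape {

    // Decodes percent-escapes such as %20, interpreting the bytes as UTF-8.
    static String unescape(String url) {
        try {
            return URLDecoder.decode(url, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 must be supported by every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "http://cliveg.bu.edu/people/sganguly/player/"
                + "%20Rang%20De%20Basanti%20-%20Tu%20Bin%20Bataye.mp3";
        System.out.println(unescape(url));
    }
}
```

One thing to keep in mind: URLDecoder implements application/x-www-form-urlencoded decoding, so it also turns a literal "+" into a space. If the URLs can contain "+" characters that must be preserved, this method alone may not be what you want.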




Answer 3:


I'm having problems using this method when I have special characters like á, é, í, etc. My (probably wild) guess is that wide characters are not being encoded properly. At least, I was expecting to see sequences like %uC2BF instead of %C2%BF.

Edited: My bad, this post explains the difference between URL encoding and JavaScript's escape sequences: URI encoding in UNICODE for apache httpclient 4
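
To illustrate the difference, here is a small sketch (the class name PercentEncodingDemo and its helper methods are hypothetical). URL encoding escapes each byte of the character's UTF-8 representation separately, so a single non-ASCII character becomes several %XX groups rather than one %uXXXX sequence.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; the name is not from the original answers.
class PercentEncodingDemo {

    // Percent-encodes a string using its UTF-8 bytes.
    static String encode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    // Decodes percent-escapes, interpreting the bytes in the given charset.
    static String decode(String s, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(s, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // \u00E1 is the character á; its UTF-8 bytes are C3 A1.
        System.out.println(encode("\u00E1"));               // %C3%A1

        // %C2%BF is the UTF-8 encoding of \u00BF (the character ¿).
        System.out.println(decode("%C2%BF", "UTF-8"));      // one character: ¿

        // Decoding the same bytes with the wrong charset yields two characters.
        System.out.println(decode("%C2%BF", "ISO-8859-1").length()); // 2
    }
}
```

The last line shows why the encoding argument matters: the same escaped bytes decode to different text depending on the charset passed in.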



Source: https://stackoverflow.com/questions/623861/how-do-you-unescape-urls-in-java
