Encode and Decode RFC 2396 URLs

♀尐吖头ヾ submitted on 2019-11-28 23:45:04

Use the URI class as follows:

URI uri = new URI("http", "//www.someurl.com/has spaces in url", null);
URL url = uri.toURL();

or if you want a String:

String urlString = uri.toASCIIString();

Your component parts, potentially containing characters that must be escaped, should already have been escaped using URLEncoder before being concatenated into a URI.

If you have a URI containing out-of-band characters (such as space, "<>[]{}\|^`, and non-ASCII bytes), it's not really a URI. You can try to fix it up by manually %-escaping those characters, but this is a last-ditch fix-up and not a standard form of encoding. It is usually only necessary when you are accepting potentially malformed URIs from user input. I don't know of any built-in Java library function that will do it for you; you may have to hack something up yourself with a regular expression.
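
As a rough sketch of that kind of fix-up, the following escapes only the out-of-band characters listed above (plus ASCII control characters) and leaves everything else, including existing %-escapes and delimiters, untouched. The class name UriFixup and the method fixUp are made up for illustration, and the choice of UTF-8 for non-ASCII characters is an assumption; note that it also escapes the square brackets of an IPv6 host literal, so it is only suitable as a last resort.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class UriFixup {
    // Control characters and space, DEL, the listed punctuation, or any non-ASCII code point.
    private static final Pattern OUT_OF_BAND =
            Pattern.compile("[\\x00-\\x20\\x7F\"<>\\[\\]{}\\\\|^`]|[^\\x00-\\x7F]");

    static String fixUp(String raw) {
        Matcher m = OUT_OF_BAND.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the untouched text before the match.
            out.append(raw, last, m.start());
            // Assumption: non-ASCII characters are escaped as their UTF-8 bytes.
            for (byte b : m.group().getBytes(StandardCharsets.UTF_8)) {
                out.append('%').append(String.format("%02X", b & 0xFF));
            }
            last = m.end();
        }
        out.append(raw, last, raw.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixUp("http://example.com/a b<c>"));
    }
}
```

Because '%' itself is left alone, input that is already partly escaped is not double-escaped, but a stray literal '%' will not be repaired either.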

In the other direction, you must take your URI apart into its component parts (each separate path part, query parameter name and value, and so on) before you can unescape each part (using URLDecoder). There is no sensible way to %-decode a whole URI in one go; you could try to ‘decode %-escapes that do not decode to delimiters’ (like /?=&;%) but you would be left with a strange inconsistent string that doesn't conform to any URI-processing standard.
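
To illustrate the split-then-decode order for a query string, here is a minimal sketch. The class QueryParts and its methods are hypothetical names, and it assumes UTF-8 and '&'-separated name=value pairs; it splits the raw (still escaped) query on the delimiters first and only then decodes each name and value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
class QueryParts {
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static Map<String, String> parseQuery(String uriText) {
        Map<String, String> params = new LinkedHashMap<>();
        // getRawQuery() returns the query with its %-escapes intact.
        String rawQuery = URI.create(uriText).getRawQuery();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Decode only after the delimiters have done their job.
            params.put(decode(name), decode(value));
        }
        return params;
    }

    public static void main(String[] args) {
        // The value of q contains an escaped '&' and '=' that must not act as delimiters.
        System.out.println(parseQuery("http://example.com/search?q=a%26b%3Dc&lang=en+US"));
    }
}
```

Decoding the whole query first would turn %26 and %3D into real delimiters and split the value of q in the wrong place.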

URLEncoder/URLDecoder are fine for handling URI query components, both names and values. However they are not quite right for handling URI path part components. The difference is that the ‘+’ character does not mean a space in a path part. You can fix this up with a simple string replace: after URLEncoding, replace ‘+’ with ‘%20’; before URLDecoding, replace ‘+’ with ‘%2B’. You can ignore the difference if you are not planning to include segments containing spaces or pluses in your path.
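
The two string replacements described above can be wrapped up like this. PathSegments, encodeSegment and decodeSegment are illustrative names only, and UTF-8 is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of any library.
class PathSegments {
    static String encodeSegment(String segment) {
        try {
            // URLEncoder writes a space as '+'; in a path it has to be %20.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static String decodeSegment(String encoded) {
        try {
            // Protect literal '+' so URLDecoder does not turn it into a space.
            return URLDecoder.decode(encoded.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("a b+c"));   // a%20b%2Bc
        System.out.println(decodeSegment("a%20b+c")); // a b+c
    }
}
```

These work on one path segment at a time: a '/' inside the segment is escaped to %2F, so join the encoded segments with '/' yourself.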

The javadocs recommend using the java.net.URI class to accomplish the encoding. To ensure that the URI class properly encodes the URL, one of the multi-argument constructors must be used. These constructors will perform the required encoding, but require you to parse any URL string into the parameters.

If you want to decode, you must construct the URI with the single argument constructor, which does not do any encoding. You can then call methods such as getPath() etc. to retrieve and build the decoded URL.
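
A small sketch of both directions, using the five-argument constructor to encode and the single-argument constructor to decode. The class name UriRoundTrip, its method names, and the example host are invented for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of any library.
class UriRoundTrip {
    // Multi-argument constructor: illegal characters in each component are quoted.
    static String encode(String scheme, String authority, String path, String query) {
        try {
            return new URI(scheme, authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Single-argument constructor: the string is parsed as-is, with no extra encoding.
    static String decodedPath(String encoded) {
        try {
            return new URI(encoded).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decodedQuery(String encoded) {
        try {
            return new URI(encoded).getQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("http", "www.example.com", "/has spaces/in url", "q=a b");
        System.out.println(encoded);               // http://www.example.com/has%20spaces/in%20url?q=a%20b
        System.out.println(decodedPath(encoded));  // /has spaces/in url
        System.out.println(decodedQuery(encoded)); // q=a b
    }
}
```

Keep in mind that the multi-argument constructors always quote '%', so they expect unescaped input; passing in text that is already escaped would escape it a second time.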
